Counts and specific values

Let $\{x_i\}$ be a set of data values, and let $\{X_1, X_2, \ldots, X_{n-1}, X_n\}$ be the same values arranged in ascending order. A series of very simple measures can then be computed directly (some of which are available as SQL database language commands). Note that the data may be integer or real and, in some instances, purely nominal: for example, the data might represent a finite set of classes, such as classified land use types in a remotely sensed image. As noted earlier (Statistical Data), the initial analysis of datasets and computation of basic measures is a fundamental first step in the process of statistical analysis. In many instances data will have been loaded into tables, either within standard statistical software packages or in SQL-compatible databases. For each measure listed, we include the SQL commands and functions where these are available as standard in most implementations.

Count

The number of data values in a set

$$\mathrm{Count}(\{x_i\}) = n$$

In SQL this is implemented as the aggregate function COUNT()
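
A minimal Python sketch of Count, for illustration only. Missing entries are represented as None. This is an assumption of this example, not part of the definition above. It lets the sketch mirror a detail of SQL, where COUNT(*) counts every row but COUNT(column) ignores NULLs. The function names count_all and count_non_missing are hypothetical.

```python
from typing import Optional, Sequence


def count_all(values: Sequence[Optional[float]]) -> int:
    """Count every entry, missing or not (like SQL COUNT(*))."""
    return len(values)


def count_non_missing(values: Sequence[Optional[float]]) -> int:
    """Count entries that are not None (like SQL COUNT(column) skipping NULLs)."""
    return sum(1 for v in values if v is not None)


# Illustrative data; None stands in for a missing (NULL) value.
data = [3.2, 1.5, None, 4.8, 2.0]
print(count_all(data))          # 5
print(count_non_missing(data))  # 4
```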

Top m, Bottom m

The set of the largest (smallest) $m$ values from the ordered set $\{X_1, X_2, \ldots, X_{n-1}, X_n\}$.

$$\mathrm{Top}_m\{x_i\} = \{X_{n-m+1}, \ldots, X_{n-1}, X_n\}$$

$$\mathrm{Bottom}_m\{x_i\} = \{X_1, X_2, \ldots, X_{m-1}, X_m\}$$

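May be generated with the SQL TOP clause. TOP returns either a number of records (TOP m) or a percentage of the total records (TOP m PERCENT). This equates to numerical Top m (Bottom m) if the selected data column is numeric and the records are sorted in descending (ascending) order with ORDER BY. TOP is supported by some implementations, such as Microsoft SQL Server and Microsoft Access. Others use LIMIT m or the standard FETCH FIRST m ROWS ONLY. For the first and last records in a sorted list, the FIRST() and LAST() functions can be used in implementations that provide them, such as Microsoft Access; they are not part of standard SQL.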
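
The Top m and Bottom m definitions translate directly into slices of the sorted list $\{X_1, \ldots, X_n\}$. The Python sketch below assumes numeric data and $0 \le m \le n$; top_m and bottom_m are hypothetical helper names. The standard-library functions heapq.nlargest and heapq.nsmallest return the same values without sorting the whole list. This may be cheaper when m is much smaller than n.

```python
import heapq
from typing import List, Sequence


def top_m(values: Sequence[float], m: int) -> List[float]:
    """Return {X_(n-m+1), ..., X_n}: the m largest values, in ascending order."""
    ordered = sorted(values)           # X_1 <= X_2 <= ... <= X_n
    return ordered[len(ordered) - m:]  # last m items (empty list when m == 0)


def bottom_m(values: Sequence[float], m: int) -> List[float]:
    """Return {X_1, ..., X_m}: the m smallest values, in ascending order."""
    return sorted(values)[:m]


data = [7, 2, 9, 4, 4, 1, 8]
print(top_m(data, 3))     # [7, 8, 9]
print(bottom_m(data, 3))  # [1, 2, 4]

# heapq gives the same values; nlargest lists them in descending order.
print(heapq.nlargest(3, data))   # [9, 8, 7]
print(heapq.nsmallest(3, data))  # [1, 2, 4]
```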

Variety

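The number of distinct, i.e. different, data values in a set. Some software packages refer to the variety as diversity, which should not be confused with information-theoretic and other diversity measures. In SQL the variety is obtained with COUNT(DISTINCT column). The DISTINCT keyword on its own (SELECT DISTINCT column) returns the distinct values themselves rather than their number.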
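
Variety reduces to the size of the set of distinct values. A minimal Python sketch follows. It assumes hashable values with no missing entries: a Python set would count None as one more distinct value, whereas SQL COUNT(DISTINCT column) ignores NULLs. The land use labels are illustrative, and variety is a hypothetical function name.

```python
from typing import Hashable, Iterable


def variety(values: Iterable[Hashable]) -> int:
    """Number of distinct values (cf. SQL COUNT(DISTINCT column))."""
    return len(set(values))


# Nominal data work as well as numeric data.
land_use = ["forest", "urban", "forest", "water", "urban", "forest"]
print(variety(land_use))  # 3
```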

Majority

The most common, i.e. most frequent, data value(s) in a set. Similar to the mode, but often applied to subsets of the data, for example all data lying within a particular range of values or within the local neighborhood of a sample point. For general datasets the term should only be applied where a given class accounts for more than 50% of the total.
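
One way to compute the majority is sketched below in Python with collections.Counter. The sketch reports every value that shares the highest frequency, since there may be ties. It also tests separately for a strict majority, interpreting the 50% threshold above as strictly more than half of the values (an assumption). The function names and the sample neighborhoods are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence


def most_frequent(values: Sequence[Hashable]) -> List[Hashable]:
    """All values sharing the highest frequency (ties are kept)."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]


def strict_majority(values: Sequence[Hashable]) -> Optional[Hashable]:
    """The class holding more than half of the values, or None if no class does."""
    for v, c in Counter(values).items():
        if 2 * c > len(values):
            return v
    return None


neighborhood = ["forest", "urban", "forest", "water", "forest"]
print(most_frequent(neighborhood))    # ['forest']
print(strict_majority(neighborhood))  # forest  (3 of 5 = 60%)

mixed = ["forest", "urban", "forest", "urban", "water"]
print(most_frequent(mixed))    # ['forest', 'urban']
print(strict_majority(mixed))  # None  (no class exceeds 50%)
```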

Minority

The least common, i.e. least frequently occurring, data value(s) in a set. Often applied to subsets of the data, for example all data lying within a particular range of values or within the local neighborhood of a sample point.
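
The minority can be sketched the same way, taking the lowest frequency instead of the highest. Only classes that actually occur in the subset are counted. A class with zero occurrences in a neighborhood is invisible to this code and would have to be supplied from a separate list of all classes. The function name and sample window are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def least_frequent(values: Sequence[Hashable]) -> List[Hashable]:
    """All values sharing the lowest frequency among those present (ties kept)."""
    counts = Counter(values)
    if not counts:
        return []
    low = min(counts.values())
    return [v for v, c in counts.items() if c == low]


window = ["forest", "urban", "forest", "water", "forest", "urban"]
print(least_frequent(window))  # ['water']
```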

Maximum, Max

The largest value in a set of values. The value itself is well defined, but it may occur more than once, so it need not be unique to a single data item.

$$\mathrm{Max}\{x_i\} = X_n$$

In SQL this is implemented as the aggregate function MAX()

R: max(x)

Minimum, Min

The smallest value in a set of values. The value itself is well defined, but it may occur more than once, so it need not be unique to a single data item.

$$\mathrm{Min}\{x_i\} = X_1$$

In SQL this is implemented as the aggregate function MIN().

R: min(x)
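
Python's built-in max() and min() play the same role as R's max(x) and min(x). The maximum or minimum value may occur more than once. For that reason the hypothetical helpers below also return every position at which it occurs. Both built-ins raise ValueError on an empty sequence, so the sketch assumes at least one value.

```python
from typing import List, Sequence, Tuple


def max_with_positions(values: Sequence[float]) -> Tuple[float, List[int]]:
    """Return Max{x_i} and every 0-based index at which it occurs."""
    top = max(values)
    return top, [i for i, v in enumerate(values) if v == top]


def min_with_positions(values: Sequence[float]) -> Tuple[float, List[int]]:
    """Return Min{x_i} and every 0-based index at which it occurs."""
    low = min(values)
    return low, [i for i, v in enumerate(values) if v == low]


data = [4, 9, 1, 9, 3, 1]
print(max_with_positions(data))  # (9, [1, 3])
print(min_with_positions(data))  # (1, [2, 5])

# Consistent with the ordered-set definitions: Max = X_n and Min = X_1.
ordered = sorted(data)
assert max(data) == ordered[-1] and min(data) == ordered[0]
```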

Sum

The sum of a set of data values

In SQL this is implemented as the aggregate function SUM()

R: sum(x)
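
Written out, $\mathrm{Sum}\{x_i\} = \sum_{i=1}^{n} x_i$. For floating-point data, the way the terms are accumulated can affect the last digits of the result. The Python sketch below contrasts plain left-to-right addition (the hypothetical naive_sum) with the standard library's math.fsum, which tracks exact partial sums. An explicit loop is used rather than the built-in sum(), because its handling of floats has changed between Python versions.

```python
import math
from typing import Iterable


def naive_sum(values: Iterable[float]) -> float:
    """Add values left to right in ordinary floating-point arithmetic."""
    total = 0.0
    for v in values:
        total += v
    return total


readings = [0.1] * 10
print(naive_sum(readings))   # 0.9999999999999999
print(math.fsum(readings))   # 1.0
```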

Average

The arithmetic mean of a set of numeric data values

In SQL this is implemented as the aggregate function AVG()
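
Written out, the mean is $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. The Python sketch below mirrors one property of SQL AVG(): it averages only the non-NULL values. Missing entries are represented as None, an assumption of this example. SQL returns NULL for an empty set, but this hypothetical average function raises ValueError instead. The standard library's statistics.mean gives the same result on the non-missing values.

```python
import statistics
from typing import Optional, Sequence


def average(values: Sequence[Optional[float]]) -> float:
    """Arithmetic mean of the non-missing values (cf. SQL AVG())."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("the average of an empty set is undefined")
    return sum(present) / len(present)


data = [2.0, 4.0, None, 9.0]
print(average(data))                     # 5.0
print(statistics.mean([2.0, 4.0, 9.0]))  # 5.0
```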