Counts and specific values
Let {xi} be a set of data values, and let {X1,X2,… Xn-1,Xn} be the set of these values arranged in ascending order. A series of very simple measures can then be readily computed, some of which are available as SQL database language commands. Note that the data may be integer or real and, in some instances, purely nominal: for example, the data might represent a finite set of classes, such as classified land use types in a remotely sensed image. As noted earlier (Statistical Data), the initial analysis of datasets and computation of basic measures is a fundamental first step in the process of statistical analysis. In many instances data will have been loaded into tables, either within standard statistical software packages or in SQL-compatible databases. We have included details of the SQL commands and functions (where available as standard in most implementations) for the measures listed.
Count
The number of data values in a set
Count({xi})=n
In SQL this is implemented as the aggregate function COUNT()
Top m, Bottom m
The set of the largest (smallest) m values from an ordered set, {X1,X2,… Xn-1,Xn}.
Top m{xi}={Xn‑m+1,…Xn‑1,Xn}
Bottom m{xi}={X1,X2,… Xm-1,Xm}
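May be generated in SQL dialects that support the TOP clause (for example Microsoft SQL Server and Access), which returns a specified number of records or a percentage of the total records. Other implementations offer LIMIT or FETCH FIRST for a fixed number of records. The result equates to the numerical Top or Bottom m if the selected data column is numeric and the records are sorted on it with ORDER BY. For the first and last records in a sorted list, the FIRST() and LAST() functions can be used where they are available (they are not part of standard SQL)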
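As a minimal sketch of the Top m and Bottom m definitions above, the following Python function sorts the values and slices the two ends of the ordered set. The function name top_bottom and the sample data are illustrative assumptions and are not taken from any package mentioned in this section.

```python
def top_bottom(values, m):
    """Return (top_m, bottom_m) for a collection of comparable values.

    Both results are listed in ascending order, matching {X1, ..., Xn}.
    """
    ordered = sorted(values)          # {X1, X2, ..., Xn}
    n = len(ordered)
    if not 0 <= m <= n:
        raise ValueError("m must lie between 0 and the number of values")
    top = ordered[n - m:]             # {Xn-m+1, ..., Xn}
    bottom = ordered[:m]              # {X1, ..., Xm}
    return top, bottom


sample = [7, 2, 9, 4, 9, 1]           # hypothetical data
top3, bottom3 = top_bottom(sample, 3)
```

Repeated values are retained, so the largest value may appear more than once in the Top m set.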
Variety
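The number of distinct, i.e. different, data values in a set. Some software packages refer to the variety as diversity, which should not be confused with information theoretic and other diversity measures. In SQL the distinct values themselves are selected with the keyword DISTINCT, and in most implementations their number is obtained by combining it with the count function, as COUNT(DISTINCT column)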
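A short Python sketch of the variety measure, assuming the data values are hashable (numbers or class labels). The function name variety and the land use labels are hypothetical and serve only as an illustration.

```python
def variety(values):
    """Return the number of distinct values in a collection."""
    return len(set(values))           # a set keeps one copy of each value


# Hypothetical nominal data: classified land use types
land_use = ["urban", "forest", "water", "forest", "urban", "forest"]
n_classes = variety(land_use)
```

This is a simple count of classes and takes no account of how often each class occurs, which is what separates it from the diversity measures mentioned above.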
Majority
The most common, i.e. most frequent, data value or values in a set. Similar to the mode, but often applied to subsets of the data, for example all data lying within a particular range of values or within the local neighborhood of a sample point. For general datasets the term should only be applied to cases where a given class makes up 50% or more of the total
Minority
The least common, i.e. least frequently occurring, data value or values in a set. Often applied to subsets of the data, for example all data lying within a particular range of values or within the local neighborhood of a sample point
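The Majority and Minority definitions above can be sketched in Python by tallying frequencies. The function name majority_minority and the sample labels are assumptions made for illustration. Ties are returned in full, since more than one value may share the highest or lowest frequency.

```python
from collections import Counter


def majority_minority(values):
    """Return (majority, minority): the most and least frequent values.

    Each result is a list, in order of first appearance, because
    several values may tie for the highest or lowest frequency.
    """
    counts = Counter(values)
    if not counts:
        return [], []
    highest = max(counts.values())
    lowest = min(counts.values())
    majority = [v for v, c in counts.items() if c == highest]
    minority = [v for v, c in counts.items() if c == lowest]
    return majority, minority


# Hypothetical nominal data: classified land use types
land_use = ["urban", "forest", "water", "forest", "urban", "forest"]
majority, minority = majority_minority(land_use)

# Share of the total taken by the first majority class
majority_share = land_use.count(majority[0]) / len(land_use)
```

To apply the measure to a subset, such as the local neighborhood of a sample point, pass only the values from that subset to the function.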
Maximum, Max
The maximum value of a set of values. May not be unique, in the sense that more than one data item may take this value
Max{xi}=Xn
In SQL this is implemented as the aggregate function MAX()
R: max(x)
Minimum, Min
The minimum value of a set of values. May not be unique, in the sense that more than one data item may take this value
Min{xi}=X1
In SQL this is implemented as the aggregate function MIN().
R: min(x)
Sum
The sum of a set of data values
Sum{xi}=x1+x2+…+xn
In SQL this is implemented as the aggregate function SUM()
R: sum(x)
Average
The arithmetic mean of a set of numeric data values, i.e. the sum of the values divided by their count, n
In SQL this is implemented as the aggregate function AVG()
R: mean(x)
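The SQL aggregate functions listed in this section can be tried together in a single query. The sketch below uses Python's built-in sqlite3 module with an in-memory database, chosen only as a convenient SQL implementation. The table name obs, the column name x and the data values are hypothetical.

```python
import sqlite3

values = [4.0, 7.5, 7.5, 1.0, 10.0]   # hypothetical sample data

# Load the values into a one-column table in a temporary database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (x REAL)")
conn.executemany("INSERT INTO obs (x) VALUES (?)", [(v,) for v in values])

# One query returning count, variety, minimum, maximum, sum and average
row = conn.execute(
    "SELECT COUNT(x), COUNT(DISTINCT x), MIN(x), MAX(x), SUM(x), AVG(x) "
    "FROM obs"
).fetchone()
conn.close()

count, variety, minimum, maximum, total, average = row
```

The same quantities are available in plain Python as len(values), len(set(values)), min(values), max(values), sum(values) and statistics.mean(values). Note that SQL aggregate functions ignore NULL entries in the column, so COUNT(x) may be smaller than the number of records if data are missing.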