Statistical indices

In numerous instances it is helpful to have a single composite measure to represent a complex set of data. Perhaps the most familiar example is the use of a single value to summarize the overall price of goods in the shops and the way in which such prices change over time. Indices with names such as the "Cost of Living Index" or the "Retail Price Index (RPI)" are well-known examples. Computing such index values can be a complex process (see the UK RPI and CPI Technical Manual for full details of how the UK government performs these calculations). The RPI provides UK-specific retail price index data from 1947 to the present day, whilst the Consumer Prices Index (CPI) is a slightly different measure that is increasingly used because it is the standard index throughout Europe. For a more extensive discussion of indices of this type, see the Wikipedia entry "Price index", in particular the section on the Paasche and Laspeyres price indices.

Essentially, an index of this type has three key elements:

a base year (more generally, a base date) at which the index takes the value 100
a set of component items (e.g. items a typical consumer would purchase on a regular basis) together with their prices at a specific date
a set of weights, indicating the relative importance of each component item in the overall budget

This is most readily understood by looking at the UK RPI computation; information on the US CPI (a form of regionally aggregated Laspeyres index) is available from the US Bureau of Labor Statistics. The index provides a single value, such as 117.2, for the weighted price of a 'fixed basket of goods' at time t, compared with the same basket at time t=0, the base date. In this example the index shows a 17.2% rise over the period, which is an indicator of the level of retail price inflation. The formula used is:

$$I_t = 100\,\frac{\sum_i P_{it}\,Q_{ib}}{\sum_i P_{i0}\,Q_{ib}}$$

where $P_{it}$ is the price of component $i$ of the basket of goods (and services) at time $t$, and $Q_{ib}$ is the quantity of this component purchased in the base period, i.e. at time $t=0$. The expression can be rearranged slightly by setting $w_i = P_{i0}Q_{ib}$, which gives the result:

$$I_t = 100\,\frac{\sum_i w_i \left( P_{it} / P_{i0} \right)}{\sum_i w_i}$$

This makes it clear that the index is a weighted average of the price relatives $P_{it}/P_{i0}$, i.e. of the price of each item 'today' relative to its price at the base date. In 2007 the broad categories of UK RPI weights were: Food and catering: 152; Alcohol and tobacco: 95; Housing and household expenditure: 408; Personal expenditure: 83; and Travel and leisure: 262 (summing to 1000). Each broad category is made up of a detailed list of items, and each item has its own specification. These specifications (the detailed content of the basket) may change over time and, for convenience, the base date may also change. When this occurs the new base date is given an index value of 100 again and subsequent values are expressed relative to the new base date. Since 1947 the UK RPI has been re-based several times: in 1952, 1956, 1962, 1974 and 1987.
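The equivalence of the two forms of the formula, and the effect of re-basing, can be checked numerically. The Python sketch below is illustrative only: the three-item basket (bread, milk, fuel), its prices and quantities, and the index series passed to the hypothetical `rebase` helper are all invented, not RPI data. It computes the index both as a ratio of basket costs and as a weighted average of price relatives with $w_i = P_{i0}Q_{ib}$. It is a simplified sketch under those assumptions, not the official RPI methodology, which involves many further details.

```python
from collections.abc import Sequence


def laspeyres_index(p_base: Sequence[float], p_t: Sequence[float],
                    q_base: Sequence[float]) -> float:
    """Aggregate form: 100 * sum(P_it * Q_ib) / sum(P_i0 * Q_ib)."""
    if not (len(p_base) == len(p_t) == len(q_base)):
        raise ValueError("price and quantity lists must have the same length")
    cost_now = sum(p * q for p, q in zip(p_t, q_base))     # base basket at today's prices
    cost_base = sum(p * q for p, q in zip(p_base, q_base))  # base basket at base prices
    return 100.0 * cost_now / cost_base


def weighted_relative_index(p_base: Sequence[float], p_t: Sequence[float],
                            q_base: Sequence[float]) -> float:
    """Weighted-average form: weights w_i = P_i0 * Q_ib applied to P_it / P_i0.

    Assumes every base price is non-zero.
    """
    weights = [p0 * q for p0, q in zip(p_base, q_base)]
    relatives = [pt / p0 for p0, pt in zip(p_base, p_t)]
    return 100.0 * sum(w * r for w, r in zip(weights, relatives)) / sum(weights)


def rebase(series: Sequence[float], base_pos: int) -> list[float]:
    """Hypothetical helper: rescale a series so the value at base_pos becomes 100."""
    base = series[base_pos]
    return [100.0 * v / base for v in series]


# Hypothetical basket (bread, milk, fuel); numbers are invented for illustration.
p_base = [1.00, 0.50, 1.20]   # prices at the base date, t = 0
p_now = [1.10, 0.55, 1.50]    # prices at time t
q_base = [100, 200, 50]       # quantities bought in the base period

print(round(laspeyres_index(p_base, p_now, q_base), 2))          # -> 113.46
print(round(weighted_relative_index(p_base, p_now, q_base), 2))  # -> 113.46
print([round(v, 1) for v in rebase([100.0, 113.2, 117.2, 125.0], 2)])
# -> [85.3, 96.6, 100.0, 106.7]
```

Both forms give the same value because $w_i \, (P_{it}/P_{i0}) = P_{it}Q_{ib}$ term by term; the weighted-average form is simply more convenient when weights are published directly, as with the RPI category weights above.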

Another widely used index is the Body Mass Index, or BMI, used as a crude measure of body fat levels. It was devised in the mid-19th century by the Belgian scientist Adolphe Quetelet and is sometimes referred to as the Quetelet index. The index, which has only been in popular use since 1972, is:

$$\mathrm{BMI} = \frac{\text{weight in kg}}{(\text{height in m})^2}$$

This is an example of an index that is widely used, and for which detailed graphs have been prepared, but which is not referenced to some 'ideal base' value and is not dimensionless (its units are kg/m²). For bodies of the same shape and density, one would expect weight to increase with the cube of height rather than the square. Dividing by only the second power therefore creates something of an anomaly: taller people will tend to have a higher BMI than expected. However, taller people also tend to be thinner, which partly offsets this height effect.
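The scaling argument can be made concrete with a short Python sketch. The weight and height are hypothetical, and the 'scaled-up person' assumes perfect geometric similarity at constant density (weight grows with the cube of the scale factor, height only linearly), which real bodies do not satisfy exactly.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body Mass Index: weight in kilograms divided by the square of height in metres."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


# Hypothetical person: 70 kg, 1.75 m tall.
base = bmi(70.0, 1.75)
print(round(base, 1))  # -> 22.9

# Assumption: the same body shape scaled up by 10% in every dimension at constant
# density, so weight is multiplied by k**3 while height is multiplied by k.
k = 1.1
scaled = bmi(70.0 * k ** 3, 1.75 * k)
print(round(scaled / base, 3))  # -> 1.1, i.e. BMI rises in proportion to height
```

Under this idealized assumption BMI grows linearly with height, which is the anomaly described above.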

For a set of sample data, a general method for producing a dimensionless index on a single variable, with range [0,1], is to compute the expression:

$$z_i = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}}$$

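Thus for the set of values {-100, -30, 0, 10, 50} the index computation yields {0, 0.47, 0.67, 0.73, 1} to two decimal places: the minimum is -100 and the maximum 50, so each value is shifted by 100 and divided by the range, 150 (for example, -30 maps to 70/150 ≈ 0.47). This procedure is a form of standardization (often called min-max scaling), transforming the data to a fixed positive range. It can be very useful when comparing or combining datasets with very different absolute values.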
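A minimal Python sketch of this rescaling reproduces the worked example. Raising an error when all values are identical is a design choice of this sketch, since the expression is undefined when the maximum equals the minimum.

```python
from collections.abc import Sequence


def min_max_index(values: Sequence[float]) -> list[float]:
    """Rescale values linearly to [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; the range is zero")
    return [(x - lo) / (hi - lo) for x in values]


data = [-100, -30, 0, 10, 50]
print([round(z, 2) for z in min_max_index(data)])  # -> [0.0, 0.47, 0.67, 0.73, 1.0]
```

Note that the result depends on the sample extremes, so adding a new, more extreme observation changes every index value.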


Wikipedia: "Price index" entry

UK CPI and RPI information

US Bureau of Labor Statistics: CPI information