The term moments derives from physics, and in particular the concepts of momentum and inertia. Moments are applied to discrete or continuous density functions, f, and are defined as the expected value of the rth power of the random variable, i.e. for a continuous density function, f(x), we have:

E(X^r) = ∫x^r f(x)dx
The left hand side of this expression is referred to as the expected value of the function contained in brackets. For r=1 this is simply the definition of the mean, μ, of the density function f(x). For r>1 these are described as crude (or raw) moments, i.e. moments taken with respect to 0 rather than with respect to the mean. It is generally more useful to calculate the moments with respect to (or around) the mean (central moments): the second central moment is the variance, whilst the third and fourth central moments provide information about the shape of the distribution, as summarized below (integration is assumed over the entire range):

1st (raw): mean, μ = E(X)
2nd central: variance, σ^2 = E((X-μ)^2)
3rd central: E((X-μ)^3), the basis of measures of skewness
4th central: E((X-μ)^4), the basis of measures of kurtosis
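As an illustration of these definitions (the example itself is not from the original text), the raw and central moments of a simple discrete density, here a hypothetical fair six-sided die, can be computed directly:

```python
# Raw and central moments of a discrete density: a fair six-sided die
# (hypothetical example). The r-th raw moment is E(X^r) = sum of x^r f(x);
# the r-th central moment replaces x with (x - mu).

values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6               # f(x): uniform density over the faces

def raw_moment(r):
    """E(X^r): moment about zero (a 'crude' moment)."""
    return sum(x ** r * p for x, p in zip(values, probs))

def central_moment(r):
    """E((X - mu)^r): moment about the mean."""
    mu = raw_moment(1)
    return sum((x - mu) ** r * p for x, p in zip(values, probs))

print(raw_moment(1))              # mean, 3.5
print(central_moment(2))          # variance, 35/12, approx 2.9167
```

Note that the first central moment is always zero, since deviations above and below the mean cancel exactly.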
The expected value operation is linear, so that E(aX+bX^2+c)=aE(X)+bE(X^2)+c, but E(f(X))≠f(E(X)) in general.
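The linearity property, and the warning that E(f(X))≠f(E(X)), can both be checked numerically; the sketch below reuses a hypothetical fair-die density:

```python
# Linearity of expectation, checked numerically on a hypothetical fair die.

values = [1, 2, 3, 4, 5, 6]
p = 1 / 6

def E(g):
    """Expected value of g(X) for the die."""
    return sum(g(x) * p for x in values)

a, b, c = 2.0, 3.0, 5.0
lhs = E(lambda x: a * x + b * x ** 2 + c)
rhs = a * E(lambda x: x) + b * E(lambda x: x ** 2) + c
print(lhs, rhs)                       # equal: linearity holds

# But E(f(X)) != f(E(X)) in general: E(X^2) = 91/6 whereas E(X)^2 = 12.25
print(E(lambda x: x ** 2), E(lambda x: x) ** 2)
```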
Variance and standard deviation
The variance (discussed in more detail in the measures of spread topic) is the second central moment of the density function, f(x), usually denoted by the symbol σ^2. It is a measure of the spread of the distribution around the mean, and is zero if there is no spread (all the data is concentrated at the mean value). Because the variance is a squared measure it is, by definition, never negative. For example, suppose the variance of a particular distribution with mean value 0 is 2: this indicates that the average squared difference of values from 0, whether those values lie above or below 0, is 2 units. The standard deviation (or root mean squared deviation) is simply the positive square root of the variance.
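A minimal sketch, using a small hypothetical dataset, shows the variance computed as the mean squared deviation and the standard deviation as its positive square root:

```python
import math

# Variance as the second central moment of a small hypothetical dataset,
# and the standard deviation as its positive square root.

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mu = sum(data) / len(data)                          # mean: 5.0
var = sum((x - mu) ** 2 for x in data) / len(data)  # population variance: 4.0
sd = math.sqrt(var)                                 # standard deviation: 2.0
print(mu, var, sd)
```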
Kurtosis indicates the degree to which the distribution is relatively flat (platykurtic) or more sharply peaked (leptokurtic). The Normal distribution has a kurtosis, κ, of 3, but is often cited as having a kurtosis of 0, simply by defining the measure with the 'excess' removed, i.e. with 3 subtracted from κ.
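For the hypothetical fair-die density used earlier, kurtosis can be computed directly from the second and fourth central moments; since the discrete uniform distribution is flatter than the Normal, its excess kurtosis comes out negative:

```python
# Kurtosis of a hypothetical fair die from its central moments. The discrete
# uniform is flatter than the Normal, so the 'excess' kurtosis is negative
# (platykurtic).

values = [1, 2, 3, 4, 5, 6]
p = 1 / 6

mu = sum(x * p for x in values)                     # 3.5
m2 = sum((x - mu) ** 2 * p for x in values)         # variance, 35/12
m4 = sum((x - mu) ** 4 * p for x in values)         # fourth central moment

kappa = m4 / m2 ** 2        # kurtosis; equals 3 for the Normal distribution
excess = kappa - 3          # convention with the Normal's 3 subtracted
print(kappa, excess)        # approx 1.73 and -1.27
```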
Moment generating functions

Because obtaining the moments of density functions is so important, it can be useful to find a single function that will generate all of the moments. Such a function is called a moment generating function, or mgf, for obvious reasons. The standard function used is E(e^{tX}), i.e. the expected value of e^{tX} (if this exists). Using the expected value formula from the previous subsection, we have:

M(t) = E(e^{tX}) = Σe^{tx}f(x)    (discrete case)
M(t) = E(e^{tX}) = ∫e^{tx}f(x)dx    (continuous case)
In these expressions f(x) is the density function of the random variable X, as before. Recalling that

e^{tx} = 1 + tx + (tx)^2/2! + (tx)^3/3! + ...

so that M(t) = 1 + tE(X) + t^2E(X^2)/2! + t^3E(X^3)/3! + ...,
the expected value, E(X^r), is then obtained either as the coefficient of t^r/r! in the series expansion of the mgf, or as the rth derivative of the mgf evaluated at t=0. An important feature of moment generating functions is that (broadly speaking) if two (discrete or continuous) random variables have the same mgf then their density functions are equal. This feature can be used to identify the density function of some variable that is a function of another, known, variable, and is the means by which the chi-square distribution is most readily derived as the sum of squared Normal variates.
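As a numerical sketch (using the hypothetical fair-die example and simple finite differences, not a method described in the text), the first two moments can be recovered from the mgf by differentiating at t=0:

```python
import math

# Sketch: recovering moments from the mgf M(t) = E(e^{tX}) by numerical
# differentiation at t = 0, for a hypothetical fair die. The r-th
# derivative of M at 0 equals E(X^r).

values = [1, 2, 3, 4, 5, 6]
p = 1 / 6

def mgf(t):
    """M(t) = E(e^{tX}) for the die."""
    return sum(math.exp(t * x) * p for x in values)

h = 1e-5
m1 = (mgf(h) - mgf(-h)) / (2 * h)               # M'(0),  approx E(X)   = 3.5
m2 = (mgf(h) - 2 * mgf(0) + mgf(-h)) / h ** 2   # M''(0), approx E(X^2) = 91/6
print(m1, m2, m2 - m1 ** 2)                     # variance approx 35/12
```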