The chi-square distribution (also called the chi-squared distribution) has a particularly important role in statistics for the following reason:

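If X1, X2, ..., Xν are ν independent and identically distributed (iid) Normal variates with mean μ and variance σ², and we define the standardized variates

$$z_i = \frac{X_i - \mu}{\sigma}, \quad i = 1, \ldots, \nu$$

so that each zi ~ N(0,1), then the sum of the squares of these standardized variates has a chi-square distribution with ν degrees of freedom:

$$\chi^2_\nu = \sum_{i=1}^{\nu} z_i^2 = \sum_{i=1}^{\nu} \left(\frac{X_i - \mu}{\sigma}\right)^2$$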
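This result can be checked by simulation. The sketch below uses only the Python standard library. It draws ν Normal variates, standardizes them and sums their squares. The values μ=10, σ=2, ν=5, the random seed and the number of replications are arbitrary illustrative choices. A chi-square variate with ν degrees of freedom has mean ν and variance 2ν, so the simulated values should land close to 5 and 10.

```python
import random
import statistics

def sum_of_squared_z(nu, mu, sigma, rng):
    """Draw nu iid N(mu, sigma^2) variates, standardize each, and return the sum of squares."""
    total = 0.0
    for _ in range(nu):
        x = rng.gauss(mu, sigma)
        z = (x - mu) / sigma  # z ~ N(0, 1)
        total += z * z
    return total

rng = random.Random(42)  # fixed seed for reproducibility
nu = 5
mu, sigma = 10.0, 2.0    # arbitrary illustrative values; the result does not depend on them
draws = [sum_of_squared_z(nu, mu, sigma, rng) for _ in range(100_000)]

# A chi-square variate with nu degrees of freedom has mean nu and variance 2 * nu
print(f"sample mean     {statistics.fmean(draws):.3f}  (theory {nu})")
print(f"sample variance {statistics.variance(draws):.3f}  (theory {2 * nu})")
```

The choice of μ and σ does not matter, because standardization removes both before squaring.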

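The distribution itself is a form of the Gamma distribution, with shape parameter α=ν/2, scale parameter β=2 and location parameter γ=0 (see Johnson and Kotz, 1970, ch.17 [JOH1]). This yields the standard form of the chi-square distribution, with probability density function:

$$f(x) = \frac{x^{\nu/2 - 1}\, e^{-x/2}}{2^{\nu/2}\, \Gamma(\nu/2)}, \quad x > 0$$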
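As a check on the Gamma form, the sketch below evaluates the chi-square density in two ways. The first is direct. The second uses a general three-parameter Gamma density with α=ν/2, β=2 and γ=0. It assumes the parameterization f(x) = (x-γ)^(α-1) e^(-(x-γ)/β) / (β^α Γ(α)), with α as shape, β as scale and γ as location. The names `gamma_pdf` and `chi_square_pdf` are hypothetical helpers written for this illustration. They are not part of any library.

```python
import math

def gamma_pdf(x, alpha, beta, gamma=0.0):
    """Three-parameter Gamma density (shape alpha, scale beta, location gamma), computed in logs."""
    if x <= gamma:
        return 0.0  # points at or below the location parameter are treated as outside the support
    y = x - gamma
    log_f = (alpha - 1) * math.log(y) - y / beta - alpha * math.log(beta) - math.lgamma(alpha)
    return math.exp(log_f)

def chi_square_pdf(x, nu):
    """Chi-square density with nu degrees of freedom, written out directly."""
    if x <= 0:
        return 0.0  # x = 0 is treated as outside the support in this sketch
    return x ** (nu / 2 - 1) * math.exp(-x / 2) / (2 ** (nu / 2) * math.gamma(nu / 2))

# The two forms should agree for any nu and any x > 0
for nu in (1, 2, 5, 10):
    for x in (0.5, 1.0, 3.0, 8.0):
        direct = chi_square_pdf(x, nu)
        via_gamma = gamma_pdf(x, alpha=nu / 2, beta=2.0, gamma=0.0)
        assert math.isclose(direct, via_gamma, rel_tol=1e-9)
print("chi-square density matches Gamma(alpha=nu/2, beta=2, gamma=0)")
```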

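which is described as a chi-square distribution with ν degrees of freedom. Its mean is ν and its variance is 2ν, and the distribution tends to the Normal as ν becomes large. The derivation is simplest using moment generating functions (MGFs). The MGF of a single squared standard Normal variate is $(1-2t)^{-1/2}$. Because the zi are independent, the MGF of their sum is the product of the ν individual MGFs:

$$M(t) = E\left[e^{t\chi^2_\nu}\right] = (1 - 2t)^{-\nu/2}, \quad t < \tfrac{1}{2}$$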

which is the same as the MGF of a Gamma distribution with parameters α=ν/2, β=2, namely $(1-\beta t)^{-\alpha}$. Since an MGF uniquely determines its distribution, the sum of squared standard Normal variates has this particular form of the Gamma distribution, which is called the chi-square distribution.
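The MGF result can be verified numerically by integrating $e^{tx}f(x)$ against the chi-square density. The sketch below makes several illustrative assumptions. It uses composite Simpson's rule on the truncated range [0, 200] with 20,000 intervals, and it takes ν=4, since ν ≥ 2 keeps the density finite at x=0. It compares the numerical value with the closed form $(1-2t)^{-\nu/2}$ and with the Gamma MGF $(1-\beta t)^{-\alpha}$ for α=ν/2, β=2. The helper names are hypothetical.

```python
import math

def chi_square_pdf(x, nu):
    """Chi-square density for x >= 0; with nu >= 2 it is finite at x = 0."""
    return x ** (nu / 2 - 1) * math.exp(-x / 2) / (2 ** (nu / 2) * math.gamma(nu / 2))

def mgf_numeric(t, nu, upper=200.0, n=20_000):
    """Approximate E[exp(tX)] = integral of exp(t x) f(x) dx with composite Simpson's rule."""
    h = upper / n  # n must be even for Simpson's rule
    total = 0.0
    for i in range(n + 1):
        x = i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * math.exp(t * x) * chi_square_pdf(x, nu)
    return total * h / 3

def mgf_closed_form(t, nu):
    """MGF of the chi-square distribution, valid for t < 1/2."""
    return (1 - 2 * t) ** (-nu / 2)

def gamma_mgf(t, alpha, beta):
    """MGF of a Gamma distribution with shape alpha and scale beta, valid for t < 1/beta."""
    return (1 - beta * t) ** (-alpha)

nu = 4
for t in (-0.5, 0.0, 0.1, 0.25):
    print(f"t={t:5.2f}  numeric {mgf_numeric(t, nu):.6f}  "
          f"chi-square {mgf_closed_form(t, nu):.6f}  Gamma {gamma_mgf(t, nu / 2, 2.0):.6f}")
```

The truncation at 200 is safe for these values of t. The integrand decays like $e^{-(1/2 - t)x}$, which is negligible long before x reaches 200.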

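A second very important feature of the chi-square distribution is that the ratio of two independent chi-square variates, each divided by its degrees of freedom, has an F-distribution. More specifically, if $\chi^2_{\nu_1}$ and $\chi^2_{\nu_2}$ are independent, then:

$$F_{\nu_1,\nu_2} = \frac{\chi^2_{\nu_1}/\nu_1}{\chi^2_{\nu_2}/\nu_2}$$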

Variance calculations take the form of sums of squared Normal variates. Consider the ratio of two independent sums of squares, each divided by its degrees of freedom, for example the variance within groups and the variance between groups in an analysis of variance. This ratio of mean squares will have an F-distribution, provided the samples are drawn from Normal distributions as noted above and the underlying population variances are equal. The percentage points of the F-distribution can therefore be used to test the significance of ratios of this type, i.e. to identify unusually large differences between variances.
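The variance-ratio argument can be illustrated by simulation. Assume two independent Normal samples of sizes n1=8 and n2=12 with a common σ=3. These values, the seed and the replication count are chosen only for illustration. Each sample variance s² satisfies (n-1)s²/σ² ~ χ² with n-1 degrees of freedom. The ratio s1²/s2² should therefore follow F with (n1-1, n2-1) degrees of freedom. The sketch compares the simulated mean of the ratio with the F mean ν2/(ν2-2), which is valid for ν2 > 2. The helper names are hypothetical.

```python
import random
import statistics

def sample_variance(xs):
    """Unbiased sample variance: sum of squared deviations divided by n - 1."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def variance_ratio(n1, n2, sigma, rng):
    """Ratio s1^2 / s2^2 for two independent Normal samples sharing the same sigma."""
    a = [rng.gauss(0.0, sigma) for _ in range(n1)]
    b = [rng.gauss(5.0, sigma) for _ in range(n2)]  # the means may differ; only variances matter
    # (n - 1) * s^2 / sigma^2 ~ chi-square(n - 1), so the ratio is (chi2_1 / nu1) / (chi2_2 / nu2)
    return sample_variance(a) / sample_variance(b)

rng = random.Random(1)  # fixed seed for reproducibility
n1, n2 = 8, 12
nu1, nu2 = n1 - 1, n2 - 1
ratios = [variance_ratio(n1, n2, sigma=3.0, rng=rng) for _ in range(20_000)]

# The F(nu1, nu2) distribution has mean nu2 / (nu2 - 2) when nu2 > 2
print(f"simulated mean {statistics.fmean(ratios):.4f}  F mean {nu2 / (nu2 - 2):.4f}")
```

The common σ cancels in the ratio, so the same distribution results whatever its value.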

The distribution is often used in goodness-of-fit tests and in contingency table analysis. In both cases a set of observed values, oi, is compared with a set of expected values, ei, by computing a statistic of the form:

$$\chi^2 = \sum_i \frac{(o_i - e_i)^2}{e_i}$$

This statistic is then compared with the percentage points of the chi-square distribution for the relevant degrees of freedom. Its distribution is only approximately chi-square, and the approximation is reasonable when the expected values are not too small. For example, for a table with r rows and c columns the degrees of freedom are df=(r-1)(c-1). This is because the row and column totals are taken as known and are used to compute the expected values under the assumption of independence. Sample plots of the distribution for selected values of the parameter (df or ν) are shown below:

Figure: Chi-square distribution
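The contingency-table calculation described above can be sketched as follows. The function `pearson_chi_square` is a hypothetical helper, and no library is assumed. The observed counts are made up for illustration, and no continuity correction is applied.

```python
def pearson_chi_square(table):
    """Pearson's X^2 statistic and degrees of freedom for an r x c table of observed counts."""
    r, c = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(c)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(r):
        for j in range(c):
            # Expected count under independence: (row total * column total) / grand total
            e = row_totals[i] * col_totals[j] / n
            o = table[i][j]
            stat += (o - e) ** 2 / e
    df = (r - 1) * (c - 1)  # row and column totals are fixed by the data
    return stat, df

observed = [
    [20, 30],
    [30, 20],
]
stat, df = pearson_chi_square(observed)
print(f"X^2 = {stat:.3f}, df = {df}")  # prints "X^2 = 4.000, df = 1"
```

For this 2×2 table every expected count is 25, giving X² = 4.000 with 1 degree of freedom. This exceeds the upper 5% point of the chi-square distribution with 1 df (about 3.841), so independence would be rejected at the 5% level.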

Key measures for the distribution are provided in the table below. Distribution tables are provided in the Resources topic, Distribution tables section:

References

[JOH1] Johnson N L, Kotz S (1970) Continuous Univariate Distributions, I. Houghton Mifflin/J Wiley & Sons, New York

MathWorld: Weisstein E W: Chi-squared distribution: http://mathworld.wolfram.com/Chi-SquaredDistribution.html

Wikipedia: Chi-square distribution: http://en.wikipedia.org/wiki/Chi-square_distribution

Wikipedia: Chi-square test: http://en.wikipedia.org/wiki/Pearson%27s_chi-square_test