Normal distribution


The Normal, or Gaussian, distribution is rightly regarded as the most important in the discipline of statistics. It is normal in the sense that it often provides an excellent model for the observed frequency distribution of many naturally occurring events, such as the heights or weights of individuals of the same species, gender and genetic grouping. An example is shown in the histogram below, of the son's height data from Pearson's dataset (1078 cases) discussed in our introductory section on Probability Theory (x-values are in inches, and show the mid value for each group). A Normal distribution with mean and variance matching the sample data is shown as an overlay on the chart.

Son's height data, from Pearson and Lee (1903 [PEA1])


The form of the Normal distribution is broadly that of a bell: a symmetric smooth form with a single mode that is also the location of the mean and median. On either side of the mode there is a point of inflection of the bell curve, one standard deviation from the mean (illustrated by the horizontal line in the graph above). Beyond this point the curve approaches the x-axis asymptotically, extending in theory to infinity in both directions.

This overall pattern of behavior is illustrated in the chart below. The green line, which corresponds to a Normal distribution with mean μ=0 and standard deviation σ=2, has its points of inflection at -2 and +2. By +/-6 (i.e. 3 standard deviations) the curve is very close to the x-axis, and a similar pattern can be seen for each of the other curves: by the time 3 standard deviations have been reached, almost all of the area under the Normal curve (almost 100%) has been accounted for. A shift (or re-location) of the mean to the right simply moves the entire curve to the right; if this shift is at least 3 standard deviations, the Normal can still provide a model for data that follow this so-called normal, bell-shaped frequency pattern but for which only positive values have meaning (as in the example cited above).
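These properties can be checked numerically; the sketch below (Python, standard library only; function names are illustrative) evaluates the Normal density and cumulative distribution for the green curve (μ=0, σ=2), confirming that the density at the points of inflection is exp(-1/2) times the peak density and that about 99.7% of the area lies within 3 standard deviations of the mean:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the Normal distribution N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative probability, computed via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# The green curve in the chart: mu = 0, sigma = 2.
# Density at a point of inflection (one standard deviation from the mean)
# relative to the peak density:
ratio = normal_pdf(2.0, 0.0, 2.0) / normal_pdf(0.0, 0.0, 2.0)

# Area under the curve within +/-3 standard deviations (here +/-6):
area = normal_cdf(6.0, 0.0, 2.0) - normal_cdf(-6.0, 0.0, 2.0)
print(round(ratio, 4), round(area, 4))  # 0.6065 0.9973
```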

Normal distribution curves


The most general formulation of the Normal distribution is due to R A Fisher, but the use of the name pre-dates him, to the time of Francis Galton in the mid 1870s. Fisher introduced the mean value, μ, into the formula for the distribution as a way of re-locating the center of the bell curve from its standard location of 0:

f(x) = (1/(σ√(2π))) exp(−(x − μ)²/(2σ²))

This looks like a rather daunting expression, but it can be greatly simplified: the unit Normal (U, standard Normal, or N(0,1)) distribution has mean value μ=0 and standard deviation σ=1. Taking these values in the expression above, or using the z-transform:

z = (x − μ)/σ

produces the unit Normal:

f(z) = (1/√(2π)) exp(−z²/2)

The cumulative distribution function is simply:

F(z) = (1/√(2π)) ∫ exp(−t²/2) dt, with the integral taken from −∞ to z

For many years, tables of the unit Normal provided the means of determining probability levels, and distribution tables are provided in the Resources topic of this Handbook, Distribution tables section. Now, however, the values are computed by simple function calls with explicit inclusion of the estimated mean and standard deviation from samples, or are reported automatically by software packages based on these parameter values. Commonly used 'critical' values for the unit Normal are shown below:

Table of Normal distribution cumulative probabilities (one tail)

Cumulative probability, F(z)	Deviation, z (standard deviations)
0.90	1.2816
0.95	1.6449
0.975	1.9600
0.99	2.3263
0.995	2.5758
0.999	3.0902

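The z values in such a table can be reproduced by inverting the cumulative distribution numerically; a minimal sketch in Python (standard library only, using simple bisection on Φ; the function names are illustrative):

```python
import math

def normal_cdf(z):
    """Unit Normal cumulative probability, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_quantile(p, lo=-10.0, hi=10.0):
    """Invert Phi by bisection: return z such that Phi(z) = p."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if normal_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Reproduce commonly tabulated critical values:
for p in (0.90, 0.95, 0.975, 0.99, 0.999):
    print(p, round(normal_quantile(p), 4))
```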
This table provides the cumulative probabilities in the left-hand column and the corresponding deviations from a mean of 0 in the right-hand column (where the standard deviation is 1). Hence 99.9% of all values are expected to be less than +3 standard deviations from the mean, and 97.5% are expected to be less than +1.96 standard deviations from the mean (see the right-hand plot of the Normal distribution, below). This is not the same as saying that 97.5% of the distribution lies within +/-1.96 standard deviations of the mean, because it includes all values from −∞ up to +1.96, not just those in the range +/-1.96 standard deviations. Because the distribution is symmetric, 2.5% of values are expected to be greater than +1.96 units and, likewise, 2.5% are expected to be less than −1.96 units, so the area in the range [-1.96,1.96] is 95%; this is illustrated in the two distribution plots below (see the R code samples for details of how these graphs were generated). The 97.5% level is described as a one-tailed value (as in the right-hand plot) because it includes all values up to +1.96, whereas the 95% value is two-tailed (as in the left-hand plot) because it takes account of both the upper and lower tails of the distribution. The two-tailed interpretation is the most widely used, i.e. within or outside a range either side of the mean, rather than simply less than or greater than a single value.
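The one-tailed and two-tailed figures can be verified directly; a short Python sketch (standard library) using the error function:

```python
import math

def normal_cdf(z):
    """Unit Normal cumulative probability, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

one_tailed = normal_cdf(1.96)                      # P(Z < 1.96)
two_tailed = normal_cdf(1.96) - normal_cdf(-1.96)  # P(-1.96 < Z < 1.96)
print(round(one_tailed, 3), round(two_tailed, 3))  # 0.975 0.95
```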

Normal distribution plots, two-tailed and one-tailed, 2.5% included in the tail


In addition to being a useful model for many examples of naturally occurring datasets, the Normal distribution can be obtained directly from a number of analytical approaches. For example, it can be shown that the distribution of a measurement, x, that is subject to a large number of independent, random, additive errors will tend to a Normal distribution (a result due to Gauss in 1816, drawing on his work on astronomical data). As discussed earlier, the Normal distribution may also be derived as an approximation to the Binomial distribution when p is not small (e.g. p≈1/2) and n is large; more specifically, it provides a good estimate when npq>25 (nb: for p≈1/2 this implies n>100). However, perhaps the most important result, originally obtained by Lyapunov in 1900, is that the distribution of the mean of n independent random samples drawn from any underlying distribution with finite mean and variance is also Normal. More specifically, let x1, x2, x3, ... xn be random variables drawn from some (unknown!) frequency distribution, f(x), with mean μ and variance σ², then the standardized variable:

z = (x̄ − μ)/(σ/√n), where x̄ is the mean of the n sample values,

is distributed as N(0,1) for large n. Convergence of this expression to the unit Normal is O(n^(-1/2)), with convergence fastest for distributions that are broadly symmetric. The expression in the denominator, σ/√n, is known as the standard error (SE) of the mean. This remarkable result, which we have discussed earlier in this Handbook, is known as the Central Limit Theorem; because it makes a statement about mean values drawn from any underlying distribution, it provides an explanation for the fact that some datasets are approximately Normal, simply because they represent averaged values. For example, many measurements of natural phenomena are based on aggregated materials, or aggregated results, so a degree of averaging occurs as part of the data collection and recording process. However, it is equally true that frequently such data is not Normally distributed, but skewed, with many small values (all greater than 0) and few large values. For these datasets it is often possible to apply a simple log transform to produce a more Normally distributed sample. This may be desirable in order to apply a statistical technique that directly uses the Normal distribution as its underlying model (as is the case in most regression models), or because the Normal distribution is an underlying assumption but another distribution, derived from the Normal, is used in the testing procedure. For example, the sum of squared unit Normal variables gives a Chi-square distribution, and the ratio of two Chi-square variables (each standardized by its degrees of freedom) gives an F distribution, which is widely used to test whether the variation in two datasets is similar or markedly different. Linear sums of Normal variates are themselves Normally distributed.
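The Central Limit Theorem can be illustrated by simulation; the sketch below (Python, standard library; the choice of an Exponential parent distribution and the sample sizes are illustrative) draws repeated samples from a strongly skewed distribution and checks that the standardized mean behaves approximately like N(0,1):

```python
import math
import random

random.seed(42)

def standardized_mean(sample, mu, sigma):
    """The CLT statistic z = (x_bar - mu) / (sigma / sqrt(n))."""
    n = len(sample)
    x_bar = sum(sample) / n
    return (x_bar - mu) / (sigma / math.sqrt(n))

# Parent distribution: Exponential with rate 1 (mean 1, variance 1),
# chosen because it is strongly skewed -- nothing like a bell curve.
mu, sigma, n, trials = 1.0, 1.0, 50, 20000
z_values = [
    standardized_mean([random.expovariate(1.0) for _ in range(n)], mu, sigma)
    for _ in range(trials)
]

# If z is approximately N(0,1), about 95% of values fall in [-1.96, 1.96].
inside = sum(1 for z in z_values if -1.96 <= z <= 1.96) / trials
print(round(inside, 3))
```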

Key measures for the Normal distribution are provided in the table below:

Measure	Value
Mean	μ
Median	μ
Mode	μ
Variance	σ²
Skewness	0
Kurtosis	0 (or 3, if the excess element is omitted)
As noted above, a table of the cumulative Normal is provided in the Resources topic, Distribution tables section.

There are many variants of the Normal distribution that may be useful in different situations. As with several other distributions, such as the Gamma, truncated versions (e.g. the Half-Normal) and compound versions have been widely studied. The von Mises distribution is a special case in which a 'Normal distribution' is applied to circular data. The Lognormal, the distribution of x for which z=log(x−θ) is Normally distributed, has also been widely studied. Of considerable importance are multivariate versions of the Normal. The simplest multivariate version is the Bivariate Normal, which is extensively used in spatial analysis. The Bivariate Normal looks very like the standard Normal, but its parameters comprise two mean values, two standard deviations, and a correlation coefficient which measures the relationship between the two variables. The form of the distribution can be plotted as a series of elliptical regions around the mean, where the major axis of each ellipse reflects the larger of the two standard deviations and the direction of this axis is determined by the correlation between the two variables. If both variances are equal and the correlation is zero, the Bivariate Normal can be envisaged as a 3D bell formed by rotating a Normal distribution about its mean. This arrangement is commonly used in Kernel density estimation.
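A minimal sketch of the Bivariate Normal density (Python, standard library; parameter names are illustrative), showing that with equal variances and zero correlation the surface is rotationally symmetric about the mean:

```python
import math

def bivariate_normal_pdf(x, y, mu_x=0.0, mu_y=0.0,
                         sigma_x=1.0, sigma_y=1.0, rho=0.0):
    """Density of the Bivariate Normal with correlation rho (|rho| < 1)."""
    zx = (x - mu_x) / sigma_x
    zy = (y - mu_y) / sigma_y
    q = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / (1.0 - rho * rho)
    norm = 2.0 * math.pi * sigma_x * sigma_y * math.sqrt(1.0 - rho * rho)
    return math.exp(-0.5 * q) / norm

# With equal variances and rho = 0, points at the same distance from the
# mean have the same density (circular contours):
a = bivariate_normal_pdf(1.0, 0.0)
b = bivariate_normal_pdf(0.0, 1.0)
print(round(a, 6), round(b, 6))  # 0.096532 0.096532
```

With unequal variances or a non-zero correlation the contours become ellipses, as described above.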


[JOH1] Johnson N L, Kotz S (1970) Continuous Univariate Distributions, I. Houghton Mifflin/J Wiley & Sons, New York

[PEA1] Pearson K, Lee A (1903) On the Laws of Inheritance in Man: I. Inheritance of Physical Characters. Biometrika, 2(3), 357-462

Mathworld: Weisstein E W: Normal distribution:

Wikipedia: Normal distribution: