Probability distributions

In this topic we provide details of a wide range of discrete and continuous probability distribution functions. In each case, where appropriate, we show: the distribution itself; the cumulative distribution (the sum or integral of the distribution function); the moments (where available); and graphs of the distribution based on a selection of parameter values. We also comment on their origins, features and applications. There are a number of excellent resources for this topic. These include: the series of books by Johnson and Kotz (1969, 1970, 1972 [JOH1]-[JOH4]), which, like many other major statistical works, have subsequently been updated and republished in extended versions; the Handbook of Mathematical Functions edited by Abramowitz and Stegun (1972 [ABR1]); and the comprehensive online resources covering statistical distributions at MathWorld and Wikipedia. An extensive set of distributions can be viewed using the interactive Java-based education library provided by UCLA's Statistics Online Computational Resource (SOCR). Tables for many of the main probability distributions are available in printed form from various sources, including the RSS (downloadable watermarked PDF) and within this Handbook in the Distribution tables section.

Univariate (and multivariate) probability distributions are essentially statistical models of datasets, i.e. they provide analytical models of the varying frequency distributions of random variables. In the univariate case they can be denoted by the functional representation, f(x), where x is a discrete-valued or continuous-valued real variable, f(x) is non-negative for all x, and the sum or integral of f(x) over the domain of x is unity. When referring to discrete probability distributions authors often use the notation P(X=x) or Pr(X=x), but we have largely kept to the form f(x) for all types of probability distribution in order to avoid using multiple notations. Typically these models have relatively few parameters, and standard procedures have been developed that enable these parameters to be estimated from sample data (methods such as the use of sample moments, maximum likelihood procedures and Bayesian methods).
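As a rough illustration of parameter estimation from sample data, the sketch below applies the method of moments to a Gamma distribution and maximum likelihood to an Exponential distribution. The choice of distributions, the true parameter values, the sample sizes and the function names are all assumptions made for this example; it uses only the Python standard library.

```python
import random
import statistics

def gamma_method_of_moments(sample):
    """Estimate Gamma(shape k, scale theta) parameters by matching moments.

    For a Gamma distribution mean = k * theta and variance = k * theta**2,
    so k = mean**2 / variance and theta = variance / mean.
    """
    m = statistics.fmean(sample)
    v = statistics.pvariance(sample, mu=m)  # second central sample moment
    return m * m / v, v / m

def exponential_mle(sample):
    """Maximum likelihood estimate of the rate of an Exponential distribution.

    The log-likelihood n*log(rate) - rate*sum(x) is maximised at 1 / mean.
    """
    return 1.0 / statistics.fmean(sample)

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Simulated data with known (assumed) parameters: k = 3, theta = 2.
gamma_sample = [rng.gammavariate(3.0, 2.0) for _ in range(20000)]
k_hat, theta_hat = gamma_method_of_moments(gamma_sample)

# Simulated data with known (assumed) rate 0.5.
exp_sample = [rng.expovariate(0.5) for _ in range(20000)]
rate_hat = exponential_mle(exp_sample)

print(f"Gamma moment estimates: k={k_hat:.2f}, theta={theta_hat:.2f}")
print(f"Exponential MLE rate:   {rate_hat:.3f}")
```

With samples of this size the estimates should lie close to the values used to generate the data, although they will not match exactly.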

If a given probability distribution provides a good fit to the sample data, we might reasonably conclude that the model can be used to describe the distribution of other samples from the same population, or (with different parameters) from other, very similar populations. Furthermore, if the assumptions underlying the formation of the model distribution can reasonably be regarded as applying to the sample data, and the sample can be considered as having been randomly selected from the population (i.e. is representative and not biased), then the model distribution may be used to draw inferences from the sample about the population as a whole. For example, we can compute the proportion of a model distribution (with parameters derived from one or more samples) that is to be found for values of x>A, say, and infer that this represents the proportion of x>A in the population from which the samples were drawn. Likewise, if we assume that a particular model distribution with pre-specified parameters applies to a particular population, then we can ask whether a particular sample is likely to have been drawn from that population or not. Conversely, if sample data do not satisfy key aspects of the model distribution, for example the data are a poor fit to the model and/or the assumptions on which the model is based are not met and/or the sample is not random, then it is not acceptable to draw such inferences. In that case either an alternative model is required, which might be of a different logical type (e.g. a simulated distribution), or an alternative and appropriate analytical model distribution might be sought. A final option might be to use the model distribution just as a descriptive model, and not to draw inferences from it regarding the population.
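The two kinds of inference described above can be sketched in a few lines. The example assumes, purely for illustration, that a Normal model is appropriate; the sample values, the threshold A and the pre-specified parameters MU0 and SIGMA0 are hypothetical.

```python
import math
from statistics import NormalDist

# Hypothetical sample; the values are illustrative only.
sample = [4.1, 5.3, 4.8, 6.0, 5.5, 4.4, 5.1, 5.9, 4.7, 5.2]

# Fit a Normal model using the sample mean and sample standard deviation.
model = NormalDist.from_samples(sample)

# Proportion of the fitted model lying above a threshold A, taken as an
# estimate of the proportion of x > A in the population.
A = 6.0
p_above = 1.0 - model.cdf(A)

# Reverse question: assume a model with pre-specified parameters and ask
# how surprising the observed sample mean would be (two-sided z test).
MU0, SIGMA0 = 5.0, 0.6  # hypothetical pre-specified parameters
n = len(sample)
z = (model.mean - MU0) / (SIGMA0 / math.sqrt(n))
p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))

print(f"fitted mean={model.mean:.2f}, sd={model.stdev:.3f}")
print(f"estimated P(x > {A}) = {p_above:.4f}")
print(f"z = {z:.3f}, two-sided p-value = {p_value:.3f}")
```

Both results are only as trustworthy as the assumptions listed in the paragraph above: a good fit, a valid model and a random sample.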

An important criticism of many probability models is that they are, in the main, thin-tailed, that is, they often assign very low probabilities to a large range of possible events that lie far from the mean. In many instances such extreme values are of great interest, particularly to those involved in risk analysis (e.g. insurance, financial trading, disaster management). Whilst this is an area under-represented in many statistics courses, it is nonetheless of considerable interest, and there are numerous books and even an academic journal, Extremes, devoted to the study of such problems. There are specialized distributions, such as the Weibull and the Gumbel, that are used to provide estimates for the maximum and minimum values of data drawn from a known distribution. Estimating the parameters of such distributions, and providing a sound basis for using them, is often challenging, but is also of great importance because the impact of rare events is often far greater than the impact of more common occurrences.
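To illustrate how a Gumbel distribution can describe the maximum of data drawn from a known distribution, the sketch below compares the exact distribution of the largest of n independent Exponential(1) values with a Gumbel approximation whose location is ln(n). The Exponential parent distribution, the value of n and the evaluation points are assumptions chosen for this example.

```python
import math

def exact_max_cdf(x, n):
    """P(max of n independent Exp(1) variates <= x) = (1 - exp(-x))**n."""
    return (1.0 - math.exp(-x)) ** n if x > 0 else 0.0

def gumbel_cdf(x, loc=0.0, scale=1.0):
    """Gumbel (maximum) cdf: exp(-exp(-(x - loc) / scale))."""
    return math.exp(-math.exp(-(x - loc) / scale))

n = 1000            # hypothetical number of values in each block
loc = math.log(n)   # location of the approximating Gumbel distribution

rows = []
for x in (5.0, 7.0, 9.0, 11.0):
    rows.append((x, exact_max_cdf(x, n), gumbel_cdf(x, loc=loc)))

for x, exact, approx in rows:
    print(f"x={x:5.1f}  exact={exact:.6f}  Gumbel={approx:.6f}")

# Largest absolute difference between the exact and approximate cdf values.
max_gap = max(abs(exact - approx) for _, exact, approx in rows)
print(f"largest gap: {max_gap:.2e}")
```

In practice the parent distribution is rarely known this precisely, which is one reason why fitting extreme value models to real data is difficult.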

There are a number of alternatives to analytical models, which are of particular use in many real-world applications. The two most widely used are Monte Carlo simulation, which creates synthetic probability distributions upon which inferential analysis may be based, and kernel density methods, which are of particular use for some one- and two-dimensional probability density estimation and smoothing problems. With the rise in computer processing power, storage and exploratory data analysis (EDA) techniques, such computationally intensive procedures have become an increasingly important part of the arsenal of tools available to the statistician.
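A one-dimensional kernel density estimate can be sketched as follows. The Gaussian kernel, the normal reference bandwidth rule (1.06 times the sample standard deviation times n to the power -1/5) and the simulated data are assumptions made for this example; other kernels and bandwidth rules are equally valid choices.

```python
import math
import random
import statistics

def gaussian_kde(sample, bandwidth):
    """Return a function that estimates the density at x with Gaussian kernels."""
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample
        )

    return density

rng = random.Random(1)  # fixed seed so the sketch is repeatable
sample = [rng.gauss(0.0, 1.0) for _ in range(500)]  # simulated data

# Normal reference rule for the bandwidth (an assumption of this sketch).
bandwidth = 1.06 * statistics.stdev(sample) * len(sample) ** (-0.2)
density = gaussian_kde(sample, bandwidth)

# The estimated density should integrate to approximately 1.
step = 0.05
grid = [-6.0 + i * step for i in range(241)]  # covers -6 to 6
total = sum(density(x) for x in grid) * step

print(f"bandwidth = {bandwidth:.3f}")
print(f"density at 0 = {density(0.0):.3f}, integral = {total:.4f}")
```

The bandwidth controls the amount of smoothing: smaller values follow the sample more closely, larger values give a smoother but flatter estimate.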

The terminology relating to probability distributions is somewhat confusing. We covered this briefly in our earlier discussion of probability theory, but for clarity we re-state the definitions here, drawing on the approach adopted by Johnson and Kotz (1969, p15).

Random variable: A (real) random variable X is a quantity taking real values such that the probability that X≤x, i.e. P(X≤x), exists for all real x.

Cumulative distribution function (cdf): The quantity F(x)=P(X≤x) is called the cumulative distribution function, or cdf, of X. It is a monotonically non-decreasing function with 0≤F(x)≤1. From this we also see that the probability that X lies in the range (a,b], where a≤b, is F(b)-F(a). The study of distribution functions is essentially the study of cumulative distribution functions. Distribution functions may be discrete, defined by a finite (or countably infinite) set of probabilities and having a cdf that consists of a series of steps rising from 0 to 1; or they may be continuous, defined over an infinite continuum of values and having a cdf that is smooth and can be expressed as a definite integral (see further, below). In the continuous case the probability of an individual value, x, is vanishingly small (formally zero), and it is more meaningful to talk about the probability that an event lies in some interval, however small, expressing this as an integral over the range in question.

Probability mass function (pmf): Let X be a real random variable which takes a finite set of n values {xi}, and let f be a function, f(xi) ≥ 0 for i=1,2,3,...,n, that represents the probability of observing the specific values {xi}. Then f is called the probability function (or the discrete probability density function, or the probability mass function) of the random variable, X.

Cumulative probability mass function: The cumulative distribution function, F, for a discrete distribution is simply the sum of the probabilities of all values up to and including x, where the x-values are ordered by size:

$$F(x)=P(X\le x)=\sum_{x_i\le x} f(x_i)$$

Note that the sum of f(x) over all x (i.e. the sum over the sample space, S) is 1, so F reaches 1 at the largest x-value.
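The relationship between a probability mass function and its cumulative form can be checked numerically. The sketch below uses a Binomial distribution with n=10 and p=0.3, which is simply a convenient example chosen here; the helper name prob_interval is hypothetical.

```python
from itertools import accumulate
from math import comb

n, p = 10, 0.3  # hypothetical Binomial parameters
xs = list(range(n + 1))

# Probability mass function: f(x) = C(n, x) * p**x * (1 - p)**(n - x).
pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in xs]

# Cumulative distribution: running sum of the pmf over the ordered x-values.
cdf = list(accumulate(pmf))

def prob_interval(a, b):
    """P(a < X <= b) = F(b) - F(a) for integers 0 <= a <= b <= n."""
    return cdf[b] - cdf[a]

for x in xs:
    print(f"x={x:2d}  f(x)={pmf[x]:.5f}  F(x)={cdf[x]:.5f}")

print(f"sum of f(x) over all x = {sum(pmf):.6f}")
print(f"P(2 < X <= 5) = {prob_interval(2, 5):.5f}")
```

The final value of the running sum equals the sum of f(x) over the whole sample space, which is 1 up to floating-point rounding.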

Probability density function (pdf): Let X be a continuous random variable, i.e. X may take on an infinite set of values over a finite or infinite range. For continuous random variables the discrete probability mass function is replaced with its continuous equivalent, the probability density function, f(x).

The probability that X lies in the range a≤X≤b is thus:

$$P(a\le X\le b)=\int_{a}^{b} f(x)\,dx$$

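Cumulative distribution function (cdf): The cumulative distribution function of a continuous distribution (sometimes loosely called the cumulative density function) is simply:

$$F(x)=\int_{-\infty}^{x} f(t)\,dt$$

where t is a dummy variable introduced for convenience, and

$$\int_{-\infty}^{\infty} f(t)\,dt=1$$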

If the cumulative distribution function is known and is a differentiable analytic expression, then the pdf can be expressed in analytical form as its derivative: f(x)=dF(x)/dx
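The link between the pdf and the cdf of a continuous distribution can be verified numerically. The sketch below uses an Exponential distribution with a hypothetical rate of 1.5, for which both functions have simple closed forms; the central difference and trapezoid rule are generic numerical methods chosen for this illustration.

```python
import math

RATE = 1.5  # hypothetical rate parameter of the Exponential distribution

def cdf(x):
    """F(x) = 1 - exp(-RATE * x) for x >= 0, and 0 otherwise."""
    return 1.0 - math.exp(-RATE * x) if x >= 0 else 0.0

def pdf(x):
    """f(x) = RATE * exp(-RATE * x) for x >= 0, and 0 otherwise."""
    return RATE * math.exp(-RATE * x) if x >= 0 else 0.0

def numerical_derivative(func, x, h=1e-5):
    """Central difference approximation to dF/dx."""
    return (func(x + h) - func(x - h)) / (2.0 * h)

def trapezoid(func, a, b, steps=10000):
    """Trapezoid rule approximation to the integral of func over [a, b]."""
    width = (b - a) / steps
    total = 0.5 * (func(a) + func(b))
    total += sum(func(a + i * width) for i in range(1, steps))
    return total * width

x0, a, b = 0.8, 0.5, 2.0
print(f"dF/dx at {x0}: {numerical_derivative(cdf, x0):.6f}  f({x0}): {pdf(x0):.6f}")
print(f"integral of f over [{a}, {b}]: {trapezoid(pdf, a, b):.6f}")
print(f"F({b}) - F({a}):               {cdf(b) - cdf(a):.6f}")
```

Differentiating the cdf recovers the pdf, and integrating the pdf over an interval recovers the difference in cdf values, as the definitions require.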

Multivariate distribution functions: The definitions above can be readily extended to two or more variables to form discrete and continuous multivariate distributions. The summations and integrations are simply extended over each of the n random variables, giving, for example, for the discrete case:

and for the continuous case:

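The simplest multivariate distribution is the bivariate, i.e. with two variables, x and y say. The cumulative distribution in the bivariate case (written here for continuous variables) is:

$$F(x,y)=P(X\le x,\,Y\le y)=\int_{-\infty}^{x}\int_{-\infty}^{y} f(s,t)\,dt\,ds$$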

If these variables are independent, their joint distribution is simply the product of the two separate (marginal) distributions: f(x,y)=f(x)f(y). More generally, however, the two variables are not independent and we have expressions of the form: f(x,y)=f(x)f(y|x)=f(x|y)f(y), i.e. the joint distribution is the product of the conditional distribution of y given x and the marginal distribution of x, or vice versa. Again, these relationships can be directly extended to more than two variables.

In the case of a discrete distribution, we have:

The example of fathers' and sons' stature given earlier in this Handbook illustrates the notion of a bivariate joint distribution. The row and column totals (the relative frequencies summed over y or x respectively) provide the marginal distributions. The dark grey column highlighted in this example gives the conditional distribution of y given x=7.
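Marginal and conditional distributions can be computed directly from a table of joint probabilities. The small table below is hypothetical and is not the fathers and sons data; the function name conditional_y_given_x is likewise chosen only for this sketch.

```python
from collections import defaultdict

# Hypothetical joint probabilities f(x, y) for x in {0, 1} and y in {0, 1, 2}.
joint = {
    (0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.10,
    (1, 0): 0.30, (1, 1): 0.10, (1, 2): 0.20,
}

# Marginal distributions: sum the joint probabilities over the other variable.
marginal_x = defaultdict(float)
marginal_y = defaultdict(float)
for (x, y), p in joint.items():
    marginal_x[x] += p
    marginal_y[y] += p

def conditional_y_given_x(x):
    """f(y|x) = f(x, y) / f(x) for every y."""
    return {y: p / marginal_x[x] for (xi, y), p in joint.items() if xi == x}

# The variables are independent only if f(x, y) = f(x) f(y) in every cell.
independent = all(
    abs(p - marginal_x[x] * marginal_y[y]) < 1e-12
    for (x, y), p in joint.items()
)

print("f(x):", dict(marginal_x))
print("f(y):", dict(marginal_y))
print("f(y|x=0):", conditional_y_given_x(0))
print("independent:", independent)
```

In this table the joint probabilities do not factorise into the product of the marginals, so x and y are not independent, but f(x,y)=f(x)f(y|x) still holds in every cell by construction.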

Sampling distribution: If a sample of size n is taken from any population and a statistic computed, such as the sample mean or sample variance, this value will vary from sample to sample (unless, of course, the population values are all identical). The computed statistic will, therefore, have its own distribution, which is known as its sampling distribution. In a small number of cases, such as samples from a Normal distribution, the sampling distribution of selected statistics is known, but in general such distributions have to be estimated using Monte Carlo simulations, with one of the most widely applied approaches being re-sampling, typically using a bootstrap procedure. In the case of a Normal distribution with mean μ and variance σ², the means of samples of size n are distributed as N(μ, σ/√n), where the second parameter is the standard deviation (i.e. the variance of the sample mean is σ²/n). An equally important sampling distribution arises when the squares of n independent values drawn from a standard Normal distribution are summed - the resulting sampling distribution is a chi-squared distribution with n degrees of freedom. The F distribution and Student's t distribution are further examples of sampling distributions.
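The sampling distribution of the mean can be explored by simulation. The sketch below first draws many samples from a Normal distribution and compares the spread of their means with σ/√n, then applies a simple bootstrap to a single sample. The parameter values, the number of replicates and the seed are assumptions made for this example.

```python
import math
import random
import statistics

rng = random.Random(7)          # fixed seed so the sketch is repeatable
MU, SIGMA, N = 10.0, 2.0, 25    # hypothetical population parameters and sample size

# Monte Carlo: draw many samples of size N and record each sample mean.
means = [
    statistics.fmean([rng.gauss(MU, SIGMA) for _ in range(N)])
    for _ in range(5000)
]
sd_of_means = statistics.stdev(means)
theory = SIGMA / math.sqrt(N)   # standard deviation of the sample mean

# Bootstrap: resample one observed sample with replacement and recompute the mean.
observed = [rng.gauss(MU, SIGMA) for _ in range(N)]
boot_means = [
    statistics.fmean(rng.choices(observed, k=N))
    for _ in range(2000)
]
boot_se = statistics.stdev(boot_means)

print(f"simulated sd of sample means: {sd_of_means:.3f} (theory {theory:.3f})")
print(f"bootstrap standard error from one sample: {boot_se:.3f}")
```

The bootstrap estimate depends on the single observed sample, so it will generally differ somewhat from the theoretical value, whereas the Monte Carlo estimate uses knowledge of the population that is not normally available.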


[ABR1] Abramowitz M, Stegun I A, eds. (1972) Handbook of Mathematical Functions With Formulas, Graphs, and Mathematical Tables. 10th printing, US National Bureau of Standards, Applied Mathematics Series - 55

[JOH1] Johnson N L, Kotz S (1969) Discrete distributions. Houghton Mifflin/J Wiley & Sons, New York

[JOH2] Johnson N L, Kotz S (1970) Continuous Univariate Distributions, I. Houghton Mifflin/J Wiley & Sons, New York

[JOH3] Johnson N L, Kotz S (1970) Continuous Univariate Distributions, II. Houghton Mifflin/J Wiley & Sons, New York

[JOH4] Johnson N L, Kotz S (1972) Continuous Multivariate Distributions. Houghton Mifflin/J Wiley & Sons, New York

[PEA1] Pearson E S, Hartley H O, eds. (1954) Biometrika Tables for Statisticians. 4th edition. Vol. 1, Cambridge University Press, Cambridge, UK