The Poisson distribution is one of the most important and widely used statistical distributions. It is commonly used to describe the pattern of random point-like events in 1, 2 and 3 dimensions or, more typically, to provide the model for randomness against which an observed event pattern in time or space may be compared. If events occur randomly and independently, at a constant rate (in time) or with a constant density (in space), then the count of these events per unit time or per unit area will conform to a Poisson distribution and the pattern of occurrence is described as a Poisson process. The distribution of the length of intervals between events (or waiting times) in a one-dimensional (1D) Poisson process is an Exponential distribution (see the link provided for a derivation of this relationship).
The Poisson probability mass function (pmf) is given by:
$$P(x)=\frac{m^{x}e^{-m}}{x!},\quad x=0,1,2,\ldots$$
where the parameter, λ (the symbol typically used when rates are involved), or m, is the distribution mean. The cumulative distribution is simply the sum of the probability mass function:
$$F(x)=\sum_{k=0}^{x}\frac{m^{k}e^{-m}}{k!}$$
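As a minimal sketch, the pmf and the cumulative distribution can be evaluated directly from their definitions. The function names `poisson_pmf` and `poisson_cdf` are illustrative choices, not part of the source, and the mean m=1.25 in the demonstration is borrowed from the production line example later in this section.

```python
import math

def poisson_pmf(x: int, m: float) -> float:
    """P(X = x) = m^x e^(-m) / x! for a Poisson distribution with mean m.

    Direct evaluation; adequate for modest x (the factorial becomes too
    large to convert to a float for x above roughly 170).
    """
    if x < 0:
        return 0.0
    return math.exp(-m) * m**x / math.factorial(x)

def poisson_cdf(x: int, m: float) -> float:
    """P(X <= x): the sum of the pmf over 0, 1, ..., x."""
    return sum(poisson_pmf(k, m) for k in range(x + 1))

if __name__ == "__main__":
    m = 1.25  # illustrative mean
    for x in range(6):
        print(x, round(poisson_pmf(x, m), 4), round(poisson_cdf(x, m), 4))
```

For large counts a log-space evaluation (for example via `math.lgamma`) would be a more robust choice than the direct formula used here.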
In practical applications the Poisson should only be used where the number of events observed is reasonably large (typically >25, and preferably >100) and the probability of an individual event occurring at any particular time or place is small (typically <0.10). Events are assumed to occur entirely independently and do not occur simultaneously or at the same location. In many applications of the Poisson the mean, λ, is not large, but there is no requirement for λ to be small.
The charts below illustrate the Poisson distribution for mean values m=1, 2, 5 and 10. Note that the spread increases as the mean value increases (because the mean=variance). Variance-stabilizing transforms, such as taking the square root of x, or the Freeman-Tukey transform, seek to avoid this effect.
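The Poisson can be obtained as an approximation to the Binomial distribution where the number of events, n, is large and the probability of an individual event, p, is small, with the mean being constant: m=np. This approximation is achieved as n tends to infinity and p tends to 0, but m remains fixed. If n>100 and np<10 the Poisson provides a very good approximation to the Binomial. The key result is the observation that since the mean m=np, we can rearrange this to give p=m/n, and since q=(1−p), terms in $q^{n}$ can be written as:
$$q^{n}=\left(1-\frac{m}{n}\right)^{n}$$
Now from calculus we have the result that links the Poisson and the Binomial:
$$\lim_{n\to\infty}\left(1-\frac{m}{n}\right)^{n}=e^{-m}$$
We start with the Binomial distribution:
$$P(x)=\binom{n}{x}p^{x}q^{n-x}=\frac{n(n-1)\cdots(n-x+1)}{x!}\left(\frac{m}{n}\right)^{x}\left(1-\frac{m}{n}\right)^{n-x}$$
For fixed x, as n tends to infinity the ratio $n(n-1)\cdots(n-x+1)/n^{x}$ tends to 1 and $(1-m/n)^{-x}$ tends to 1, so that:
$$P(x)\to\frac{m^{x}e^{-m}}{x!}$$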
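The limiting argument can be checked numerically. The sketch below holds the mean m fixed, sets p=m/n, and reports the largest absolute difference between the Binomial and Poisson probabilities as n grows. The function names and the choice m=2 are illustrative assumptions, not taken from the source.

```python
import math

def binomial_pmf(x: int, n: int, p: float) -> float:
    """Binomial probability of x events in n trials with event probability p."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x: int, m: float) -> float:
    """Poisson probability of x events when the mean is m."""
    return math.exp(-m) * m**x / math.factorial(x)

def max_abs_diff(n: int, m: float, x_max: int = 20) -> float:
    """Largest |Binomial - Poisson| over x = 0..min(n, x_max), with p = m/n."""
    p = m / n
    return max(
        abs(binomial_pmf(x, n, p) - poisson_pmf(x, m))
        for x in range(min(n, x_max) + 1)
    )

if __name__ == "__main__":
    m = 2.0  # illustrative fixed mean
    for n in (10, 100, 1000, 10000):
        print(n, max_abs_diff(n, m))
```

The discrepancy shrinks as n increases with m held constant, which is the behaviour the derivation describes.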
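In temporal models the mean is defined in terms of the intensity of events, λ, per unit time, so that over a period of length t the mean is m=λt. The probability that x events occur in a time interval [t,t+τ], of length τ, is then simply:
$$P(x)=\frac{(\lambda\tau)^{x}e^{-\lambda\tau}}{x!}$$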
For large m the (standardized) Poisson can itself be approximated by the Normal distribution (despite the Normal being a continuous distribution), as is apparent from the chart for m=10 illustrated above. The widely used variance-stabilizing Freeman-Tukey transformation:
$$y=\sqrt{x}+\sqrt{x+1}$$
has a variance close to 1 whatever the value of m, so the stabilization itself does not require use of the estimated parameter, m. Centred on its approximate mean, $\sqrt{4m+1}$, it provides a good approximation to the unit Normal, N(0,1).
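The variance-stabilizing effect can be illustrated by computing the mean and variance of the transformed variable exactly (up to a truncated sum). This sketch assumes the commonly quoted form of the Freeman-Tukey transform, √x + √(x+1); the function names and the truncation point `x_max` are illustrative choices.

```python
import math

def poisson_pmf(x: int, m: float) -> float:
    """Poisson pmf evaluated in log space so that large x does not overflow."""
    return math.exp(-m + x * math.log(m) - math.lgamma(x + 1))

def freeman_tukey(x: int) -> float:
    """Commonly quoted Freeman-Tukey transform for a count x (assumed form)."""
    return math.sqrt(x) + math.sqrt(x + 1)

def transformed_moments(m: float, x_max: int = 150) -> tuple[float, float]:
    """Mean and variance of the transformed count under a Poisson with mean m.

    The sum is truncated at x_max, which is ample for the moderate means
    used here but is an assumption of this sketch.
    """
    xs = range(x_max + 1)
    mean = sum(poisson_pmf(x, m) * freeman_tukey(x) for x in xs)
    var = sum(poisson_pmf(x, m) * (freeman_tukey(x) - mean) ** 2 for x in xs)
    return mean, var

if __name__ == "__main__":
    for m in (2.0, 5.0, 10.0, 20.0):
        mean, var = transformed_moments(m)
        # The raw variance equals m; the transformed variance stays near 1.
        print(m, round(mean, 3), round(math.sqrt(4 * m + 1), 3), round(var, 3))
```

The raw counts have variance m, which grows with the mean, whereas the transformed values keep a variance close to 1 across the means shown.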
If two independent variables, x and y, each have a Poisson distribution, then (x+y) has a Poisson distribution (the additive property). If their mean values are $m_x$ and $m_y$ respectively then the mean of the sum is simply the sum of the mean values, $m_x+m_y$. Likewise, if (x+y) has a Poisson distribution and x and y are independent random variables, then x and y both have a Poisson distribution. By extension, these results apply to sums of any number of independent Poisson variates. However, the difference between two independent Poisson variates is not a Poisson, but a Skellam distribution.
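The additive property can be verified numerically: for independent variates, the distribution of the sum is the convolution of the two pmfs, and this should match a Poisson pmf whose mean is the sum of the two means. The function names and the example means (1.5 and 2.5) are illustrative assumptions.

```python
import math

def poisson_pmf(x: int, m: float) -> float:
    """Poisson probability of x events when the mean is m."""
    return math.exp(-m) * m**x / math.factorial(x)

def sum_pmf(z: int, m_x: float, m_y: float) -> float:
    """P(X + Y = z) for independent Poisson X and Y, by direct convolution."""
    return sum(poisson_pmf(k, m_x) * poisson_pmf(z - k, m_y) for k in range(z + 1))

if __name__ == "__main__":
    m_x, m_y = 1.5, 2.5  # illustrative means
    for z in range(8):
        # The two columns should agree to rounding error.
        print(z, sum_pmf(z, m_x, m_y), poisson_pmf(z, m_x + m_y))
```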
The standard Poisson distribution is defined for integer values of x, including x=0. However, there are many situations in which the class x=0 is not included in the sample data. Examples include cases where measuring devices only become active when one or more events occur or exceed some threshold value, and cases where all zero values have to be excluded from a dataset because they represent no measurement or are of no interest and would otherwise bias analysis of the data. The zero-truncated Poisson distribution, or Positive Poisson distribution, has a probability mass function given by:
$$P(x)=\frac{m^{x}e^{-m}}{x!\,(1-e^{-m})},\quad x=1,2,3,\ldots$$
which can be seen to be the same as the non-truncated Poisson with an adjustment factor of $1/(1-e^{-m})$ to ensure that the missing class x=0 is allowed for, such that the sum of all terms adds up to 1. If m is large the adjustment factor has little or no effect, but if m is small (<4) the adjustment is much greater. Key measures for the Poisson and the Positive Poisson are shown below.
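Key measures for the Poisson distribution:

| Measure | Value | Notes |
|---|---|---|
| Mean | $m$ | |
| Variance | $m$ | because variance=mean, the spread increases as the mean value increases, although the variance mean ratio (VMR) remains constant |
| Skewness | $1/\sqrt{m}$ | |
| Kurtosis | $3+1/m$ | or $1/m$ if 3 is deducted first |
| MGF | $\exp\left(m(e^{t}-1)\right)$ | |
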
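Key measures for the Positive Poisson:

| Measure | Value | Notes |
|---|---|---|
| Mean | $mA$ | where $A=1/(1-e^{-m})$, i.e. the mean is $m/(1-e^{-m})$ |
| Variance | $Am(1-Ame^{-m})$ | where $A=1/(1-e^{-m})$. Note the formula in Johnson & Kotz (1969, [JOH1]) is incorrect |
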
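The closed-form measures for the Positive Poisson can be cross-checked against moments computed directly from the zero-truncated pmf. The sketch below takes the mean as mA and the variance as Am(1−Ame^(−m)), with A=1/(1−e^(−m)); the function names and the truncation point `x_max` are illustrative assumptions.

```python
import math

def poisson_pmf(x: int, m: float) -> float:
    """Poisson pmf evaluated in log space so that large x does not overflow."""
    return math.exp(-m + x * math.log(m) - math.lgamma(x + 1))

def positive_poisson_pmf(x: int, m: float) -> float:
    """Zero-truncated Poisson: the ordinary pmf rescaled by 1/(1 - e^-m), x >= 1."""
    if x < 1:
        return 0.0
    return poisson_pmf(x, m) / (1.0 - math.exp(-m))

def numeric_moments(m: float, x_max: int = 150) -> tuple[float, float]:
    """Mean and variance obtained by summing over x = 1..x_max (truncated sum)."""
    xs = range(1, x_max + 1)
    mean = sum(x * positive_poisson_pmf(x, m) for x in xs)
    var = sum((x - mean) ** 2 * positive_poisson_pmf(x, m) for x in xs)
    return mean, var

def closed_form_moments(m: float) -> tuple[float, float]:
    """Mean m*A and variance A*m*(1 - A*m*e^-m), with A = 1/(1 - e^-m)."""
    a = 1.0 / (1.0 - math.exp(-m))
    return m * a, a * m * (1.0 - a * m * math.exp(-m))

if __name__ == "__main__":
    for m in (0.5, 1.25, 4.0, 10.0):
        print(m, numeric_moments(m), closed_form_moments(m))
```

For large m the factor A approaches 1 and both measures approach m, consistent with the adjustment having little effect in that case.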
Many statistical distributions are related to the Poisson, or involve extensions to the Poisson process. For example, the basic 2-dimensional Poisson Cluster Process (PCP) is similar to a simple 2D Poisson process in that it starts with a random point set. These points are regarded as “parents”. Each parent then produces a random number of offspring, and these are located at random around the parent according to a bivariate distribution function, g(x,y), for example a circular Normal with variance (spread), σ². The parents are then removed from the set and the PCP consists of the set of all offspring. The PCP is thus more clustered than a pure Poisson process and is a member of a broad class of clustered or contagious distributions.
Applications and examples
1-Dimension:
(a) A production line produces standardized computer processor boards every day. Each board is tested to check it works, and the number of boards that do not work is recorded. A table is produced showing the number of days on which 0, 1, 2, 3 etc. boards fail. Over a long period of time the average number of defective boards per day is found to be 1.25. The observed distribution of defective boards can be compared with a Poisson distribution with mean m=1.25 to see if the frequency pattern matches that of the Poisson, which would suggest the failures were occurring at random. Observed frequencies should be divided by the total number of days for which observations were taken to produce a probability or relative frequency distribution prior to comparison using standard goodness of fit tests, such as chi-square or Kolmogorov-Smirnov methods.
(b) In many problems relating to queuing it is assumed, at least initially, that events arrive at random over time. For example, we might assume that telephone calls to call centers are independent random events arriving at a constant rate, λ, over a defined period of time (see further, Erlang distribution). In order to compute the probability that lines to the call center will be busy we need to know how long calls are on average and what the call arrival rate, λ, is (calls per minute or per hour). Finally we need to know how many lines are available, but this may be the result of the computation rather than an input variable, i.e. we can increment the number of lines (or queue positions in a supermarket checkout) until the congestion probability is reduced to an acceptable level. The arrival rate multiplied by the average call duration gives us the level of traffic, measured as call hours per hour or call minutes per minute, that we need to service. Assuming the arrival pattern is a Poisson process and the call traffic, A, is known, then the probability that N lines are busy (usually referred to as the Grade of Service, or GoS) can be shown to be:
$$P_{N}=\frac{A^{N}/N!}{\sum_{k=0}^{N}A^{k}/k!}$$
Hence if there is a single serving line and one unit of traffic (e.g. 1 hour of call traffic per hour) the probability that a single line will be congested (or a single server queue will be occupied) is 50%, whereas with two lines it falls to 20% and to less than 2% with 4 lines. This formula is due to the Danish engineer and mathematician, A K Erlang, and has been shown to be independent of the exact distribution of individual call durations. This formula was, for many years, the key result used in the design and planning of analog telecommunications networks and facilities. However, it is important to note the number of assumptions made. Consider, for example, the issues that arise if the expected probability of congestion reaches a level for which callers fail to get through and start redialling (thus individual call events are not independent) or where telephone competitions linked to live TV promotions take place (creating sudden peaks in call traffic so the arrival rate is not constant). Dealing with such questions is the subject addressed by queuing theory and teletraffic engineering.
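The congestion figures quoted above (50%, 20%, and under 2% for 1, 2 and 4 lines with one unit of traffic) can be reproduced with a short sketch of the Erlang loss calculation, assuming the standard Erlang B form of the formula. The function names `erlang_b` and `lines_needed`, and the 2% target used in the demonstration, are illustrative assumptions rather than part of the source.

```python
import math

def erlang_b(n_lines: int, traffic: float) -> float:
    """Probability that all n_lines are busy for offered traffic in erlangs.

    Standard Erlang B form: (A^N / N!) / sum_{k=0..N} A^k / k!.
    """
    numerator = traffic**n_lines / math.factorial(n_lines)
    denominator = sum(traffic**k / math.factorial(k) for k in range(n_lines + 1))
    return numerator / denominator

def lines_needed(traffic: float, target: float) -> int:
    """Smallest number of lines whose congestion probability is <= target.

    The target must be greater than 0, otherwise the loop never terminates.
    """
    n = 1
    while erlang_b(n, traffic) > target:
        n += 1
    return n

if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(n, round(erlang_b(n, 1.0), 4))
    print("lines for <= 2% congestion:", lines_needed(1.0, 0.02))
```

Incrementing the number of lines until the target is met mirrors the approach described in the text, where the line count is an output of the computation rather than an input.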
2-Dimensions: The locations of a substantial number of trees in a study area are recorded and plotted on a rectangular map of area A. A total of N trees are recorded. The average density of trees over the study area is thus m=N/A. The map is then divided into G grid squares, each of area A/G. The number of trees in each square is counted and the frequency distribution recorded. The observed frequency distribution of counts (number of grid squares containing 0, 1, 2, 3... trees) is then compared to that expected for a Poisson with mean=mA/G=N/G, the average number of trees per grid square, using a suitable goodness of fit test.
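The grid-count procedure can be sketched as follows: points are allocated to grid cells, the cell counts are tabulated, and a chi-square statistic compares the observed frequencies with those expected for a Poisson with mean N/G. Everything specific here is an assumption made for illustration: the point coordinates are simulated rather than real tree locations, and the function names, map size, grid size and pooling threshold are hypothetical.

```python
import math
import random
from collections import Counter

def quadrat_counts(points, width, height, nx, ny):
    """Count points in each cell of an nx-by-ny grid over [0,width] x [0,height]."""
    counts = [0] * (nx * ny)
    for x, y in points:
        col = min(int(x / width * nx), nx - 1)   # clamp points on the far edge
        row = min(int(y / height * ny), ny - 1)
        counts[row * nx + col] += 1
    return counts

def poisson_pmf(x, mean):
    """Poisson pmf evaluated in log space."""
    return math.exp(-mean + x * math.log(mean) - math.lgamma(x + 1))

def chi_square_vs_poisson(counts, max_class):
    """Chi-square statistic for cell counts against a Poisson with mean N/G.

    Counts of max_class or more are pooled into a single upper class.
    Simplification: classes with small expected counts are not pooled further.
    """
    g = len(counts)
    mean = sum(counts) / g                       # N / G, the mean per cell
    observed = Counter(min(c, max_class) for c in counts)
    probs = [poisson_pmf(x, mean) for x in range(max_class)]
    probs.append(1.0 - sum(probs))               # pooled upper tail
    stat = 0.0
    for x, p in enumerate(probs):
        expected = g * p
        stat += (observed.get(x, 0) - expected) ** 2 / expected
    return stat

if __name__ == "__main__":
    rng = random.Random(42)                      # simulated, hypothetical "trees"
    points = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(200)]
    counts = quadrat_counts(points, 100.0, 100.0, 10, 10)
    print("frequency of counts:", sorted(Counter(counts).items()))
    # Compare with chi-square on (classes - 2) degrees of freedom, since the
    # mean is estimated from the data.
    print("chi-square statistic:", chi_square_vs_poisson(counts, 5))
```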
Notes on usage and parameter estimation
(i) In the examples cited above, the (arbitrary) division of the timeline into days, and the study area into G grid cells may affect the results obtained in a number of ways. If the division of time or space is into very small sections, the only recordable values will be 0 and 1; for example, the presence or absence of a single tree. If the division is into very large sections, e.g. a single large unit, the only recordable value will be the sum of the observed events, N. In both instances the range of observed values for x will not be usable for comparison with the Poisson. A second issue is the placement of the sections: the start time for temporal data or the exact placement and orientation of the grid for spatial data. If the underlying pattern is truly random then placement will have no effect on the results, although this assumes that edge effects are negligible.
(ii) A further issue relates to the loss of information when the division of time or space into regular sections takes place. The set of lengths of intervals between events (in time or space) is a richer source of information regarding event distribution than any grouping of events into counts. This has led to a number of techniques that examine the distribution of observed lengths (time intervals or distances) and then compare these to those expected with a Poisson process.
(iii) The proportion, $p_0$, of empty sections (time slots or empty grid cells, i.e. class x=0) in a Poisson distribution is simply $e^{-m}$, hence m can be estimated from the expression: $m=-\ln p_0$, but this estimate is biased. However, since one will not generally know in advance whether a distribution is Poisson or not, it is recommended that the observed proportion of sections for which x=0 is used as the best estimator for the zero class.
(iv) Since the mean equals the variance, the variance/mean ratio (VMR) has an expected value of 1. An observed unimodal discrete distribution with VMR≅1 is an initial indicator that the distribution might be Poisson-like, and the underlying generation of events might be a Poisson process. If the VMR>>1 a clustered model might be considered as an alternative.
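Notes (iii) and (iv) can be illustrated together. The sketch below computes the variance/mean ratio of a set of counts and the (biased) estimate of m obtained from the proportion of empty sections. The data in `daily_counts` are hypothetical values invented for this illustration, loosely modelled on the daily defective-board example, and the function names are assumptions.

```python
import math

def vmr(counts):
    """Variance/mean ratio, using the sample variance (divisor n - 1)."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return variance / mean

def mean_from_zero_class(counts):
    """Estimate m as -ln(p0), where p0 is the observed proportion of zeros.

    Requires at least one zero count; as noted in the text, the estimate is
    biased.
    """
    p0 = counts.count(0) / len(counts)
    return -math.log(p0)

# Hypothetical data: counts per day over 200 days (not from the source).
daily_counts = [0] * 58 + [1] * 70 + [2] * 46 + [3] * 18 + [4] * 8

if __name__ == "__main__":
    print("sample mean:", sum(daily_counts) / len(daily_counts))
    print("VMR:", round(vmr(daily_counts), 3))
    print("m from zero class:", round(mean_from_zero_class(daily_counts), 3))
```

A VMR close to 1, together with a zero-class estimate close to the sample mean, would be consistent with (though not proof of) a Poisson-like pattern.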
References
[JOH1] Johnson N L, Kotz S (1969) Discrete distributions. Houghton Mifflin/J Wiley & Sons, New York
Mathworld/Weisstein E W: Poisson distribution: http://mathworld.wolfram.com/PoissonDistribution.html
Wikipedia: Poisson distribution: http://en.wikipedia.org/wiki/Poisson_distribution
see also: Poisson regression