The preceding sections have shown how statistics developed over the last 150 years as a distinct discipline in direct response to practical real-world problems. In this topic we introduce the concept of probability in a rather more formal manner, initially describing the classical concept of probability, and then moving on to a discussion of frequentist and Bayesian statistics.

At first probabilities were seen as matters of chance that could be addressed by listing all the possible outcomes of a well-defined problem, and then examining the number of ways each outcome could occur. For example, when throwing a single 6-sided die, there are 6 mutually exclusive and equally likely outcomes, one for each of the attributes 1, 2, 3, 4, 5 and 6. So the probability of a particular outcome, A, in this example is P(A)=1/6. However, if we throw two dice, the possible outcomes are a bit more complicated. For example, we could throw a 1 first and then any of the numbers 1 to 6, or a 2 first and then any of the numbers 1 to 6, and so on. Thus there are 6x6=36 possible ordered combinations, each of which will occur with probability 1/36. Some of these combinations are essentially the same if the order of the dice is ignored - there will be two instances of a 1 and a 2, two of a 1 and a 3, etc., but only one instance of 1 and 1 and one of 6 and 6. With more dice or more throws of a single die, the pattern becomes more complex, and as we have already seen, this can be modeled using the Binomial distribution. We can put these observations more formally:

"If an event can occur in n mutually exclusive and equally likely ways, and if n_A of these have a certain attribute, A, then the probability of A, P(A), is the fraction n_A/n"

So, using our example above, there are n=36 possible outcomes when we throw two dice, and if the attribute or event, A, we are interested in is that of throwing a 1 and a 2, this can occur in just two ways (1,2 or 2,1), so the probability is the fraction 2/36 or 1/18. This is the classical, or a priori, definition of probability, which is typically built upon a set of realizations of physical processes (rolling dice, spinning a roulette wheel, etc.). It is also an example of a relative frequency view of probability, which is broadly ‘objective’. Furthermore, this is a long-run, or ‘limit’, view of a random process, in that we would not expect the proportion of attribute A events to be exactly 1/18 if we throw two dice 18 times - we might not see the pairs (1,2) or (2,1) at all in this set of throws. However, we would expect the proportion to converge towards the figure of 1/18 over a large number of random trials, i.e. the number of events (1,2) or (2,1) divided by the number of trials will tend to 1/18 as the number of trials tends to infinity.
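The classical definition can be checked by brute-force enumeration. The following is a minimal Python sketch, not part of the original text: it lists every equally likely ordered outcome for two dice and counts those with the attribute of interest. The function names classical_probability and is_one_and_two are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction
from itertools import product


def classical_probability(has_attribute, n_dice=2, sides=6):
    """Return n_A / n, counting over all equally likely ordered outcomes."""
    outcomes = list(product(range(1, sides + 1), repeat=n_dice))
    n_a = sum(1 for outcome in outcomes if has_attribute(outcome))
    return Fraction(n_a, len(outcomes))


def is_one_and_two(outcome):
    """True for the ordered pairs (1, 2) and (2, 1)."""
    return sorted(outcome) == [1, 2]


p_one_and_two = classical_probability(is_one_and_two)
p_double_six = classical_probability(lambda outcome: outcome == (6, 6))

print(p_one_and_two)  # 1/18, i.e. 2 of the 36 ordered outcomes
print(p_double_six)   # 1/36, i.e. 1 of the 36 ordered outcomes
```

Exact fractions are used rather than floating-point values so that the result can be compared directly with the figures 2/36 and 1/36 quoted in the text.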

Whilst simple and intuitive, there are many problems with this a priori definition of probability. What if there are an unknown or unlimited number of possible outcomes? What if the different outcomes are not equally likely? What if we want to ask a question that is less well-defined, such as "what is the chance a male of 50 will die within the following year?" or "what is the chance that the light bulb I have just purchased will last for 1000 hours or more?". These different types of question are more readily addressed by adopting a different approach to probability, known as the a posteriori or frequentist approach (although the term frequentist is used by authors in a variety of ways). In the frequentist approach we again adopt a relative frequency perspective, but we run a series of well-defined random trials and use the results as an estimate of the 'true' or population probabilities. In our Introduction (Historical context section) we showed that by running a very large number of (computer generated) trials of throwing a single die at random, we obtain an increasingly precise estimate of the true (a priori) probability. This approach, examining probabilities through experiments or trials, simulations, surveys, and/or making careful observations of events in the real world (sampling), became the predominant approach to understanding and estimating the probabilities of events during the first half of the 20th century. Again, this is a relatively formal, objective model of probability, upon which much of the structure of modern statistical methods is built, and it remains the predominant model used in software packages and statistics teaching today. Indeed, this approach is sometimes referred to as the statistical method.
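The idea of estimating a probability from repeated trials can be illustrated with a short simulation. This is a sketch only, assuming a fair 6-sided die simulated with Python's standard pseudo-random number generator. The function name estimate_face_probabilities, the seed and the trial counts are arbitrary choices made for this illustration and do not come from the original text.

```python
import random
from collections import Counter


def estimate_face_probabilities(n_trials, rng, sides=6):
    """Estimate P(face) for each face as its relative frequency in n_trials throws."""
    counts = Counter(rng.randint(1, sides) for _ in range(n_trials))
    return {face: counts[face] / n_trials for face in range(1, sides + 1)}


rng = random.Random(2024)  # arbitrary seed, used only to make runs repeatable
for n_trials in (60, 6_000, 600_000):
    estimates = estimate_face_probabilities(n_trials, rng)
    # Largest gap between any estimated probability and the a priori value 1/6
    worst = max(abs(p - 1 / 6) for p in estimates.values())
    print(f"{n_trials:>7} throws: largest deviation from 1/6 = {worst:.4f}")
```

With only a few dozen throws the estimates can differ noticeably from 1/6, whereas with several hundred thousand throws the deviations are typically very small, which is the long-run behavior the frequentist approach relies upon.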

With two possible outcomes for any event, either both occur with probability 1/2 or one occurs with probability greater than 1/2 and the other with probability less than 1/2. With four outcomes, either all occur with probability 1/4 (2^-2), or some occur with probability greater than 1/4 and the remainder with probability less than 1/4, since the total probability must equal 1 (100%). More generally, with 2^M possible outcomes (M=1 and M=2 in the two cases above), this implies that most outcome probabilities will be of the order of 2^-M or smaller. This observation, that most events occur with very small probabilities, has some important implications when moving from discrete to continuous distributions, and when considering how to represent statistical datasets efficiently.

The subsections that follow expand upon probability theory in more detail, commencing with the notions of odds and risks, and then reviewing the notion of probability in more formal terms.

[MOO1] Mood A M, Graybill F A (1963) An Introduction to the Theory of Statistics. 2nd ed., McGraw Hill

[NIST] NIST/SEMATECH e-Handbook of Statistical Methods: http://www.itl.nist.gov/div898/handbook/

[PEA1] Pearson K, Lee A (1903) On the Laws of Inheritance in Man: I. Inheritance of Physical Characters. Biometrika, 2(3), 357-462