Confidence intervals


When a sample has been taken, and some parameter or measure such as the mean value has been calculated, this provides an estimate of the population parameter. However, it remains an estimate, and different samples might yield a range of different values for the parameter. It is helpful to have some idea as to the size of this range since the expectation is that the true or population value will lie within this range, but without taking a very large number of samples the possible range remains unknown. However, if the distribution of the observations is known it is possible to provide an estimate of upper and lower limits which will include the population parameter with a given level of probability. Such bounds are known as confidence limits, and the range or interval as the confidence interval for the parameter in question. In many instances such limits are sought for an estimate of the mean value or for a proportion. In general confidence intervals for a mean value are obtained based on an assumption that the observations are Normally distributed even if the data do not fully conform to this assumption. For other parameters, such as the variance, confidence limits are affected more by the underlying distribution.

Mean values

If a sample of size n is drawn from a population with known mean, μ, and standard deviation, σ, one can construct confidence intervals for the sample mean based on this information. If the population distribution is Normal, the sample mean will lie in the range [μ-kσ/√n, μ+kσ/√n] with a probability that can be computed from the Normal distribution depending on the size of k. But in general the population mean is not known and the standard deviation is also not known in advance. If there is a very good estimate of the population standard deviation (e.g. based on a large sample or prior data), then we can provide a confidence interval for the true mean using the sample mean, x̄, with the interval being:

$$\bar{x} \pm k\frac{\sigma}{\sqrt{n}}$$

For example, if k=1.96 the sample mean will lie in the range indicated with a probability of 95%, a value that is obtained from the area under the Normal distribution leaving a lower tail of 2.5% and an upper tail of 2.5%. If k=3 the limits, which would be considerably wider, would provide 99.7% confidence that the interval contains the true population mean. The multiplier, k, here is often written as z_{α/2}, where z refers to the unit Normal distribution variate at the level α/2. In the example just cited, α/2=0.025 and (1-α)=0.95 or 95%.
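The link between the multiplier k and the coverage probability can be checked numerically. The following is a minimal sketch using Python's standard-library statistics.NormalDist. The function names and the sample values (mean 50, σ=10, n=25) are illustrative assumptions and do not come from the text.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # unit Normal distribution (mean 0, sd 1)

def z_multiplier(confidence):
    """Return k = z_{alpha/2} for a two-sided interval, e.g. 0.95 -> about 1.96."""
    alpha = 1.0 - confidence
    return STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)

def coverage(k):
    """Probability that a unit Normal variate lies within [-k, +k]."""
    return STD_NORMAL.cdf(k) - STD_NORMAL.cdf(-k)

def mean_interval(sample_mean, sigma, n, confidence=0.95):
    """Interval sample_mean +/- k*sigma/sqrt(n), with sigma treated as known."""
    half_width = z_multiplier(confidence) * sigma / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

# Hypothetical values chosen purely for illustration.
lower, upper = mean_interval(sample_mean=50.0, sigma=10.0, n=25)
print(f"k for 95% confidence: {z_multiplier(0.95):.3f}")
print(f"coverage for k=3:     {coverage(3.0):.4f}")
print(f"95% interval:         [{lower:.2f}, {upper:.2f}]")
```

Raising the confidence level increases k and therefore widens the interval, while a larger n narrows it through the √n divisor.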

If the sample size, n, is not large (for example, n<20) the estimated value of the population standard deviation is itself subject to uncertainty, so the confidence intervals need to be wider still. In this case the t-distribution, a small-sample counterpart of the Normal, is used. The confidence limits are computed in exactly the same manner as above, but with values of k taken from the t-distribution with n-1 degrees of freedom in place of those from the Normal. If we take α/2=0.025 as before, the multiplier is 4.30 for 2 degrees of freedom (n=3), falling to 2.23 for 10 degrees of freedom, 2.09 for 20 degrees of freedom, and to 1.96 for very large samples (in the limit as n→∞).
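The way the t-distribution multiplier shrinks towards 1.96 as the degrees of freedom grow can be reproduced with a short, dependency-free sketch. It integrates the t density numerically (Simpson's rule) and inverts the result by bisection. This is an illustrative approach only: the function names are hypothetical, the code is intended for modest degrees of freedom, and in practice a statistics library (for example SciPy's stats.t.ppf) would normally be used.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return const * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """CDF for x >= 0: 0.5 (by symmetry) plus Simpson's rule on [0, x]."""
    if x == 0:
        return 0.5
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_multiplier(df, alpha=0.05):
    """Two-sided multiplier k with upper tail alpha/2, found by bisection."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 100.0  # bracket wide enough for df >= 2 at alpha = 0.05
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for df in (2, 10, 20):
    print(f"df={df:2d}  k={t_multiplier(df):.3f}")
```

The multipliers depend on the degrees of freedom (n-1 for a single-sample mean), not directly on n.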


Proportions

In the case of simple proportions, a similar calculation is carried out. If x is the number of events observed in a sample of size n, the proportion of these events is p=x/n and we define q=(1-p). Then an estimate of the range of values within which the population proportion, P, is expected to lie (for a reasonably large sample, with p not too close to 0 or 1) is:

$$p \pm k\sqrt{\frac{pq}{n}}$$

This interval estimate (known as a Wald interval) can be significantly improved upon for some 'rogue' values of n and p, even when these are large (see further, the section on tests for a proportion). If the population is of finite size, N, the variance term pq/n in this expression is multiplied by the factor (N-n)/(N-1), although if n is small in comparison to N (<5%) the adjustment has little effect.
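The Wald interval and the effect of the finite population adjustment can be sketched as follows. The function name wald_interval and the example counts are hypothetical. The sketch assumes the factor (N-n)/(N-1) multiplies the variance pq/n, i.e. that it sits under the square root, which is the usual form of the finite population correction.

```python
import math

def wald_interval(x, n, k=1.96, N=None):
    """Wald interval p +/- k*sqrt(pq/n) for x events in a sample of size n.

    If a finite population size N is supplied, the variance pq/n is
    multiplied by (N - n)/(N - 1) before taking the square root.
    """
    p = x / n
    q = 1 - p
    variance = p * q / n
    if N is not None:
        variance *= (N - n) / (N - 1)
    half_width = k * math.sqrt(variance)
    return p - half_width, p + half_width

# Hypothetical sample: 30 events in 100 observations.
print(wald_interval(30, 100))            # effectively infinite population
print(wald_interval(30, 100, N=500))     # sample is 20% of the population
print(wald_interval(30, 100, N=100000))  # sample is 0.1% of the population
```

With N=500 the interval is visibly narrower, whereas with N=100000 it is almost identical to the uncorrected one, in line with the remark that the adjustment matters little when n is a small fraction of N.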

To illustrate the use of such confidence intervals we consider the case of the number of deaths following heart surgery on children up to 1 year old. In a hospital in Bristol, UK, for one type of such operation, a surgeon was reported to have a death rate of 60% amongst such patients, compared with a national average of 13%. Using the formulas above we can estimate the expected range of death rates for both the surgeon and the UK as a whole, and if the upper estimate for the UK is above the lower estimate for the surgeon, we could argue that this observation might reasonably have occurred by chance. In this particular example, which was the subject of a public inquiry, the evidence was quite complicated. The 60% death rate might have been the result of 3 deaths out of only 5 operations; the children in question might have been unusual in the severity of their condition; or the result might reflect the overall performance of the hospital, or of hospitals of this type and size. In fact the 60% figure represented 9 deaths from 15 operations, compared with 16 from 123 in the UK as a whole. There was no evidence that the children in question were unusually ill, although there was some effect related to the size of the hospital involved (hospitals that conduct a larger number of such operations tend to have better overall performance rates, and in 2010 the UK Government decided to concentrate all such operations in large specialist centers). Taking k=1.96 for a 95% confidence interval, we have an approximate lower bound (LB) for the surgeon and upper bound (UB) for the UK of:

$$LB = 0.60 - 1.96\sqrt{\frac{0.60 \times 0.40}{15}} \approx 0.35, \qquad UB = 0.13 + 1.96\sqrt{\frac{0.13 \times 0.87}{123}} \approx 0.19$$

Since the lower bound for the surgeon (about 35%) lies well above the upper bound for the UK as a whole (about 19%), this analysis suggests that the surgeon in question was indeed performing very poorly when compared with the national benchmark. It should also be noted that the idea of confidence intervals is essentially a frequentist perspective, and Bayesian analysts tend to prefer the rather more subjective notion of belief or credible intervals.
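The heart surgery comparison can be reproduced from the counts quoted in the text (9 deaths in 15 operations for the surgeon, 16 in 123 for the UK as a whole). The sketch below is illustrative: the helper name wald_bounds is hypothetical, and the Wald approximation is rough for a sample as small as 15, so the bounds should be read as indicative only.

```python
import math

def wald_bounds(events, n, k=1.96):
    """Lower and upper Wald bounds p -/+ k*sqrt(p(1-p)/n)."""
    p = events / n
    half_width = k * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

surgeon_lb, surgeon_ub = wald_bounds(9, 15)   # surgeon: 9 deaths in 15 operations
uk_lb, uk_ub = wald_bounds(16, 123)           # UK: 16 deaths in 123 operations

# The intervals overlap only if the UK upper bound reaches the surgeon's lower bound.
intervals_overlap = uk_ub >= surgeon_lb

print(f"Surgeon lower bound: {surgeon_lb:.3f}")
print(f"UK upper bound:      {uk_ub:.3f}")
print(f"Intervals overlap:   {intervals_overlap}")
```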

Odds ratios

In the earlier topic on probability we introduced the terms odds and odds ratio (OR). With reference to a simple 2x2 table of the form shown below, this enabled us to define four proportions: p1=A/(A+B), p2=C/(C+D), q1=B/(A+B), and q2=D/(C+D), and thus the odds ratio (OR) can be written in various ways as:

$$OR = \frac{p_1/q_1}{p_2/q_2} = \frac{p_1 q_2}{p_2 q_1} = \frac{A/B}{C/D} = \frac{AD}{BC}$$



                Exposed    Not exposed
Infected        A          B
Not infected    C          D



The odds ratio provides an indication (in the range 0 upwards) of the relative odds of infection (or other factor) of exposed subjects when compared with those not exposed. A value of 2, say, would indicate that the odds of infection for exposed subjects were twice those for subjects not exposed. It is often convenient to work with the natural logarithm (log_e) of the odds ratio, LOR, as this reduces the expression to a set of additions and subtractions and scales the ratio in a convenient manner. It is also common to compute the ratio and its standard error after adding 1/2 to each cell count (Yates correction).

OR and LOR values are point estimates and provide no information on the range of values these estimates might take. Confidence intervals for the odds ratio can be estimated using the following expression, which utilizes the estimated standard error (SE) of the log odds ratio (where exp() is the exponential function and A, B, C and D are not small):

$$\exp\left(LOR \pm z_{\alpha/2}\,SE\right), \qquad SE = \sqrt{\frac{1}{A}+\frac{1}{B}+\frac{1}{C}+\frac{1}{D}}$$

Example: The following data relate to the incidence and possible association of two conditions, breathlessness and wheezing, amongst a large sample of young coal miners:



                Breathless    Not Breathless
Wheezing        23            105
Not Wheezing    9             1654



The odds ratio is OR=(23×1654)/(9×105)=40.26, giving a log odds ratio LOR=ln(OR)=3.7. A 95% confidence interval, defined using a z-value (from the Normal distribution) of 1.96 and a standard error SE=0.41, gives a range of ±0.8 for the log odds ratio, i.e. from 2.9 to 4.5, or from 18 to 89 for the odds ratio itself, a very wide spread. Note that these confidence intervals rely on the asymptotic Normal approximation to the sampling distribution of the log odds ratio, but in many cases this is an adequate assumption. The significance of a particular log odds ratio can also be assessed using the ratio LOR/SE, which in this instance is 3.695/0.406=9.1, a very large value when compared to the percentage points of the Normal distribution, hence highly significant.
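The figures quoted for the coal miners example can be reproduced with a few lines of code. This is a minimal sketch: the function name odds_ratio_ci is hypothetical, the cell labels follow the order in which the counts appear in the ratio (A×D)/(B×C), no +1/2 correction is applied, and the standard error is taken as the square root of the sum of reciprocal cell counts.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio AD/(BC), its log, SE of the log, and a Normal-based interval."""
    odds_ratio = (a * d) / (b * c)
    lor = math.log(odds_ratio)                    # natural log of the odds ratio
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of the log odds ratio
    return {
        "or": odds_ratio,
        "lor": lor,
        "se": se,
        "lor_ci": (lor - z * se, lor + z * se),
        "or_ci": (math.exp(lor - z * se), math.exp(lor + z * se)),
        "z_score": lor / se,                      # LOR/SE, compared with Normal percentage points
    }

# Counts taken from the ratio quoted in the text: (23 x 1654)/(9 x 105).
result = odds_ratio_ci(a=23, b=9, c=105, d=1654)
for name, value in result.items():
    print(name, value)
```

Because the interval is symmetric on the log scale, it is strongly asymmetric about the odds ratio itself once exponentiated.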

This particular example is based on the test data for the oddsratio() function in the vcd library (visualization of categorical data) in R (see further, R code samples section). The data is included with the library and comprises eight separate age-grouped strata, hence eight 2x2 tables of the kind shown above (nine sets should have been included, but the 20-24 age group was omitted by the vcd team from the dataset; it is now included). The current version of the vcd library forms the supporting material for the textbook "Discrete Data Analysis with R" by Michael Friendly and David Meyer [FRI1]. Computations are performed using the LOR and with Yates corrections applied in each case. The results of the confidence interval analysis for each of the strata are plotted, together with a simple quadratic fitted line.


The homogeneity of y=LOR across the strata can be tested using a simple chi-square test devised by Woolf (1955, [WOL1]) in order to evaluate the incidence of different diseases (cancers, ulcers) for patients with blood groups A and O in three English cities. This test is provided in the vcd library, and in the case of the above coal miners data it yields a chi-square value of 25.6 with 7 degrees of freedom (df), which is highly significant (as is evident from the visualization of the strata). Woolf's test for the homogeneity of s strata is as follows:

$$\chi^2 = \sum_{i=1}^{s} w_i\left(y_i - \bar{y}_w\right)^2, \qquad \bar{y}_w = \frac{\sum_{i=1}^{s} w_i y_i}{\sum_{i=1}^{s} w_i}$$

where w_i=1/SE_i^2 and the statistic has s-1 degrees of freedom.
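The homogeneity statistic can be sketched in a few lines. The code below assumes the usual form of Woolf's statistic, with inverse-variance weights 1/SE² and s-1 degrees of freedom. The function names and the small strata used here are hypothetical illustrations, not the coal miners data, and no +1/2 correction is applied.

```python
import math

def log_odds_ratio(table):
    """Return (LOR, SE) for a 2x2 table given as ((A, B), (C, D))."""
    (a, b), (c, d) = table
    lor = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return lor, se

def woolf_statistic(tables):
    """Weighted sum of squared deviations of each stratum's LOR from the weighted mean.

    Returns the chi-square statistic and its degrees of freedom (s - 1).
    """
    estimates = [log_odds_ratio(t) for t in tables]
    lors = [lor for lor, _ in estimates]
    weights = [1 / se ** 2 for _, se in estimates]  # inverse-variance weights (assumed form)
    mean_lor = sum(w * y for w, y in zip(weights, lors)) / sum(weights)
    chi_sq = sum(w * (y - mean_lor) ** 2 for w, y in zip(weights, lors))
    return chi_sq, len(tables) - 1

# Hypothetical strata: every table has an odds ratio of 4, so the statistic is near 0.
similar = [((20, 10), (15, 30)), ((40, 20), (30, 60)), ((10, 5), (8, 16))]
# Hypothetical strata with odds ratios of 4 and 1/6, which are clearly heterogeneous.
different = [((20, 10), (15, 30)), ((10, 40), (30, 20))]

print(woolf_statistic(similar))
print(woolf_statistic(different))
```

The resulting statistic is compared with the percentage points of the chi-square distribution on s-1 degrees of freedom. A large value indicates that the log odds ratios differ between strata by more than sampling variation would explain.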


[FRI1] Friendly M, Meyer D (2015) Discrete Data Analysis with R: Visualization and Modeling Techniques for Categorical and Count Data. Chapman-Hall/CRC Press.

[WOL1] Woolf B (1955) On estimating the relation between blood group and disease. Ann. Human Genetics, 19, 251-253.