
# Fisher's exact test

This test is typically applied to count data where the cross-tabulation is of dimensions 2x2 and the total sample size is small. The computations utilize the hypergeometric distribution. In our description of the test we follow Pearson and Hartley (1954, p65 et seq [PEA1]). We start with the general form of a 2x2 table, which is written either in the form:

| | With Y | Without Y | Total |
|---|---|---|---|
| Group 1 | a | b | a+b |
| Group 2 | c | d | c+d |
| Total | r | N-r | N |

or

| | With Y | Without Y | Total |
|---|---|---|---|
| Group 1 | a | A-a | A |
| Group 2 | b | B-b | B |
| Total | r | N-r | N |

In this table we have a group of A+B=N individuals, with r possessing a certain characteristic, Y, and N-r not having this characteristic. If a sample of A individuals is drawn from this group of N without replacement, then the probability that a of these possess the characteristic Y is obtained from the hypergeometric distribution:

$$P(a) = \frac{\binom{r}{a}\binom{N-r}{A-a}}{\binom{N}{A}}$$

In this case, written in terms of the cell entries of the second table, we have

$$P(a) = \frac{A!\,B!\,r!\,(N-r)!}{N!\,a!\,(A-a)!\,b!\,(B-b)!}$$

Example: Impact of training on performance

This example shows the results of trials of two training regimes given to newly recruited operatives in an industrial process. In one group preliminary training on the use of the equipment was given, whilst in the other no preliminary training was provided. The outcome, in terms of the correct operation of the equipment, was recorded, as shown in the table below:

| | Y: Faulty | Y: Correct | Total |
|---|---|---|---|
| 1: No training | 9=a | 6 | 15=A |
| 2: Training | 3=b | 11 | 14=B |
| Total | 12 | 17 | 29=N |

A one-tailed test is appropriate here. Using the R function fisher.test(x,alternative="greater"), where x is the matrix of cell values, we find the probability p=0.04075, indicating that the effect of training is significant at the 5% level.
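The quoted p-value can be cross-checked directly from the hypergeometric distribution. The following Python sketch uses only the standard library; it is an illustration rather than the procedure fisher.test itself follows, the function names `table_probability` and `upper_tail_p` are hypothetical, and the marginal totals (A=15, r=12, N=29) are assumed fixed.

```python
from math import comb

def table_probability(a, A, r, N):
    """Hypergeometric probability of exactly a 'with Y' cases in group 1,
    given group size A, column total r and overall total N (all fixed)."""
    return comb(r, a) * comb(N - r, A - a) / comb(N, A)

def upper_tail_p(a_obs, A, r, N):
    """One-sided p-value: sum over tables whose top-left cell is >= a_obs."""
    a_max = min(A, r)  # largest value the top-left cell can take
    return sum(table_probability(a, A, r, N) for a in range(a_obs, a_max + 1))

# Training example: a=9 faulty among A=15 untrained, r=12 faulty overall, N=29
single = table_probability(9, 15, 12, 29)
p_value = upper_tail_p(9, 15, 12, 29)
print(f"single table: {single:.5f}, upper tail: {p_value:.5f}")
```

The upper-tail sum covers a=9, 10, 11 and 12 and comes to approximately 0.0407, in agreement with the value reported by R, whereas the observed table on its own has probability of roughly 0.035.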

Note that the hypergeometric distribution gives the probability of a single arrangement only, such as the one shown. Assuming a>0 and that a is the smallest value in the table, the values less than or equal to a that this cell can take are 0,1,2,…,a, i.e. a+1 possibilities. To evaluate the probability that a value as small as, or smaller than, a is observed, the separate probabilities for each arrangement must be calculated, and the required probability is the sum of these individual probabilities. In the training example the relevant tail is the upper one: the p-value quoted is the sum of the probabilities for a=9,10,11,12 with the marginal totals held fixed.

Note also that 2x2 data tables of this type are often referred to as representing dichotomous data (e.g. alive/dead, infected/not infected). Data that consist of multiple measurements on an ordinal or interval scale are generally described as continuous (e.g. blood pressure, weight, temperature).

We have noted above that the ‘exact’ model assumes the marginal totals are given, i.e. are part of a design and not themselves subject to random sampling. This assumption is often not met, raising the question as to what effect this has on the probability estimates. One possible solution to this difficulty is to compute all possible 2x2 partitions of N and identify the proportion of those for which a value less than or equal to a is observed. In practice this is computationally infeasible, but it can readily be approximated by Monte Carlo simulation. One way to proceed, based on a given sample size N but unspecified marginal values, would be as follows:

1. Select a random number in the range 0…N. Make this a

2. Select a random number in the range 0…N-a. Make this A-a. This determines A and hence B (since B=N-A)

3. Select a random number in the range 0…B. Make this b. This determines the remaining table entries, including marginals

4. If any cell entry is less than or equal to the observed value of a, increment a counter, k

5. Repeat steps 1 to 4 a large number of times, e.g. n=100,000

6. The probability of any cell having a value less than or equal to the observed a is then estimated as k/n
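The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the function name `estimate_any_cell_le` and its parameters are hypothetical, each random number is taken to be a uniformly distributed integer, the range in step 2 is read as 0…N-a (A is not yet known at that point), and the threshold in step 4 is taken to be the observed value rather than the simulated a.

```python
import random

def estimate_any_cell_le(N, threshold, n_trials=100_000, seed=None):
    """Estimate the proportion of randomly generated 2x2 tables with total N
    in which at least one cell entry is <= threshold (steps 1 to 6)."""
    rng = random.Random(seed)
    k = 0
    for _ in range(n_trials):               # step 5: repeat many times
        a = rng.randint(0, N)               # step 1: top-left cell
        A_minus_a = rng.randint(0, N - a)   # step 2: range assumed to be 0..N-a
        A = a + A_minus_a                   # first row total
        B = N - A                           # second row total
        b = rng.randint(0, B)               # step 3: bottom-left cell
        cells = (a, A_minus_a, b, B - b)
        if min(cells) <= threshold:         # step 4: any cell at or below threshold
            k += 1
    return k / n_trials                     # step 6: estimated probability

# Illustrative call: N=29 as in the training example, threshold=3 (its smallest cell)
estimate = estimate_any_cell_le(N=29, threshold=3, n_trials=100_000, seed=42)
print(f"estimated proportion: {estimate:.3f}")
```

Fixing the seed makes a run reproducible; the estimate itself depends on the sampling scheme chosen in steps 1 to 3, so a different scheme for generating tables would give a different figure.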

Alternative models can readily be defined, for example by pre-defining either the row or column marginals, or both, or by specifying that the less than or equal to criterion applies to particular cells. Typically the one-sided view of the test is adopted, but two-sided tests may be equally relevant.
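As one illustration of such an alternative, the sketch below fixes both the row and column marginals and applies the criterion to the top-left cell only, in its upper tail. Under these assumptions the simulation approximates the one-sided exact test, so for the training example it should settle near the exact p-value. The function name `mc_fixed_margins` and its parameters are hypothetical.

```python
import random

def mc_fixed_margins(a_obs, A, r, N, n_trials=100_000, seed=None):
    """Monte Carlo estimate of P(top-left cell >= a_obs) with both
    marginals fixed: draw A of the N individuals without replacement
    and count how many of the r 'with Y' individuals are included."""
    rng = random.Random(seed)
    population = [1] * r + [0] * (N - r)   # 1 marks an individual with Y
    hits = 0
    for _ in range(n_trials):
        a = sum(rng.sample(population, A))  # simulated top-left cell
        if a >= a_obs:
            hits += 1
    return hits / n_trials

# Training example: a=9, A=15, r=12, N=29
p_hat = mc_fixed_margins(9, 15, 12, 29, n_trials=50_000, seed=1)
print(f"simulated one-sided p-value: {p_hat:.4f}")
```

A two-sided variant could be obtained by changing the counting rule, for example by counting simulated tables that are at least as improbable as the observed one; that choice is left open here.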

References

[BRE1] Breslow N E, Day N E (1980) Statistical Methods in Cancer Research: Volume 1, The Analysis of Case-Control Studies. IARC Scientific Publications No.32, World Health Organization, IARC, Lyon

[BRE2] Breslow N E, Day N E (1987) Statistical Methods in Cancer Research: Volume 2, The Design and Analysis of Cohort Studies. IARC Scientific Publications No.82, World Health Organization, IARC, Lyon

[PEA1] Pearson E S, Hartley H O eds. (1954) Biometrika Tables for Statisticians. 4th edition. Vol. 1, Cambridge University Press, Cambridge, UK