## Types of error

In the context of statistical hypothesis testing, the expression "type of error" refers specifically to the two main kinds of error that can occur: false negatives and false positives. Both can often be reduced by increasing the sample size, but this typically involves additional cost and/or time and may not be possible for other practical reasons. A false negative, in terms of the hypothesis being tested, means that we reject that hypothesis, such as "the mean of our data equals 2", when in fact it is true (also known as a Type I error; see further, below). If we say that we seek 95% confidence that our conclusions based on sample data will not result in a false negative, we are also saying that, when the hypothesis is true, about one sample in 20 is likely to yield a result that we take as significant, i.e. will result in us rejecting the true value incorrectly. Setting a higher level, such as 99%, reduces the risk of this type of error but may considerably widen the confidence interval and affect the sample size required.
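The "one sample in 20" statement can be checked with a small simulation. The sketch below is illustrative only: it assumes normally distributed data with a known standard deviation and a two-sided z-test, and the function name, sample size, number of trials and seed are all hypothetical choices rather than anything specified in the text.

```python
import random
from statistics import NormalDist, fmean


def type_i_error_rate(confidence, n=30, trials=20000, mu0=2.0, sigma=1.0, seed=1):
    """Fraction of samples, drawn with H0 actually true, that a two-sided z-test rejects."""
    rng = random.Random(seed)
    alpha = 1.0 - confidence
    # Critical value of the standard normal, roughly 1.96 at the 95% level
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    standard_error = sigma / n ** 0.5
    rejections = 0
    for _ in range(trials):
        # The data really do have mean mu0, so every rejection is a Type I error
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        if abs(fmean(sample) - mu0) / standard_error > z_crit:
            rejections += 1
    return rejections / trials


rate_95 = type_i_error_rate(0.95)
rate_99 = type_i_error_rate(0.99)
print(f"rejection rate at 95%: {rate_95:.3f}, at 99%: {rate_99:.3f}")
```

Under these assumptions the two rates should come out close to 0.05 and 0.01 respectively, i.e. about one sample in 20 and one in 100.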

A false positive result, again in terms of the hypothesis being tested, means that we accept our hypothesis (or fail to reject it) when it is in fact false (also known as a Type II error). For example, we might observe a mean value of 2.5 from a sample and, on the basis of our computations, accept (or not reject) the hypothesis that the true mean is 2, when it is possible that the true mean is 2.8.
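To give a feel for how likely such an outcome is, the following sketch computes the probability of failing to reject a hypothesised mean of 2 when the true mean is 2.8. It assumes a two-sided z-test on normally distributed data; the standard deviation (2.0), the sample sizes and the function name are hypothetical values chosen for illustration, since the text does not specify them.

```python
from statistics import NormalDist


def type_ii_error(mu0, mu_true, sigma, n, alpha=0.05):
    """Probability that a two-sided z-test fails to reject mu0 when the mean is mu_true."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    # Distance between the true and hypothesised means, in standard-error units
    shift = (mu_true - mu0) / (sigma / n ** 0.5)
    # Chance that the test statistic still falls inside the acceptance region
    return std_normal.cdf(z_crit - shift) - std_normal.cdf(-z_crit - shift)


beta_small_sample = type_ii_error(mu0=2.0, mu_true=2.8, sigma=2.0, n=16)
beta_large_sample = type_ii_error(mu0=2.0, mu_true=2.8, sigma=2.0, n=64)
print(f"n=16: {beta_small_sample:.2f}, n=64: {beta_large_sample:.2f}")
```

With these assumed values the chance of this error is roughly 0.64 for a sample of 16 and falls to roughly 0.11 for a sample of 64, which illustrates why a larger sample reduces this type of error.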

In medical research, similar statements would be: patient X is diagnosed as having a particular illness when in fact they do not (false positive diagnosis); and patient X is not diagnosed as having a particular illness when in fact they do (false negative diagnosis). As we illustrated in the discussion on Bayesian analysis of mammography testing, even if false positives represent a relatively small proportion of test results, the large number of people tested can mean that these equate to a substantial number of individuals and can considerably exceed the number of true positives observed. More formally, in statistical analysis we say:
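The point about screening numbers can be made concrete with a short calculation. The figures used here (10,000 people screened, 1% prevalence, 80% sensitivity, 10% false positive rate) are illustrative assumptions only and are not taken from the mammography discussion referred to above; the function name is likewise hypothetical.

```python
def screening_counts(population, prevalence, sensitivity, false_positive_rate):
    """Expected true positives, false positives and P(illness | positive test)."""
    with_illness = population * prevalence
    without_illness = population - with_illness
    true_positives = with_illness * sensitivity
    false_positives = without_illness * false_positive_rate
    # Share of all positive results that belong to people who really are ill
    positive_predictive_value = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, positive_predictive_value


# Assumed, purely illustrative inputs
tp, fp, ppv = screening_counts(10_000, 0.01, 0.80, 0.10)
print(f"true positives: {tp:.0f}, false positives: {fp:.0f}, PPV: {ppv:.3f}")
```

Under these assumptions there are 80 expected true positives against 990 false positives, so fewer than one positive result in ten corresponds to an actual case.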

Type I errors occur when a false negative result is obtained in terms of the null hypothesis by obtaining a false positive measurement, i.e. we reject H0 when in fact it is true. The probability of this type of error is often denoted with the Greek letter α.

Type II errors occur when a false positive result is obtained in terms of the null hypothesis by obtaining a false negative measurement, i.e. we accept H0 when in fact it is false. The probability of this type of error is often denoted with the Greek letter β.

This terminology was introduced by J Neyman and E Pearson in the 1920s and 1930s as part of a decision-making procedure that seeks to compare one hypothesis (the Null Hypothesis, or H0) against an alternative hypothesis (HA), based on prior determination of the levels α and β. These levels are regarded as the limiting frequencies based on (theoretical or actual) repeated trials (and thus reflect a frequentist approach). The levels α and β are defined in advance of the trial, and are simply measures of the probable error associated with the decision process. The value 1-β is generally called the power of the test. A more powerful test is thus less likely to result in a Type II error.
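Written as conditional probabilities, and using the usual convention that "HA true" stands for a specific alternative value, these quantities can be summarised as:

$$
\alpha = P(\text{reject } H_0 \mid H_0 \text{ true}), \qquad
\beta = P(\text{do not reject } H_0 \mid H_A \text{ true}), \qquad
\text{power} = 1 - \beta
$$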

The Neyman-Pearson approach contrasts with Fisher's approach, in which an experiment is conducted on the assumption that the data observed (the sample) is drawn from an infinite population with a known sampling distribution. A Null Hypothesis (H0) is defined by the researcher in this case, but no alternative hypothesis is defined, and when a probability value, p, is computed (i.e. is an outcome of the process), it is not regarded as representing the result of repeated sampling but as a calculated result, providing evidence which the researcher may use to decide whether to reject or not reject H0. In general, a p-value of 0.05, which is the level suggested by Fisher and most often used, is in fact quite weak evidence that the Null Hypothesis should be rejected. Hubbard and Bayarri (2003, [HUB1]) have produced a table contrasting Fisher's p-levels with Neyman-Pearson's α-levels, with which they are often confused. This is shown below:

Contrasting p's and α's

| p-value | α-value |
|---|---|
| Fisherian significance level | Neyman-Pearson significance level |
| Significance test | Hypothesis test |
| Evidence against H0 | Type I error: erroneous rejection of H0 |
| Inductive philosophy: from the particular to the general | Deductive philosophy: from the general to the particular |
| Inductive inference: guidelines for interpreting strength of evidence in data | Inductive behavior: guidelines for making decisions based on data |
| Data based random variable | Pre-assigned fixed value |
| Property of data | Property of test |
| Short-run: applies to any single experiment or study | Long run: applies to ongoing repetitions of original study, not to any given study |
| Hypothetical infinite population | Clearly defined population |
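The distinction between a data-based random variable and a pre-assigned fixed value can be illustrated with a brief sketch. It assumes a two-sided z-test with known standard deviation; the function name, sample size, seed and parameter values are hypothetical and chosen only for illustration.

```python
import random
from statistics import NormalDist, fmean

ALPHA = 0.05  # chosen before any data are seen; the same for every sample


def two_sided_p_value(sample, mu0, sigma):
    """p-value of a two-sided z-test of the hypothesis that the mean equals mu0."""
    z = (fmean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


rng = random.Random(7)
p_values = []
for _ in range(5):
    # Five independent samples from the same population (true mean 2, sd 1)
    sample = [rng.gauss(2.0, 1.0) for _ in range(30)]
    p_values.append(two_sided_p_value(sample, mu0=2.0, sigma=1.0))

print("alpha:", ALPHA)
print("p-values:", [round(p, 3) for p in p_values])
```

Each sample yields a different p-value because p is computed from the data, whereas α stays at the value fixed in advance.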

References

[HUB1] Hubbard R, Bayarri M J (2003) P Values are not Error Probabilities. Duke Univ. Working Paper 03-26

Wikipedia: Type I and Type II errors. https://en.wikipedia.org/wiki/Type_I_and_type_II_errors