Let {x} be a sample of n observations of a continuous variable with mean, m, and standard deviation, s. The sample is assumed to be randomly selected from a population that is Normally distributed or reasonably Normal. This is the basic t-test, used to determine whether the population mean, μ, for which the sample mean, m, is the best estimate, has the true value a, when the population standard deviation, σ, is not known (use a z-test if the population standard deviation is known or a very good estimate for it is available). Compute the statistic:

$$t = \frac{m - a}{SE}, \qquad SE = \frac{s}{\sqrt{n}}$$

(where SE is the standard error) then t has a t-distribution with n-1 degrees of freedom and can be compared to the percentage points of the t-distribution in order to obtain an estimate of the probability of observing a standardized difference as large as that observed. Typically a two-tailed test is used and the test is to see whether:

$$|t| > t_{\alpha/2,\, n-1}$$

where α is the probability level chosen (typically α=0.05), split equally between the two tails, and n-1 is the number of degrees of freedom. The null hypothesis is H0: μ=a, and it is rejected if the absolute value of the observed t exceeds this t-distribution value.

Example: Industrial device test

To illustrate this procedure we use the example of 10 measurements of the diameter of devices sampled from a batch produced in a manufacturing process. The 10 values, in mm, are shown below:

1.023,1.002,1.039,1.053,0.981,0.980,1.011,0.968,0.994,1.159

The question asked is whether these measurements are significantly different from the design specification of 1.000mm. The null hypothesis is H0: μ=1, with the alternative being HA: μ≠1. From the dataset we have a=1.000, m=1.021, SE=0.01754, giving t=1.1975. We now compare this value with a t-distribution with 9 degrees of freedom. Using a software facility such as the R function t.test(), or looking up the observed t-value in tables, we find the probability or p-value = 0.2617. As this is well above 0.05 the null hypothesis is not rejected, so it is reasonable to assume that the batch mean is consistent with the target design specification. However, the 95% confidence interval for the mean from these data is [0.98133, 1.06067], so if the design specification also specified a range, of say [0.95,1.05], the sample would still fail, because the upper confidence limit lies outside that range. The confidence intervals are determined from:

$$m \pm t_{\alpha/2,\, n-1}\, k$$

where k is the standard error, as above, i.e. 0.01754. The two-tailed t-distribution value with 9 degrees of freedom for (1-α)=95% limits is 2.262157, so the confidence interval is determined from the sample mean, 1.021 +/- (0.01754)x(2.262157), i.e. approximately 1.021 +/- 0.0397.
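The calculations in this example can be reproduced with a short script. The following is a minimal Python sketch, not the method used to produce the figures above: it uses only the standard library, takes the t-distribution value 2.262157 from the text rather than computing it, and obtains the p-value by numerically integrating the t density with Simpson's rule. The helper names t_pdf and two_tailed_p are illustrative.

```python
import math
import statistics

# The 10 diameter measurements (mm) from the example.
diameters = [1.023, 1.002, 1.039, 1.053, 0.981,
             0.980, 1.011, 0.968, 0.994, 1.159]
a = 1.000          # design specification (hypothesised mean)
t_crit = 2.262157  # two-tailed 5% point of t with 9 d.f., as quoted in the text

n = len(diameters)
m = statistics.mean(diameters)
s = statistics.stdev(diameters)   # sample standard deviation (n-1 divisor)
se = s / math.sqrt(n)             # standard error of the mean
t = (m - a) / se                  # observed t statistic


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t_obs, df, steps=2000):
    """Two-tailed p-value: 1 minus the area under the density on [-|t|, |t|].

    The area on [0, |t|] is found with Simpson's rule (steps must be even)
    and doubled, using the symmetry of the t-distribution.
    """
    upper = abs(t_obs)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * total * h / 3


p_value = two_tailed_p(t, n - 1)
ci = (m - t_crit * se, m + t_crit * se)   # 95% confidence interval for the mean

print(f"m = {m:.3f}, SE = {se:.5f}, t = {t:.4f}, p = {p_value:.4f}")
print(f"95% CI = [{ci[0]:.5f}, {ci[1]:.5f}]")
```

Run as written, the printed values should agree with those quoted in the example to the precision shown there. In practice a statistics package would normally be used for the p-value and the t-distribution value; the numerical integration here simply avoids a dependency.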

For both mean value and confidence interval estimation, the sample size required to discern a pre-specified difference in the mean, or to achieve a specific confidence interval, can be estimated using formulas or charts designed for the purpose. Special graphs, known as Operating Characteristic curves, provide plots of the relationship between sample size and the two main types of error (e.g. see Ferris et al., 1946, for a number of such charts covering χ2, F, z-tests and t-tests, [FER1]). In the case of the mean value an estimate of the standard deviation is required (for example, from earlier studies), and the standardized measure or effect, E, of the form E=(target difference)/(estimated standard deviation) is computed. The risk of failing to accept that μ=a when it really is a (a Type I error) is the α-level, whilst the risk of accepting that μ=a when really μ=b (a Type II error) is the β-level. By specifying E, α and β (or the power, which is 1-β) an estimate of the sample size required can be determined (see the Sampling and Sample Size section).
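As a rough illustration of how E, α and β combine, the sketch below uses the widely used Normal approximation n ≈ ((z_{1-α/2} + z_{1-β})/E)² for a two-tailed test of a single mean. This formula is an assumption introduced here for illustration, not one taken from the charts cited above; because it ignores the extra uncertainty from estimating the standard deviation, it tends to understate n slightly for small samples. The function name required_n and the planning values are hypothetical.

```python
import math
from statistics import NormalDist


def required_n(effect, alpha=0.05, power=0.80):
    """Approximate sample size for a two-tailed one-sample test of the mean.

    effect : standardized difference E = (target difference) / (estimated s.d.)
    alpha  : Type I error rate
    power  : 1 - beta, where beta is the Type II error rate
    """
    if effect <= 0:
        raise ValueError("effect must be positive")
    z = NormalDist()                      # standard Normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # two-tailed critical value
    z_beta = z.inv_cdf(power)             # quantile matching the desired power
    return math.ceil(((z_alpha + z_beta) / effect) ** 2)


# Hypothetical planning values: detect a shift of half a standard deviation
# with alpha = 0.05 and 80% power.
print(required_n(0.5))
```

Smaller effects, smaller α or higher power all increase the estimated n, which is the same trade-off the Operating Characteristic curves display graphically.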

The single mean t-test can be used, effectively unaltered, for comparing two means (see further below) where the means are drawn from paired observations. An example is comparing the output of two identical machines or processes operating in parallel, where each item output by one machine can be paired with an item output by the other. The test is then applied to the differences between the paired values, typically with a=0.
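The paired case can be sketched as a single mean test on the pairwise differences, with a=0 under the null hypothesis of no mean difference. The Python example below is illustrative only: the function name paired_t and the two sets of machine measurements are hypothetical and are not taken from the example dataset.

```python
import math
import statistics


def paired_t(first, second):
    """Return (mean difference, standard error, t) for paired observations."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [x - y for x, y in zip(first, second)]   # one difference per pair
    n = len(diffs)
    m = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)      # standard error of mean difference
    return m, se, m / se                             # t with a = 0, on n-1 d.f.


# Hypothetical diameters (mm) of items produced in parallel by two machines.
machine_a = [1.012, 0.998, 1.025, 1.003, 0.991, 1.017]
machine_b = [1.004, 1.001, 1.010, 0.995, 0.989, 1.008]

m_d, se_d, t_d = paired_t(machine_a, machine_b)
print(f"mean difference = {m_d:.4f}, SE = {se_d:.4f}, "
      f"t = {t_d:.3f} on {len(machine_a) - 1} d.f.")
```

The resulting t is compared with the t-distribution on n-1 degrees of freedom, where n is the number of pairs, exactly as in the single mean case.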

References

[FER1] Ferris C D, Grubbs F E, Weaver C L (1946) Operating Characteristics for the Common Statistical Tests of Significance. Annals of Mathematical Statistics, 17(2), 178-197