Test of regression coefficients


In simple linear regression we have a dataset of (x,y) pairs and wish to find a best-fit, or regression, line through the set (bearing in mind the issues regarding the appropriateness of such a model, as discussed earlier in the section on Anscombe's Quartet). The form of the model used is:

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, \dots, n$$

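The coefficients are determined by least squares minimization and are:

$$\hat{\beta}_1 = \frac{\sum_{i}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i}(x_i-\bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$$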
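
As a minimal sketch of the least squares estimates, the slope and intercept can be computed directly from sums of deviations about the means. The function name `least_squares_line` and the data values are hypothetical, chosen only for illustration, and are not taken from the text.

```python
def least_squares_line(x, y):
    """Return (slope, intercept) of the least squares line through the (x, y) pairs."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of squared deviations of x, and sum of cross-products, about the means
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical illustrative data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
slope, intercept = least_squares_line(x, y)
print(slope, intercept)
```

For this small dataset the slope is approximately 0.6 and the intercept approximately 2.2.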

The first of these estimators is the slope of the regression line, and if the x-values are taken at fixed intervals (for simplicity of discussion) the expression can be seen to be a linear sum of the y-values. If these are Normally distributed about the regression line with variance $\sigma^2$, which is a central assumption, then the estimated slope coefficient is itself Normally distributed with mean value $\beta_1$ and variance:

$$\mathrm{Var}(\hat{\beta}_1) = \frac{\sigma^2}{\sum_{i}(x_i-\bar{x})^2}$$

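We can now produce a statistic that follows a t-distribution in the usual manner, replacing the unknown $\sigma^2$ by its estimate $s^2$ obtained from the residuals about the fitted line:

$$t = \frac{\hat{\beta}_1 - \beta_1}{s \big/ \sqrt{\sum_{i}(x_i-\bar{x})^2}}, \qquad s^2 = \frac{\sum_{i}\left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\right)^2}{n-2}$$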

Typically the null hypothesis is that the slope is zero, i.e. there is no linear association between the variables x and y, so the expression is simply the ratio of the estimated slope to its estimated standard deviation (its standard error), and has n-2 degrees of freedom. The earlier topic on product moment correlation showed that the significance of the sample correlation coefficient, r, could also be tested in a manner similar to that for the regression slope:

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$

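This result is not surprising since the correlation coefficient and the slope of the regression line in a simple linear regression are closely related by the expression:

$$\hat{\beta}_1 = r\,\frac{s_y}{s_x}$$

where $s_x$ and $s_y$ are the sample standard deviations of the x- and y-values.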
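
The relationship between the slope test and the correlation test can be checked numerically. The following is a sketch only, using the Python standard library and hypothetical data; the function name `slope_and_correlation_tests` is an assumption introduced for illustration. It computes the t-statistic for the null hypothesis of zero slope and the t-statistic based on r, which should agree, and also confirms that the slope equals r multiplied by the ratio of the standard deviations.

```python
import math


def slope_and_correlation_tests(x, y):
    """Return (t_slope, t_r, slope, r, df) for a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar

    # Residual variance estimate s^2 with n - 2 degrees of freedom
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)

    # t-statistic for H0: slope = 0 (estimated slope over its standard error)
    t_slope = slope / math.sqrt(s2 / s_xx)

    # t-statistic for H0: correlation = 0
    r = s_xy / math.sqrt(s_xx * s_yy)
    t_r = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return t_slope, t_r, slope, r, n - 2


# Hypothetical illustrative data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
t_slope, t_r, slope, r, df = slope_and_correlation_tests(x, y)

# Ratio of sample standard deviations s_y / s_x (the n - 1 divisors cancel)
x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
sd_ratio = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / sum((xi - x_bar) ** 2 for xi in x))
print(t_slope, t_r, slope, r * sd_ratio, df)
```

With these values both t-statistics are approximately 2.12 on 3 degrees of freedom, which is below the two-sided 5% critical value of 3.182, so the null hypothesis of zero slope would not be rejected for this small sample.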

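This form of test can be directly extended to the comparison of the slopes of two regression lines, in the same manner as the basic t-test can be extended from a single mean to the comparison of two mean values. Writing $\hat{\beta}_1^{(1)}$ and $\hat{\beta}_1^{(2)}$ for the estimated slopes from the two samples, and d for the hypothesized difference between the true slopes, the statistic to compute is:

$$t = \frac{\left(\hat{\beta}_1^{(1)} - \hat{\beta}_1^{(2)}\right) - d}{s_{\hat{\beta}_1^{(1)} - \hat{\beta}_1^{(2)}}}$$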

where typically we take d=0 and the denominator in this expression is determined from the pooled or common variance. Using the expression for the residual variance estimate, $s^2$, above, and denoting the corresponding estimates for the two samples (of sizes $n_1$ and $n_2$) as $s_1^2$ and $s_2^2$, the best estimate of the common variance is:

$$s_p^2 = \frac{(n_1-2)\,s_1^2 + (n_2-2)\,s_2^2}{n_1+n_2-4}$$

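and the estimated standard deviation of the difference between the slopes is then:

$$s_{\hat{\beta}_1^{(1)} - \hat{\beta}_1^{(2)}} = \sqrt{\frac{s_p^2}{\sum_{i}(x_{1i}-\bar{x}_1)^2} + \frac{s_p^2}{\sum_{i}(x_{2i}-\bar{x}_2)^2}}$$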

The t-test statistic can then be computed and has n1+n2-4 degrees of freedom, since two parameters (an intercept and a slope) are estimated for each of the two lines. The t-distribution can also be used to provide confidence intervals and prediction intervals in regression modeling, as discussed in more detail in the regression topic.
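
The comparison of two slopes can be sketched as follows. This is an illustration under stated assumptions rather than a definitive implementation: each line is assumed to be fitted separately with its own intercept and slope, so the pooled residual variance is taken to have n1+n2-4 degrees of freedom, and a common error variance is assumed for the two samples. The function names `fit_summary` and `compare_slopes` and the data are hypothetical.

```python
import math


def fit_summary(x, y):
    """Return (slope, s_xx, residual variance s^2, n) for one sample."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return slope, s_xx, sse / (n - 2), n


def compare_slopes(x1, y1, x2, y2, d=0.0):
    """Return (t, df) for H0: slope1 - slope2 = d, assuming a common error variance."""
    b1, s_xx1, s2_1, n1 = fit_summary(x1, y1)
    b2, s_xx2, s2_2, n2 = fit_summary(x2, y2)
    # Assumption: two parameters are estimated per line, giving n1 + n2 - 4 df
    df = n1 + n2 - 4
    pooled = ((n1 - 2) * s2_1 + (n2 - 2) * s2_2) / df
    # Estimated standard deviation of the difference between the slopes
    se_diff = math.sqrt(pooled / s_xx1 + pooled / s_xx2)
    return (b1 - b2 - d) / se_diff, df


# Hypothetical illustrative data for two samples
x1, y1 = [1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 5.0, 4.0, 5.0]
x2, y2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
t, df = compare_slopes(x1, y1, x2, y2)
print(t, df)
```

For these data the statistic is approximately -0.77 on 7 degrees of freedom, so there is no evidence of a difference between the two slopes. The resulting value would be compared with the appropriate quantile of the t-distribution, for example from tables or a statistical library.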