Non-linear regression


Non-linear regression is the term used to describe regression models that are non-linear in the function coefficients. In linear regression the general form of the model used is:

$$y = X\beta + \varepsilon$$

and the least squares solution for the coefficients is obtained from the matrix expression:

$$\hat{\beta} = (X^{T}X)^{-1}X^{T}y$$

whereas in non-linear regression the standard model is of the form:

$$y = f(X, \beta) + \varepsilon$$

where f(,) is some non-linear function of the parameters, β, and of X, the design matrix determined by the predictors.
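As a small illustration of the linear case, the sketch below solves the normal equations for a straight-line model with one predictor. The function name `fit_line` and the toy data are assumptions made for this example only. With an intercept and a single predictor the matrix to invert is 2x2, so it is inverted by hand.

```python
def fit_line(xs, ys):
    """Least squares fit of y = b0 + b1*x via the normal equations (X'X) b = X'y."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Here X'X = [[n, sx], [sx, sxx]] and X'y = [sy, sxy];
    # the 2x2 inverse is written out explicitly.
    det = n * sxx - sx * sx
    b0 = (sxx * sy - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1


# Toy data lying exactly on y = 2 + 3x (illustrative values, not from the text)
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0, 5.0, 8.0, 11.0]
b0, b1 = fit_line(xs, ys)
print(b0, b1)
```

No such closed-form solution exists in general once the model is non-linear in its parameters, which is why the iterative methods discussed in this section are needed.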

Examples of simple non-linear models are:

Although in principle some non-linear models (e.g. a simple exponential model) can be linearized (e.g. by taking a log transform), this is generally not recommended, because the transformation also changes the error structure that the least squares fit assumes. Fitting the selected model to the data is usually an iterative numerical optimization process based on a non-linear version of ordinary least squares (non-linear least squares, or nls).

In general nls requires the modeler to provide initial estimates (starting values) for the unknown parameters. For certain model types some packages provide so-called self-starting functions that attempt to estimate suitable starting values from the data. For example, the R function SSasympOff performs this operation for asymptotic regression models with an offset (R provides many others). Likewise, in some geostatistical packages the fit of a selected semivariogram model may be automated. Another approach is interactive modeling followed by optimization, in which both the choice of model and the determination of initial parameters are supported through interactive graphing of the data and the selected model (e.g. as in interactive variography).
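The iterative fitting described above can be sketched with a basic Gauss-Newton loop for a simple exponential model, y = a·exp(b·x). This is a minimal illustration under stated assumptions, not the algorithm of any particular package. The function name `gauss_newton`, the step-halving safeguard, the synthetic noise-free data and the starting values are all chosen for this example.

```python
import math


def gauss_newton(xs, ys, a, b, max_iter=50, tol=1e-12):
    """Fit y = a * exp(b * x) by Gauss-Newton iterations from starting values (a, b)."""

    def sse(a_, b_):
        return sum((y - a_ * math.exp(b_ * x)) ** 2 for x, y in zip(xs, ys))

    current = sse(a, b)
    for _ in range(max_iter):
        # Accumulate the entries of J'J and J'r for the two parameters
        saa = sab = sbb = ga = gb = 0.0
        for x, y in zip(xs, ys):
            e = math.exp(b * x)
            ja, jb = e, a * x * e          # partial derivatives w.r.t. a and b
            r = y - a * e                  # residual at the current estimates
            saa += ja * ja
            sab += ja * jb
            sbb += jb * jb
            ga += ja * r
            gb += jb * r
        det = saa * sbb - sab * sab
        if det == 0.0:
            break
        da = (sbb * ga - sab * gb) / det   # solve (J'J) delta = J'r
        db = (saa * gb - sab * ga) / det
        # Step halving: shorten the step while it makes the fit worse
        step = 1.0
        while step > 1e-10 and sse(a + step * da, b + step * db) > current:
            step /= 2.0
        a, b = a + step * da, b + step * db
        current = sse(a, b)
        if abs(da) + abs(db) < tol:
            break
    return a, b


# Synthetic, noise-free data generated from a = 2, b = 0.5 (illustrative only)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
a_hat, b_hat = gauss_newton(xs, ys, a=1.5, b=0.6)   # deliberately rough starting values
print(a_hat, b_hat)
```

Each pass linearizes the model around the current estimates and solves a small linear least squares problem for the update. Poor starting values can send such a loop in the wrong direction, which is why good initial estimates matter.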

Example: Retail sales and advertising expenditure

This example is drawn from the SPSS case study set, which provides an illustration of non-linear regression as applied to retailing. The data, shown below, indicate the level of (de-trended) sales achieved (col 2) with varying levels of advertising spend (col 1).

Advertising spend    De-trended sales
4.69                 12.23
6.41                 11.84
5.47                 12.25
3.43                 11.1
4.39                 10.97
2.15                 8.75
1.54                 7.75
2.67                 10.5
1.24                 6.71
1.77                 7.6
4.46                 12.46
1.83                 8.47
5.15                 12.27
5.25                 12.57
1.72                 8.87
3.04                 11.15
4.92                 11.86
4.85                 11.07
3.13                 10.38
2.29                 8.71
4.9                  12.07
5.75                 12.74
3.61                 9.82
4.62                 11.51

By the 'law' of diminishing returns one would expect the volume of sales to level off no matter how much money was spent on advertising, so a model that has an upper asymptote would appear appropriate. In geostatistical modeling this upper level is known as the sill, and can be regarded as one parameter of the model. A second parameter is the amount of advertising spend needed to reach this sill. Again, this has a geostatistical equivalent, known as the range: the value beyond which no further meaningful change is observed. Visual inspection of the data may enable starting values for such parameters to be estimated. Depending on the model selected, additional parameters may be included, for example the rate of change in sales as advertising expenditure is increased.

In the current example, a 3-parameter exponential model of the type described earlier is selected and the data are graphed in order to provide initial estimates for the parameters. The model in descriptive terms is of the form:

From the scatterplot below it would appear that the sill for these data is roughly 13, the range is around 6 or 7, and the rate of increase across the range is a little over 1, so these values may be used as the initial values. Note that the diagram below has the dependent variable, de-trended sales, commencing at greater than zero and only has data for a specific range of advertising expenditure. The final model, even if it provides an excellent fit to the data, will only be 'safe' to use within this range, or very close to its limits.

The non-linear least squares procedure is then run, possibly after applying constraints to the permitted values that the parameters can take (all must be greater than zero in the model selected). In the current example, the procedure converges rapidly (15 iterations) and produces the following parameter estimates (in the form applicable to the expression we have specified):

Parameter Estimates

Parameter     Estimate    Std. Error    95% Confidence Interval
                                        Lower Bound    Upper Bound
b1 (sill)
b2 (range)
b3 (rate)

The sill parameter has a relatively narrow confidence interval, but the range and rate intervals are wider relative to their estimated values, so one has lower confidence in these. However, the analysis of variance for this model shows that it does account for over 90% of the variation in the data, and a combined plot of the observed and predicted data confirms the quality of the fit (lighter/green circles are predicted values):

Analysis of the same data in R using the nls() function converges in 5 iterations and yields similar (but not identical) values for the parameters. The self-starting R function SSasymp() could also have been used in this case.
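The fitting step can be sketched in plain Python using the advertising and sales data tabulated earlier. The model expression used in the case study is not reproduced here, so the form sales = b1 - b2·exp(-b3·advertising) is an assumption made for this illustration. The damped (Levenberg-Marquardt style) update, the function names, and the starting values (13, 6.5 and 1.0, loosely following the visual estimates for sill, range and rate) are also assumptions. The estimates it produces need not match those reported by SPSS or R.

```python
import math

# (advertising spend, de-trended sales) pairs from the data table
DATA = [
    (4.69, 12.23), (6.41, 11.84), (5.47, 12.25), (3.43, 11.1),
    (4.39, 10.97), (2.15, 8.75), (1.54, 7.75), (2.67, 10.5),
    (1.24, 6.71), (1.77, 7.6), (4.46, 12.46), (1.83, 8.47),
    (5.15, 12.27), (5.25, 12.57), (1.72, 8.87), (3.04, 11.15),
    (4.92, 11.86), (4.85, 11.07), (3.13, 10.38), (2.29, 8.71),
    (4.9, 12.07), (5.75, 12.74), (3.61, 9.82), (4.62, 11.51),
]


def predict(b, x):
    """ASSUMED model form: sales = b1 - b2 * exp(-b3 * advertising)."""
    return b[0] - b[1] * math.exp(-b[2] * x)


def sse(b):
    """Residual sum of squares for parameter vector b."""
    return sum((y - predict(b, x)) ** 2 for x, y in DATA)


def solve(A, g):
    """Solve the small linear system A d = g by Gaussian elimination with pivoting."""
    n = len(g)
    M = [row[:] + [g[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    d = [0.0] * n
    for r in range(n - 1, -1, -1):
        d[r] = (M[r][n] - sum(M[r][k] * d[k] for k in range(r + 1, n))) / M[r][r]
    return d


def fit(start, max_iter=500):
    """Damped least squares iterations; steps leaving the positive region are rejected."""
    b, lam = list(start), 1e-3
    current = sse(b)
    for _ in range(max_iter):
        JtJ = [[0.0] * 3 for _ in range(3)]
        Jtr = [0.0] * 3
        for x, y in DATA:
            e = math.exp(-b[2] * x)
            row = [1.0, -e, b[1] * x * e]      # partial derivatives w.r.t. b1, b2, b3
            resid = y - predict(b, x)
            for i in range(3):
                Jtr[i] += row[i] * resid
                for j in range(3):
                    JtJ[i][j] += row[i] * row[j]
        for i in range(3):
            JtJ[i][i] += lam                   # damping term
        trial = [bi + di for bi, di in zip(b, solve(JtJ, Jtr))]
        # Enforce the "all parameters greater than zero" constraint by rejection
        trial_sse = sse(trial) if all(t > 0 for t in trial) else float("inf")
        if trial_sse < current:
            improvement = current - trial_sse
            b, current, lam = trial, trial_sse, lam / 10.0
            if improvement < 1e-10 * current:
                break
        else:
            lam *= 10.0
            if lam > 1e12:
                break
    return b, current


start = (13.0, 6.5, 1.0)                       # assumed starting values
estimates, sse_fit = fit(start)
mean_sales = sum(y for _, y in DATA) / len(DATA)
sst = sum((y - mean_sales) ** 2 for _, y in DATA)
r_squared = 1.0 - sse_fit / sst                # share of variation accounted for
print(estimates, r_squared)
```

Rejecting any trial step that would make a parameter non-positive is a simple way to apply the constraint mentioned in the text. Statistical packages typically use more refined bounded optimizers.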

This example uses a parametric approach in which the model is either known in advance (e.g. from theoretical considerations) or the model form is chosen from a well-defined set of suitable non-linear expressions. A variety of other statistical techniques may be used to provide non-linear regression models, including smoothing and GAM functions, regression trees and mixed effects models.

