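Nonlinear regression is the term used to describe regression models that are nonlinear in the function coefficients. In linear regression the general form of the model used is:
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$
and the least squares solution for the coefficients is obtained from the matrix expression:
$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{y}$$
whereas in nonlinear regression the standard model is of the form:
$$\mathbf{y} = f(\mathbf{X}, \boldsymbol{\beta}) + \boldsymbol{\varepsilon}$$
where f(·,·) is some function that is nonlinear in the parameters, β, and X is the design matrix determined by the predictors.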
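For the linear case, the closed-form least squares solution can be illustrated with a minimal sketch for a single predictor plus intercept. The function name `ols_simple` and the four data points are hypothetical, chosen so that the points lie exactly on y = 1 + 2x; the code simply inverts the 2x2 normal-equations matrix by hand.

```python
def ols_simple(x, y):
    """Intercept and slope of y = b0 + b1*x from the normal equations."""
    n = len(x)
    sx = sum(x)
    sy = sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(u * v for u, v in zip(x, y))
    # X'X = [[n, sx], [sx, sxx]] and X'y = [sy, sxy]; invert the 2x2 matrix directly
    det = n * sxx - sx * sx
    b0 = (sxx * sy - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1

# Hypothetical data lying exactly on the line y = 1 + 2x
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
b0, b1 = ols_simple(x, y)
print(b0, b1)
```

No iteration or starting values are needed here, which is the practical difference from the nonlinear case.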
Examples of simple nonlinear models are:
Although in principle some nonlinear models (e.g. a simple exponential model) can be linearized (e.g. by taking a log transform), this is generally not recommended, since the transformation also alters the error structure assumed by the model. Fitting the selected model to the data is usually an iterative numerical optimization process based on a nonlinear version of ordinary least squares (nonlinear least squares, or nls). In general nls requires the modeler to provide initial estimates (starting values) for the unknown parameters. For certain model types, some packages provide so-called self-starting functions that attempt to estimate good starting values from the data. For example, the R function SSasympOff performs this operation for asymptotic regression models with an offset (there are many others provided in R). Likewise, in some geostatistical packages the fit of a selected semivariogram model may be automated. Another approach is interactive modeling followed by optimization, in which both the choice of model and the determination of initial parameters are supported through interactive graphing of the data and selected model (e.g. as in interactive variography).
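One common way of obtaining starting values for a simple exponential model y = a·exp(b·x) is to fit a straight line to log(y) and back-transform, then hand the result to the iterative nls procedure rather than treating it as the final fit. The sketch below illustrates that idea only; it is not how any particular self-starting function is implemented, and the function name `log_linear_start` and the data are hypothetical.

```python
import math

def log_linear_start(x, y):
    """Starting values for y = a*exp(b*x) from a straight-line fit to log(y)."""
    n = len(x)
    log_y = [math.log(v) for v in y]  # requires every y to be positive
    mean_x = sum(x) / n
    mean_ly = sum(log_y) / n
    sxx = sum((u - mean_x) ** 2 for u in x)
    sxy = sum((u - mean_x) * (v - mean_ly) for u, v in zip(x, log_y))
    b = sxy / sxx                       # slope on the log scale
    a = math.exp(mean_ly - b * mean_x)  # back-transformed intercept
    return a, b

# Hypothetical noisy observations, roughly following y = 2*exp(0.5*x)
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.2, 5.6, 8.7, 15.1]
a0, b0 = log_linear_start(x, y)
print("starting values:", round(a0, 3), round(b0, 3))
```

These values minimize squared error on the log scale, not on the original scale, which is why they are better used as a starting point than as final estimates.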
Example: Retail sales and advertising expenditure
This example is drawn from the SPSS case study set, which provides an illustration of nonlinear regression as applied to retailing. The data, shown below, indicate the level of (detrended) sales achieved (col 2) with varying levels of advertising spend (col 1).
4.69 12.23
6.41 11.84
5.47 12.25
3.43 11.1
4.39 10.97
2.15 8.75
1.54 7.75
2.67 10.5
1.24 6.71
1.77 7.6
4.46 12.46
1.83 8.47
5.15 12.27
5.25 12.57
1.72 8.87
3.04 11.15
4.92 11.86
4.85 11.07
3.13 10.38
2.29 8.71
4.9 12.07
5.75 12.74
3.61 9.82
4.62 11.51
By the 'law' of diminishing returns one would expect the volume of sales to level off no matter how much money was spent on advertising, so a model that has an upper asymptote would appear appropriate. In geostatistical modeling this upper level is known as the sill, and can be regarded as one parameter of the model. A second parameter is the amount of advertising spend needed to reach this sill. Again, this has a geostatistical equivalent, known as the range: the value beyond which no further meaningful change is observed. Visual inspection of the data may enable starting values for such parameters to be estimated. Depending on the model selected, additional parameters may be included, for example the rate of change in sales as advertising expenditure is increased. In the current example, a 3-parameter exponential model of the type described earlier is selected and the data graphed in order to provide initial estimates for the parameters. The model in descriptive terms is of the form:
From the scatterplot below it would appear that the sill for this data is roughly 13, the range is around 6 or 7, and the rate of increase across the range is a little over 1, so these values may be used as the initial values. Note that the diagram below has the dependent variable, detrended sales, commencing at greater than zero and only has data for a specific range of advertising expenditure. The final model, assuming it provides an excellent fit to the data, will only be 'safe' to use within this range, or very close to the limits of this range.
The nonlinear least squares procedure is then run, possibly after applying constraints to the permitted values that the parameters can take (all must be greater than zero in the model selected). In the current example, the procedure converges rapidly (15 iterations) and produces the following parameter estimates (in the form applicable to the expression we have specified):
Parameter Estimates

Parameter     Estimate   Std. Error   95% Confidence Interval
                                      Lower Bound   Upper Bound
b1 (sill)       12.904        0.610        11.636        14.173
b2 (range)     -11.268        1.581       -14.556        -7.979
b3 (rate)       -0.496        0.138        -0.782        -0.209
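The confidence bounds in the table can be checked with a short calculation, sketched here under stated assumptions: that each interval is the estimate plus or minus a Student's t critical value times the standard error, with 24 − 3 = 21 degrees of freedom (24 observations, 3 parameters), and that the b2 and b3 values are negative, as the ordering of their printed bounds suggests. The critical value 2.0796 is hard-coded and the name `conf_interval` is hypothetical.

```python
# Two-sided 95% critical value of Student's t with 21 degrees of freedom (assumed)
T_CRIT = 2.0796

# (estimate, standard error) as reported; b2 and b3 are taken as negative (assumption)
reported = {
    "b1": (12.904, 0.610),
    "b2": (-11.268, 1.581),
    "b3": (-0.496, 0.138),
}

def conf_interval(estimate, std_error, t_crit=T_CRIT):
    """Symmetric interval: estimate +/- t_crit * std_error."""
    half_width = t_crit * std_error
    return estimate - half_width, estimate + half_width

intervals = {name: conf_interval(est, se) for name, (est, se) in reported.items()}
for name, (lower, upper) in intervals.items():
    print(f"{name}: [{lower:.3f}, {upper:.3f}]")
```

Under these assumptions the computed bounds agree with the tabulated ones to within rounding of the published estimates and standard errors.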
The sill parameter has a relatively narrow confidence interval, but the range and rate intervals are wider relative to their estimated values, so one has lower confidence in these. However, the analysis of variance for this model shows that it does account for over 90% of the variation in the data, and a combined plot of the observed and predicted data confirms the quality of the fit:
Analysis of the same data in R using the nls() function converges in 5 iterations and yields similar (but not identical) values for the parameters. The self-starting R function SSasymp() could also have been used in this case.
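A fit of this kind can be sketched in plain Python using Gauss-Newton iterations with step halving. This is an illustrative sketch, not the SPSS or R implementation. It assumes the fitted expression has the form sales = b1 + b2·exp(b3·advertising), which is consistent with the reported estimates if b2 and b3 are negative. The starting values (13, −6.5, −1.1) are the visual estimates from the scatterplot with signs adapted to that assumed form, and the function names are hypothetical.

```python
import math

# (advertising spend, detrended sales) pairs copied from the data listing above
DATA = [
    (4.69, 12.23), (6.41, 11.84), (5.47, 12.25), (3.43, 11.10),
    (4.39, 10.97), (2.15, 8.75), (1.54, 7.75), (2.67, 10.50),
    (1.24, 6.71), (1.77, 7.60), (4.46, 12.46), (1.83, 8.47),
    (5.15, 12.27), (5.25, 12.57), (1.72, 8.87), (3.04, 11.15),
    (4.92, 11.86), (4.85, 11.07), (3.13, 10.38), (2.29, 8.71),
    (4.90, 12.07), (5.75, 12.74), (3.61, 9.82), (4.62, 11.51),
]

def sse(params, data=DATA):
    """Residual sum of squares for the assumed model b1 + b2*exp(b3*x)."""
    b1, b2, b3 = params
    try:
        return sum((y - (b1 + b2 * math.exp(b3 * x))) ** 2 for x, y in data)
    except OverflowError:
        return math.inf  # treat a numerically wild trial point as unacceptable

def solve3(a, g):
    """Solve the 3x3 system a*d = g by Gaussian elimination with partial pivoting."""
    m = [row[:] + [gi] for row, gi in zip(a, g)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, 3):
            f = m[r][c] / m[c][c]
            for k in range(c, 4):
                m[r][k] -= f * m[c][k]
    d = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):
        d[r] = (m[r][3] - sum(m[r][k] * d[k] for k in range(r + 1, 3))) / m[r][r]
    return d

def gauss_newton(start, data=DATA, max_iter=100, tol=1e-8):
    params = list(start)
    for iteration in range(1, max_iter + 1):
        b1, b2, b3 = params
        jac, resid = [], []
        for x, y in data:
            e = math.exp(b3 * x)
            jac.append((1.0, e, b2 * x * e))  # partial derivatives wrt b1, b2, b3
            resid.append(y - (b1 + b2 * e))
        jtj = [[sum(r[i] * r[j] for r in jac) for j in range(3)] for i in range(3)]
        jtr = [sum(r[i] * ri for r, ri in zip(jac, resid)) for i in range(3)]
        delta = solve3(jtj, jtr)  # Gauss-Newton step from the normal equations
        current, step = sse(params, data), 1.0
        trial = [p + d for p, d in zip(params, delta)]
        while sse(trial, data) >= current and step > 1e-10:
            step /= 2.0  # halve the step until the fit improves
            trial = [p + step * d for p, d in zip(params, delta)]
        if sse(trial, data) >= current:
            break  # no improving step found: treat as converged
        params = trial
        if max(abs(step * d) for d in delta) < tol:
            break
    return params, iteration

START = (13.0, -6.5, -1.1)  # visual estimates, signs adapted to the assumed form
fit, n_iter = gauss_newton(START)
mean_y = sum(y for _, y in DATA) / len(DATA)
sst = sum((y - mean_y) ** 2 for _, y in DATA)
r_squared = 1.0 - sse(fit) / sst
print("estimates:", [round(p, 3) for p in fit], "iterations:", n_iter)
print("R^2:", round(r_squared, 3))
```

The iteration count depends on the starting values and the convergence tolerance, so it need not match the counts quoted for SPSS or R.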
This example uses a parametric approach in which the model is either known in advance (e.g. from theoretical considerations) or the model form is chosen from a well-defined set of suitable nonlinear expressions. A variety of other statistical techniques may be used to provide nonlinear regression models, including smoothing and GAM functions, regression trees and mixed effects models.