Generalized additive models (GAM) are an extension of generalized linear models (GLIM) in which one or more of the continuous predictor variables are modeled using non-parametric smoothing functions (see Hastie and Tibshirani, 1990 [HAS1] for a full discussion of GAM). Typically low-order thin plate splines provide the smoothing functions, and these are the default used in the R package mgcv, authored by Simon Wood; see the mgcv documentation and Wood (2006, [WOO1]) for full details. Other smoothing functions are possible, with LOESS functions (locally weighted scatterplot smoothing) being a popular alternative. Thus if a given variable enters a glm() function as x, say, then with gam() it could enter either as x or as the smoothed function s(x). Simple R expressions with a mix of standard and smoothed variables are typically of the form:
y ~ x1+x2+s(x3)+s(x4,x5)
where x1 to x5 are the predictor variables and s() is the default smoothing function. In SAS/STAT this type of modeling is implemented via the PROC GAM facility.
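As a minimal sketch of how such a formula is used in practice, the following R code fits the same set of predictors first with glm() and then with gam() from mgcv. The data are simulated purely for illustration: the data frame dat, the coefficients, and the object names fit_glm and fit_gam are all assumptions, not part of the original discussion.

```r
library(mgcv)

# Simulated data (hypothetical): x1 and x2 act linearly, x3 acts through
# a smooth curve, and x4 and x5 act jointly.
set.seed(1)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n),
                  x3 = runif(n), x4 = runif(n), x5 = runif(n))
dat$y <- 1 + 0.5 * dat$x1 - 0.3 * dat$x2 +
  sin(2 * pi * dat$x3) + dat$x4 * dat$x5 + rnorm(n, sd = 0.2)

# Every predictor entered as a standard linear term
fit_glm <- glm(y ~ x1 + x2 + x3 + x4 + x5, data = dat)

# x1 and x2 stay linear; x3 gets a univariate smooth, and x4 and x5
# share a single bivariate smooth
fit_gam <- gam(y ~ x1 + x2 + s(x3) + s(x4, x5), data = dat)

summary(fit_gam)
```

The summary output reports the parametric terms (x1, x2) and the smooth terms (s(x3), s(x4,x5)) in separate tables, which mirrors the distinction drawn above between standard and smoothed variables.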
Example: Engine fuel tests
Crawley (2007, ch.18, [CRA1]) gives a simple example in R using data comprising 88 measurements obtained from a set of engine fuel tests. The single response variable, y, is the concentration of nitric oxide and nitrogen dioxide in the exhaust emissions, and the predictor variables are the compression ratio of the engine (C) and a measure of the richness of the mixture (E). Because the response variable appears to be a strongly humped function of the richness, a simple smoothing model was used of the form:
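A minimal sketch of a model consistent with this description is given here, under two assumptions: that the ethanol data frame shipped with the lattice package holds the same 88 measurements (with its NOx column playing the role of y), and that the smooth is applied to E only, with C entering as an ordinary linear term. The object names model and dev_expl are illustrative.

```r
library(mgcv)

# Assumption: lattice's ethanol data set (columns NOx, C, E) corresponds
# to the engine fuel test data described in the text.
data(ethanol, package = "lattice")

# Smooth term for the richness measure E, linear term for compression ratio C
model <- gam(NOx ~ s(E) + C, data = ethanol)

summary(model)

# Proportion of deviance explained, as reported by summary.gam
dev_expl <- summary(model)$dev.expl

# Plot the estimated smooth for E together with partial residuals
plot(model, residuals = TRUE, pch = 16)
```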
The resulting model fits well, with more than 95% of the deviance explained by the model as it stands, although it can be improved, as discussed in Crawley's analysis of these data. A graph showing the data and the model fit is shown below: