CAR models

A somewhat different conceptual model, which in practice may produce similar results to SAR, is known as conditional autoregressive modeling (CAR). The essential idea here is that the probability of the value estimated at any given location is conditional on the level of neighboring values. The standard or ‘proper’ CAR model for the expectation of a specific observation, $y_i$, is of the form:

$$E(y_i \mid y_j,\ j \neq i) = \mu_i + \rho \sum_{j \neq i} w_{ij}(y_j - \mu_j)$$

where $\mu_i$ is the expected value at $i$, and $\rho$ is a spatial autocorrelation parameter that determines the size and nature (positive or negative) of the spatial neighborhood effect. The summation term in this expression is simply the weighted sum of the mean-adjusted values at all other locations $j$; this may or may not be a reasonable assumption for a particular problem under consideration.
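The conditional expectation described above (the local mean plus $\rho$ times the weighted sum of mean-adjusted values at the other locations) can be sketched as follows. The function name, the three-site layout and all numerical values are hypothetical and chosen only for illustration.

```python
def car_conditional_mean(i, y, mu, W, rho):
    """Conditional mean at site i: mu_i + rho * sum_{j != i} w_ij * (y_j - mu_j)."""
    n = len(y)
    neighbor_term = sum(W[i][j] * (y[j] - mu[j]) for j in range(n) if j != i)
    return mu[i] + rho * neighbor_term


# Hypothetical example: three sites on a line, site 1 adjacent to sites 0 and 2.
y = [12.0, 9.0, 11.0]      # observed values
mu = [10.0, 10.0, 10.0]    # expected values
W = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]      # binary adjacency weights (an assumption)
rho = 0.25

cond_mean_1 = car_conditional_mean(1, y, mu, W, rho)
print(cond_mean_1)
```

Because both neighbors of site 1 lie above their means and $\rho$ is positive, the conditional mean at site 1 is pulled above its unconditional mean of 10.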

In the standard CAR model spatial weights are often computed using some form of distance decay function. The range of this function may be unbounded or set to a value beyond which the weights are taken as 0. This range might be determined from some a priori knowledge relating to the problem at hand, or perhaps estimated from a semivariogram or correlogram. In the CAR model the covariance matrix is of the form:

$$\Sigma = (I - \rho W)^{-1} M$$

and if the conditional variances of $y$ are assumed constant, so that $M = \sigma^2 I$, this simplifies to:

$$\Sigma = \sigma^2 (I - \rho W)^{-1}$$
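As a rough numerical illustration, the sketch below evaluates the constant-variance covariance matrix for a small example, assuming the standard CAR result $\sigma^2 (I - \rho W)^{-1}$. The matrix inversion routine, the function names and the three-site weights are illustrative assumptions, and the plain Gauss-Jordan inversion is only suitable for very small matrices.

```python
def invert(A):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    # Augment A with the identity matrix.
    aug = [[float(v) for v in A[r]] + [1.0 if r == c else 0.0 for c in range(n)]
           for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def car_covariance(W, rho, sigma2):
    """Covariance sigma2 * (I - rho * W)^{-1}, assuming constant conditional variance."""
    n = len(W)
    A = [[(1.0 if i == j else 0.0) - rho * W[i][j] for j in range(n)]
         for i in range(n)]
    return [[sigma2 * v for v in row] for row in invert(A)]


# Hypothetical three-site line: site 1 is adjacent to sites 0 and 2.
W = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
cov = car_covariance(W, rho=0.25, sigma2=1.0)
for row in cov:
    print([round(v, 4) for v in row])
```

Note that sites 0 and 2 are not neighbors, yet their covariance is non-zero: the neighborhood structure in W specifies conditional dependence, and the implied marginal covariance spreads across the whole map.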

Requirements on the specification of the weighting matrix, W, and conditional variance matrix, M, include:

(i) $M$ is an $n$ by $n$ diagonal matrix with $m_{ii} > 0$;

(ii) to ensure symmetry of the variance-covariance matrix, $w_{ij} m_{jj} = w_{ji} m_{ii}$; and

(iii) $0 < \rho < \rho_{max}$ (typically), where $\rho_{max}$ is determined from the largest eigenvalue of $M^{-1/2} W M^{1/2}$.
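The requirements listed above can be checked numerically for a candidate specification. The sketch below builds inverse-distance weights with a cutoff range, tests the symmetry condition $w_{ij} m_{jj} = w_{ji} m_{ii}$, and estimates an upper bound for $\rho$. It assumes the commonly used bound $\rho_{max} = 1/\lambda_{max}$, with $\lambda_{max}$ the largest eigenvalue of $M^{-1/2} W M^{1/2}$; the text above only states that the bound is determined from that eigenvalue. All function names, coordinates and parameter values are hypothetical.

```python
import math


def inverse_distance_weights(coords, cutoff, power=1.0):
    """w_ij = d_ij^(-power) for 0 < d_ij <= cutoff, and 0 otherwise."""
    n = len(coords)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d = math.dist(coords[i], coords[j])
                if d <= cutoff:
                    W[i][j] = d ** -power
    return W


def satisfies_symmetry(W, m, tol=1e-12):
    """Check w_ij * m_jj == w_ji * m_ii for every pair (m holds the diagonal of M)."""
    n = len(W)
    return all(abs(W[i][j] * m[j] - W[j][i] * m[i]) <= tol
               for i in range(n) for j in range(i + 1, n))


def largest_eigenvalue(A, iterations=500):
    """Largest eigenvalue of a symmetric matrix by shifted power iteration."""
    n = len(A)
    # Shifting by the largest absolute row sum makes every eigenvalue of
    # A + shift*I non-negative, so the iteration cannot oscillate.
    shift = max(sum(abs(v) for v in row) for row in A)
    x = [1.0] * n
    for _ in range(iterations):
        y = [sum(A[i][j] * x[j] for j in range(n)) + shift * x[i] for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum(a * b for a, b in zip(Ax, x))  # Rayleigh quotient with |x| = 1


# Hypothetical example: four sites spaced one unit apart on a line.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
W = inverse_distance_weights(coords, cutoff=1.5)
m = [1.0, 1.0, 1.0, 1.0]  # constant conditional variances (an assumption)

symmetric_ok = satisfies_symmetry(W, m)
scaled = [[W[i][j] * math.sqrt(m[j] / m[i]) for j in range(4)] for i in range(4)]
rho_max = 1.0 / largest_eigenvalue(scaled)
print(symmetric_ok, round(rho_max, 4))
```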

Lichstein et al. (2002, [LIC1]), cited earlier, chose to use CAR following the recommendation of Cressie (1993, [CRE1]) and because they felt it to be more appropriate for their study. They found no real difference between the results obtained with the CAR model and those achieved using SAR modeling of the type described in the previous section. Wall (2004, [WAL1]) likewise found little difference between CAR and SAR models in her analysis of educational data across US states.

A range of CAR models is supported by the GeoBUGS extension to the WinBUGS package. This software is specifically designed to support Bayesian rather than frequentist statistical modeling, and uses computationally intensive techniques (Markov Chain Monte Carlo or MCMC simulation with Gibbs sampling) to obtain the fitted parameter estimates and their interval estimates (credible intervals). An ArcGIS tool (Adjacency for WinBUGS) is available from the USGS to generate the spatial adjacency matrix required for WinBUGS CAR models. The USGS application in this case is the modeling and mapping of avian abundance, especially for migratory bird species whose conservation is of concern.

Haining (2003, [HAI1]) discusses the use of such Bayesian models, in which additional (prior) information (for example, national or regional crime survey data) is used to strengthen the modeling process and reduce bias in local estimates. The Bayesian approach treats the unknown parameters (e.g. the vector β) as a set of random variables, just like the data, with which prior distributions may be associated. The prior guesses for these parameters (possibly ‘borrowed’ from spatially adjacent and/or regional or national information) are then combined with the likelihood of the observed data to obtain posterior distributions for the parameters, from which inferential analysis proceeds. Essentially this provides a broader range of modeling approaches than pure (classical) frequentist analysis, and has been shown to result in substantial improvements over using simple rate data such as standardized mortality ratios (SMRs); see for example, Yasui et al. (2000, [YAS1]) for a fuller discussion of this question. These kinds of model have mostly been applied in epidemiological studies, for both mapping and modeling purposes, but have also been applied to other forms of spatial data (e.g. see Li et al., 2007, [LI1]).
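As a minimal, non-spatial sketch of how a prior is combined with the likelihood, assume an observed count $O$ is Poisson with mean $\theta E$ (relative risk times expected count) and that $\theta$ has a Gamma prior with shape $a$ and rate $b$. The posterior is then Gamma with shape $a + O$ and rate $b + E$. This conjugate model is a stand-in chosen for illustration and is not the CAR prior used in the studies cited; the function names and all numbers are hypothetical.

```python
def smr(observed, expected):
    """Raw standardized mortality ratio: observed / expected."""
    return observed / expected


def posterior_mean_risk(observed, expected, prior_shape, prior_rate):
    """Posterior mean of the relative risk under a Gamma(shape, rate) prior."""
    return (prior_shape + observed) / (prior_rate + expected)


# Hypothetical small area: 3 deaths observed where 1.2 were expected.
observed, expected = 3, 1.2
raw = smr(observed, expected)

# Hypothetical prior with mean 1 (no excess risk), e.g. from regional data.
smoothed = posterior_mean_risk(observed, expected, prior_shape=4.0, prior_rate=4.0)
print(round(raw, 3), round(smoothed, 3))
```

With so few expected cases the raw SMR is unstable, and the posterior mean is shrunk toward the prior mean of 1; areas with large expected counts would be shrunk far less.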

In the so-called proper CAR model (WinBUGS function car.proper) the variance-covariance matrix is positive definite. The example values given in the WinBUGS manual for $M$ and $W$ based on expected counts, $E_i$, are of the form:

$m_{ii} = 1/E_i$

$w_{ij} = (E_j/E_i)^{1/2}$ for neighboring areas $i,j$, or

$w_{ij} = 0$ otherwise
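The following sketch constructs these example values for a toy set of areas and confirms that they satisfy the symmetry requirement $w_{ij} m_{jj} = w_{ji} m_{ii}$. The expected counts, the neighbor lists and the function name are hypothetical.

```python
import math


def proper_car_matrices(expected, neighbors):
    """Return the diagonal of M (m_ii = 1/E_i) and W (w_ij = sqrt(E_j/E_i) for neighbors)."""
    n = len(expected)
    m = [1.0 / e for e in expected]
    W = [[0.0] * n for _ in range(n)]
    for i, nbrs in neighbors.items():
        for j in nbrs:
            W[i][j] = math.sqrt(expected[j] / expected[i])
    return m, W


# Hypothetical expected counts for three areas; area 1 borders areas 0 and 2.
expected = [2.0, 8.0, 4.5]
neighbors = {0: [1], 1: [0, 2], 2: [1]}

m, W = proper_car_matrices(expected, neighbors)
symmetric = all(math.isclose(W[i][j] * m[j], W[j][i] * m[i])
                for i in range(3) for j in range(3))
print(W[0][1], W[1][0], symmetric)
```

W itself is not symmetric here (2.0 against 0.5 for the first pair of areas); it is the products with the conditional variances that must agree.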

This particular example relates to the Sudden Infant Death Syndrome (SIDS) data described in Cressie and Chan (1989, [CRE2]) and more recently revisited by Berke (2004, [BER1]). Here the definition of neighboring area was not based on adjacency but on distance between county seats (d<30 miles), a value determined from an examination of an experimental variogram (an estimate of a variogram based on sample data). The specific model applied in this case was actually of the form:

where the term in curly brackets is a (Euclidean) distance decay function, with k selected as 0, 1 or 2, and C(k) is a constant of proportionality to ensure results are easily compared across different values of k. In this study the authors chose k=1 as this provided the best results when considered from a likelihood perspective, hence their weights were of the form:

Edge effects in this model are quite significant, since over a third of counties lie on the State boundary and clearly States do not represent closed systems for many (most) applications.

In this example the ‘proper’ (or autoGaussian) model fitted for this dataset was not applied to the full raw dataset, but to a Freeman-Tukey variance-stabilizing square root transform of the data with Anson County omitted as an outlier.
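A sketch of a Freeman-Tukey square root transform for count data is given below. It assumes the form usually quoted for this dataset, $Z_i = (1000 S_i/n_i)^{1/2} + (1000 (S_i+1)/n_i)^{1/2}$, where $S_i$ is the number of SIDS cases and $n_i$ the number of live births in county $i$. Both this exact form and the scaling by 1000 should be checked against Cressie and Chan (1989, [CRE2]); the county values used are hypothetical.

```python
import math


def freeman_tukey(cases, births, scale=1000.0):
    """Freeman-Tukey square root transform of a rate (assumed form, see lead-in)."""
    return (math.sqrt(scale * cases / births)
            + math.sqrt(scale * (cases + 1) / births))


# Hypothetical county: 4 cases among 1000 live births.
z = freeman_tukey(4, 1000)
print(round(z, 4))
```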

Cressie and Chan had looked for non-spatial explanatory variables based on population density, percentage urban, number of hospital beds per 100,000 population, median family income and non-white live-birth rate. They then extended their analysis to include spatial patterns, but even after doing so could not adequately explain the observed variations in the data for this period, or for the subsequent 5-year period. It remains the case that the causes of SIDS are not fully understood, but medical research has shown that the placement of very young children on their backs when sleeping, the use of pacifiers (dummies) and avoidance of overheating all help to reduce the risks involved substantially. It is reasonable to suggest that the spatial variations observed and their changes over time might have been, in part, a reflection of cultural and social factors (such as advice given to mothers by local medical staff). These factors were not explicitly picked up by the non-spatial explanatory variables. Although such factors may be related to race-specific customs, it is likely that the spatial variations observed and modeled may have reflected variations in these advisory and behavioral factors. Certainly it would have warranted a very close examination of such factors in counties with unusually high and low death rates in each time period.

An intrinsic version of the CAR model (IAR or ICAR) is also supported, in which the variance-covariance matrix is not positive definite, but is positive semi-definite (WinBUGS functions car.normal and the robust variant car.l1). The intrinsic version (applied initially in an image processing context) is based on pairwise differences between the observed values, similar to the computations used in variogram analysis, from which it originates (see Matheron, 1973, [MAT1], for a detailed mathematical treatment), and is now a more popular choice of CAR model for many researchers. Intrinsic models are a generalization of the standard conditional autoregressive models to support certain types of non-stationarity. The example values given in the WinBUGS manual for $M$ and $W$ for the intrinsic CAR model, based on Besag et al. (1991, [BES2]) and Besag and Kooperberg (1995, [BES1]), are of the form:

$m_{ii} = 1/n_i$

where $n_i$ is the number of areas adjacent to $i$, and $w_{ij} = 1$ for neighboring areas or $w_{ij} = 0$ otherwise. The use of simple 1/0 weighting schemes for CAR models is not really appropriate for finite irregular lattices, and frequently a row-adjusted scheme of the form $W^* = \{w_{ij}^*\}$ is used, where $w_{ij}^* = w_{ij}/w_{i.}$ (often written within this field as $w_{ij}/w_{i+}$), the denominator being the sum of the weights in row $i$. Hence the expected conditional means, for example, refer to an average rather than a summation. The symmetry requirement for CAR models cited earlier, i.e. $w_{ij} m_{jj} = w_{ji} m_{ii}$, implies that the conditional variances should be proportional to $1/w_{i+}$.
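The row-adjusted scheme can be sketched as follows: binary adjacency weights are divided by their row sums, the conditional variances are set proportional to $1/w_{i+}$, and the conditional mean becomes the average of the neighboring values. The neighbor sets, data values and function names are hypothetical, and the sketch assumes every area has at least one neighbor.

```python
def intrinsic_car_matrices(neighbors, n):
    """Binary W, row-standardized W*, and the diagonal of M with m_ii = 1/n_i."""
    W = [[1.0 if j in neighbors[i] else 0.0 for j in range(n)] for i in range(n)]
    row_sums = [sum(row) for row in W]  # w_i+ equals n_i for 1/0 weights
    W_star = [[w / s for w in row] for row, s in zip(W, row_sums)]
    m = [1.0 / s for s in row_sums]
    return W, W_star, m


def neighbor_average(i, y, W_star):
    """Conditional mean at i under row-standardized weights: the mean of its neighbors."""
    return sum(W_star[i][j] * y[j] for j in range(len(y)))


# Hypothetical three areas on a line: area 1 borders areas 0 and 2.
neighbors = {0: {1}, 1: {0, 2}, 2: {1}}
y = [4.0, 10.0, 6.0]

W, W_star, m = intrinsic_car_matrices(neighbors, 3)
avg_1 = neighbor_average(1, y, W_star)
symmetric = all(abs(W_star[i][j] * m[j] - W_star[j][i] * m[i]) < 1e-12
                for i in range(3) for j in range(3))
print(avg_1, symmetric)
```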

Having fitted the chosen model to the sample data, the residuals may be examined by mapping and/or by using the Moran I correlogram, I(h), to identify any remaining patterns. If the residuals appear to show little or no spatial pattern it supports the view that the fitted model provides a good representation of the observed spatial patterns. However, as noted earlier, different models with fundamentally different interpretations may provide equally good fits to the data, hence drawing inferences from such models is difficult. Detailed examination of the likely processes that apply for the particular dataset under consideration is vital for such analyses.
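A residual check of this kind can be sketched with the standard Moran I statistic, $I = (n/S_0) \sum_i \sum_j w_{ij} e_i e_j / \sum_i e_i^2$, where the $e_i$ are mean-centered residuals and $S_0$ is the sum of all weights. The sketch computes a single value for one weights matrix; a correlogram I(h) would repeat the calculation with a separate weights matrix for each lag or distance band h. The residuals and weights shown are hypothetical.

```python
def morans_i(values, W):
    """Moran's I for a list of values and a spatial weights matrix W."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in W)  # sum of all weights
    cross = sum(W[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    total = sum(d * d for d in dev)
    return n * cross / (s0 * total)


# Hypothetical residuals at four sites on a line, alternating in sign.
residuals = [1.0, -1.0, 1.0, -1.0]
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]

moran = morans_i(residuals, W)
print(moran)
```

A value this strongly negative would indicate that adjacent residuals systematically take opposite signs, so the fitted model has left spatial structure unexplained; values near the expectation under no autocorrelation, $-1/(n-1)$, suggest little remaining pattern.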

In the examples cited in this subsection, the response variable, y, has been assumed to be continuous. As with GWR, autoregressive models have been developed to handle discrete and binary data, for example autoLogistic and autoPoisson models — see Haining (2003, Chapters 9 and 10, [HAI1]) for more details. Haining (2003, p.367 et seq, [HAI1]) provides examples of the use of WinBUGS for Bayesian autoregressive modeling of burglaries in Sheffield, UK, by ward (Binomial logistic model) and children excluded from school (Poisson model). He includes sample code and data for these examples, together with maps of the results and provisional interpretations.

Griffith (2005, [GRI1]) tested six alternative spatial regression models using data on cases of West Nile Virus (WNV) in the USA, by State (for current mapping of cases see: http://diseasemaps.usgs.gov). These tests included SAR, CAR and spatial filtering specifications. He concluded that none of the models provided an ideal specification for the observed data, but a number of general lessons could be learnt to assist analysts in determining the best approach to use in disease map modeling:

(i) switching between alternative model specifications should yield similar intercept values; if markedly different values are obtained an analyst should be suspicious and ascertain why;

(ii) non-Normal data (such as the WNV cases) are best described with non-Normal probability models; and

(iii) it appears from the tests on WNV data that spatial filtering models can be used to explore the nature of spatial autocorrelation effects (positive and negative effects) more quickly and effectively than SAR and CAR models.

References

[BER1] Berke O (2004) Exploratory disease mapping: kriging the spatial risk function from regional count data. Intl. J of Health Geographics, 3(18), 1-11 (available from Biomed Central: www.ij-healthgeographics.com)

[BES1] Besag J, Kooperberg C L (1995) On conditional and intrinsic autoregressions. Biometrika, 82, 733-46

[BES2] Besag J, York J, Mollie A (1991) Bayesian image restoration, with two applications in spatial statistics. Annals of the Institute of Statistical Maths, 43, 1-59

[CRE1] Cressie N A C (1991, 1993) Statistics for spatial data. John Wiley, New York (Revised edition 1993)

[CRE2] Cressie N A C, Chan N H (1989) Spatial modeling of regional variables. J Amer. Stat. Assoc., 84, 393-401

[GEL1] Gelfand A E, Diggle P J, Fuentes M, Guttorp P eds. (2010) Handbook of Spatial Statistics. Chapman Hall/CRC Press, Boca Raton, Florida

[GRI1] Griffith D A (2005) A comparison of six analytical disease mapping techniques as applied to West Nile Virus in the coterminous United States. Intl. J of Health Geographics, 4:18. Available from: http://www.ij-healthgeographics.com/content/4/1/18/

[HAI1] Haining R (2003) Spatial data analysis — theory and practice. Cambridge University Press, Cambridge, UK

[LI1] Li L, Zhu L, Sui D Z (2007) A GIS-based Bayesian approach for analyzing spatio-temporal patterns of intra-city motor vehicle crashes. J of Transport Geog., 15, 274-285

[LIC1] Lichstein J W, Simons T R, Shriner S A, Franzreb K E (2002) Spatial autocorrelation and autoregressive models in Ecology. Ecological Monographs, 72, 445-63

[MAT1] Matheron G (1973) The intrinsic random functions and their application. Advances in Applied Prob., 5, 439-68

[WAL1] Wall M M (2004) A close look at the spatial structure implied by the CAR and SAR models. J of Statistical Planning and Inference, 121, 311-24

[YAS1] Yasui Y, Liu H, Beach J, Winget M (2000) An empirical evaluation of various priors in the empirical Bayes estimation of small area disease risks. Statistics in Medicine, 19, 2409-20