Geographically weighted regression (GWR)


GWR is the term introduced by Fotheringham, Brunsdon and Charlton (2002, [FOT1]) to describe a family of regression models in which the coefficients, β, are allowed to vary spatially. GWR uses the coordinates of each sample point or zone centroid, t_i, as a target point for a form of spatially weighted least squares regression (for some models the target points can be defined separately as grid intersection points rather than observed data points). The result is a model of the form:

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta}(t) + \boldsymbol{\varepsilon}$$

The coefficients β(t) are determined by examining the set of points within a well-defined neighborhood of each of the sample points. This neighborhood is essentially a circle of radius r around each data point. However, if r is a fixed value and all points within the circle are regarded as of equal importance, the neighborhood could include every point (for large r) or no other points (for very small r). Rather than fixing r, GWR therefore replaces it with a distance-decay function, f(d). This function may be of finite or infinite extent, much as with kernel density estimation. The functions utilized in the GWR software package are of the form:
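Two kernel forms widely used in GWR practice are the Gaussian, exp(-d²/(2h²)), which has infinite extent, and the bisquare, (1 - d²/h²)² for d < h and 0 otherwise, which has finite extent. The sketch below implements both as an illustration of a distance-decay function f(d). The function names are hypothetical, and the assumption that these forms match the exact definitions used by the GWR package has not been checked against its documentation.

```python
import numpy as np

def gaussian_kernel(d, h):
    """Infinite-extent kernel: weights decay smoothly but never reach zero."""
    d = np.asarray(d, dtype=float)
    return np.exp(-0.5 * (d / h) ** 2)

def bisquare_kernel(d, h):
    """Finite-extent kernel: weights fall to exactly zero at distance h."""
    d = np.asarray(d, dtype=float)
    w = (1.0 - (d / h) ** 2) ** 2
    return np.where(d < h, w, 0.0)

# Weights at a few distances from a target point, bandwidth h = 1.
distances = np.array([0.0, 0.5, 1.0, 2.0])
print(gaussian_kernel(distances, h=1.0))
print(bisquare_kernel(distances, h=1.0))
```

With the bisquare form, points at or beyond the bandwidth contribute nothing to the local fit, whereas the Gaussian form gives every point some weight, however small.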

In these functions the parameter h, also known as the bandwidth, is the key factor determining how the weighting schemes operate. A small bandwidth results in very rapid distance decay, whereas a larger value results in a smoother weighting scheme. This parameter may be defined manually or by some form of adaptive method such as cross-validation minimization, e.g. jackknifing (see further Efron, 1982, [EFR1] and Efron and Tibshirani, 1997, [EFR2]), or minimization of the Akaike Information Criterion (AIC). The use of a kernel function also raises the possibility of generating additional descriptive statistics, as have been described earlier. Using a selected kernel function and bandwidth, h, a diagonal weighting matrix, W(t), may be defined for every sample point, t, with off-diagonal elements being 0. The parameters β(t) for this point can then be determined using the standard solution for weighted least squares regression:

$$\hat{\boldsymbol{\beta}}(t) = \left(\mathbf{X}^{T}\mathbf{W}(t)\mathbf{X}\right)^{-1}\mathbf{X}^{T}\mathbf{W}(t)\mathbf{y}$$
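The weighted least squares step and a cross-validation choice of bandwidth can be sketched as follows. This is a minimal illustration on synthetic data, not the GWR package's implementation: the Gaussian kernel, the function names `gwr_fit_at` and `loo_cv_score`, the candidate bandwidths and the simulated dataset are all assumptions made for the example.

```python
import numpy as np

def gwr_fit_at(target, coords, X, y, h):
    """Weighted least squares estimate of beta at one target point."""
    d = np.linalg.norm(coords - target, axis=1)
    w = np.exp(-0.5 * (d / h) ** 2)        # assumed Gaussian kernel weights
    XtW = X.T * w                          # X^T W(t) without building the diagonal matrix
    return np.linalg.solve(XtW @ X, XtW @ y)

def loo_cv_score(coords, X, y, h):
    """Leave-one-out score: predict each y_i from a local fit that excludes point i."""
    n = len(y)
    sq_err = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        beta = gwr_fit_at(coords[i], coords[keep], X[keep], y[keep], h)
        sq_err += (y[i] - X[i] @ beta) ** 2
    return sq_err

# Synthetic data: the slope on x1 drifts upward from west to east.
rng = np.random.default_rng(0)
n = 60
coords = rng.uniform(0, 10, size=(n, 2))
x1 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1])
slope = 1.0 + 0.2 * coords[:, 0]
y = 2.0 + slope * x1 + rng.normal(scale=0.1, size=n)

beta_west = gwr_fit_at(np.array([1.0, 5.0]), coords, X, y, h=2.0)
beta_east = gwr_fit_at(np.array([9.0, 5.0]), coords, X, y, h=2.0)

candidates = [1.0, 2.0, 4.0, 8.0]
scores = [loo_cv_score(coords, X, y, h) for h in candidates]
best_h = candidates[int(np.argmin(scores))]
print(beta_west, beta_east, best_h)
```

The local slope estimated in the east should exceed the one in the west, reflecting the drift built into the simulated data. A grid of candidate bandwidths is used here for simplicity; a one-dimensional optimizer would be a natural alternative.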

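If we let:

$$\mathbf{C} = \left(\mathbf{X}^{T}\mathbf{W}(t)\mathbf{X}\right)^{-1}\mathbf{X}^{T}\mathbf{W}(t)$$

then the variances of the parameter estimates are given by the diagonal elements of:

$$\mathrm{Var}[\hat{\boldsymbol{\beta}}(t)] = \mathbf{C}\mathbf{C}^{T}\sigma^{2}$$
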
The standard errors of the parameter estimates can be computed as the square root of these variances and used in t-tests to assess the significance of the individual components. In this model the variance component, σ², is the normalized residual sum of squares, i.e. the RSS divided by the degrees of freedom. The degrees of freedom are n − p, where n is the number of observations and p is the number of parameters in a global model, or n minus the effective number of parameters in the GWR model. The effective number of parameters is approximated by the authors as the trace of a matrix S, tr(S) (the sum of the diagonal elements of S), defined by the relation:

$$\hat{\mathbf{y}} = \mathbf{S}\mathbf{y}$$
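The quantities in this paragraph can be computed together in one pass over the sample points. The sketch below assumes the usual GWR formulation, in which C = (XᵀW(t)X)⁻¹XᵀW(t), the parameter variances are the diagonal of CCᵀσ², and row i of S is x_i C evaluated at point i. The Gaussian kernel, the function name `gwr_diagnostics` and the synthetic data are illustrative assumptions.

```python
import numpy as np

def gwr_diagnostics(coords, X, y, h):
    """Local coefficients, their standard errors, tr(S) and sigma^2."""
    n, p = X.shape
    betas = np.empty((n, p))
    S = np.empty((n, n))
    C_all = []
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (d / h) ** 2)       # assumed Gaussian kernel
        XtW = X.T * w
        C = np.linalg.solve(XtW @ X, XtW)     # (X^T W X)^-1 X^T W, shape (p, n)
        betas[i] = C @ y
        S[i] = X[i] @ C                       # row i of the hat matrix
        C_all.append(C)
    resid = y - S @ y
    enp = np.trace(S)                         # effective number of parameters
    sigma2 = resid @ resid / (n - enp)        # RSS / degrees of freedom
    se = np.array([np.sqrt(np.diag(C @ C.T) * sigma2) for C in C_all])
    return betas, se, enp, sigma2

rng = np.random.default_rng(0)
n = 60
coords = rng.uniform(0, 10, size=(n, 2))
x1 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1])
y = 2.0 + (1.0 + 0.2 * coords[:, 0]) * x1 + rng.normal(scale=0.1, size=n)

betas, se, enp, sigma2 = gwr_diagnostics(coords, X, y, h=2.0)
t_values = betas / se                         # local t statistics
_, _, enp_global, _ = gwr_diagnostics(coords, X, y, h=1e6)
print(enp, enp_global)
```

With a very large bandwidth every point receives nearly equal weight, so tr(S) approaches the global parameter count p (here 2); smaller bandwidths raise the effective number of parameters.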

A set of such equations is solved for all points, t. The fit of the model may be examined in the usual manner, although it is to be expected that the fit in terms of variance explanation will almost always be an improvement over global methods, if only because there are far more parameters fitted to the dataset. For this reason comparisons should be made on additional criteria, for example the AIC measure, which takes account of model complexity. As with conventional regression, the modeled surface and (standardized) residuals may be mapped for exploratory purposes, but additionally the parameters β(t) and their estimated standard errors may be mapped, since these also vary spatially. Within GWR, standardized residuals are based on the variance estimate given by the sum of squared residuals, εᵀε, divided by the degrees of freedom, n − tr(S). The authors recommend examining any of these residuals with values greater than 3.
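A comparison of a global and a local fit on a complexity-adjusted criterion might be sketched as below. The AICc expression used, 2n ln(σ̂) + n ln(2π) + n(n + tr(S))/(n − 2 − tr(S)) with σ̂² = RSS/n, is one form quoted in the GWR literature; as noted elsewhere in this section, packages differ in how they compute AIC, so treat it as an assumption. Scaling each residual by sqrt(σ²(1 − S_ii)) is likewise an assumed, conventional way of standardizing, and the function names and synthetic data are hypothetical.

```python
import numpy as np

def hat_matrix(coords, X, h=None):
    """Global OLS hat matrix if h is None, else a GWR hat matrix (Gaussian kernel)."""
    n = X.shape[0]
    if h is None:
        return X @ np.linalg.solve(X.T @ X, X.T)
    S = np.empty((n, n))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        XtW = X.T * np.exp(-0.5 * (d / h) ** 2)
        S[i] = X[i] @ np.linalg.solve(XtW @ X, XtW)
    return S

def fit_summary(y, S):
    """RSS, AICc (assumed form) and standardized residuals for a fit y_hat = S y."""
    n = len(y)
    resid = y - S @ y
    rss = resid @ resid
    k = np.trace(S)                               # (effective) number of parameters
    sigma_ml = np.sqrt(rss / n)
    aicc = (2 * n * np.log(sigma_ml) + n * np.log(2 * np.pi)
            + n * (n + k) / (n - 2 - k))
    sigma2 = rss / (n - k)                        # RSS / degrees of freedom
    std_resid = resid / np.sqrt(sigma2 * (1 - np.diag(S)))
    return rss, aicc, std_resid

rng = np.random.default_rng(0)
n = 60
coords = rng.uniform(0, 10, size=(n, 2))
x1 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1])
y = 2.0 + (1.0 + 0.2 * coords[:, 0]) * x1 + rng.normal(scale=0.1, size=n)

rss_ols, aicc_ols, _ = fit_summary(y, hat_matrix(coords, X))
rss_gwr, aicc_gwr, std_resid = fit_summary(y, hat_matrix(coords, X, h=2.0))
flagged = np.flatnonzero(np.abs(std_resid) > 3)   # candidates for closer inspection
print(aicc_ols, aicc_gwr, flagged)
```

Because the simulated slope varies across space, the local fit should show both a lower RSS and a lower AICc than the global fit; on data with a stationary relationship the AICc penalty would be expected to favor the global model.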

Example: Georgia State educational achievement

To illustrate this process we shall use an example dataset comprising educational attainment by county in the state of Georgia, USA (the dependent or response variable, see table below). This dataset lists the percentage of university graduates by county together with a range of social data that might act as independent variables for predicting the dependent variable. The data have been assigned to a set of 159 point locations (county centroids) and show an overall average of 19% of the population recorded as being graduates, with an average per county of 10.9% (i.e. not population weighted). The range by county is from 4.2% to 37.5%.

The table shows the predictor variables and the global regression parameters estimated by OLS, which collectively account for around 63% of the variance. Also shown are the GWR parameter estimates, expressed as the range of values computed. In the diagnostics section of this table note the drop in residual sum of squares, the increase in the adjusted R2, and a modest fall in the AICc statistic. The authors suggest that a fall of 3 or more in the AICc value warrants examination as demonstrating a meaningful improvement in model fit. Note that differences in the method of calculation of the AIC statistic can easily result in differences of greater than 3, so caution is required when comparing alternative software packages on this measure.

As mentioned above, the standardized residuals from the GWR predictions can be mapped in order to identify any prediction outliers. The diagram that follows (A) shows this mapped dataset, with the dark blue and red counties highlighted in the upper section of the map being those with the highest and lowest deviations. These counties may then be examined to ascertain whether any special characteristics of these cases might explain the large residuals.

By definition, the GWR parameter ranges shown in the table below summarize a separate estimate for every county. Hence each parameter can also be mapped, as shown in diagram B. In this case the map highlights a distinct pattern of variation, with higher values in the north and lower values in the south.

Georgia dataset: global regression estimates and diagnostics

| Predictor variables | Global parameter estimate | GWR parameter estimates |
|---|---|---|
| Total population, β1 | 0.24 × 10⁻⁴ | 0.14 to 0.28 × 10⁻⁴ |
| % rural, β2 | | −0.06 to −0.03 |
| % elderly, β3 | −0.06 (not signif.) | −0.26 to −0.06 |
| % foreign born, β4 | | 0.51 to 2.42 |
| % poverty, β5 | | −0.20 to −0.00 |
| % black, β6 | 0.022 (not signif.) | −0.04 to 0.08 |
| Intercept, β0 | | 12.62 to 16.49 |
| Residual SS | | |
| Adjusted R2 | | |
| Effective parameters | | |


Figure: Georgia educational attainment: GWR residuals map, Gaussian adaptive kernel. (A) Standardized residuals; (B) Parameter 5: % foreign born, β4.

As Fotheringham et al. (2005, [FOT2]) have noted: “In some instances … it is difficult to justify why some relationships should be allowed to vary spatially. In others, empirical results may suggest that some relationships are stationary over space while others vary significantly. In these instances, ‘mixed’ GWR models, where some relationships are allowed to vary spatially while others are held constant, would seem to be more appropriate.”

The same kind of GWR analysis can be carried out on count data, using Poisson regression (GWPR), and on binary data, using Logistic regression (GWLR). The GWR program supports both models. The standard Poisson and Logistic regression models are utilized, but with the coefficients β(t) varying with location, t, as before. As an example, GWPR has been applied to counts of disease incidence amongst a particular age/sex grouping, recorded by health district. In such models an offset value is applied to the model based on a matching count variable, such as the total number of people in that district in the selected age/sex cohort.

Nakaya et al. (2005, [NAK1]), for instance, applied GWPR to mortality rates in Tokyo. The dependent variable in this instance was based on the standardized mortality ratio (SMR) for each of 262 municipality zones. The SMR is defined as the observed number of deaths, Oi, in a specified time period (e.g. 1990) in a given zone i, divided by the expected number, Ei, for that zone based on national or regional mortality rates (i.e. by demographic grouping). In the GWPR model the Oi became the response variable and the Ei provided the offset values. The study was able to examine relationships between the dependent variable and a number of independent variables at the local level, highlighting particular relationships that global models may not have identified. In fact the best model the authors were able to produce included a mix of global and local parameter estimates, with the proportion of older people (64+) and of house-owners taken as globals, and the proportion of professional and technical people and the proportion of unemployed people allowed to vary regionally.

For Logistic GWR one might have true presence/absence data, or recoded continuous data based on some critical threshold value. For example, with the Georgia dataset the dependent variable could be recoded as 1 if the percentage of graduates is above the state average and 0 otherwise. This is rather an artificial example, but recoding of this type is often applied in decision-making — for example coding land as contaminated (1) if, say, the average measured cadmium level in the soil exceeds a certain number of parts per million (ppm) and not contaminated (0) with respect to this trace element if below this threshold level.
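Recoding a continuous variable into a binary response of this kind is a one-line operation. In the sketch below the array values and the threshold are purely illustrative, chosen for the example rather than taken from the Georgia dataset.

```python
import numpy as np

# Illustrative percentages of graduates for five hypothetical counties.
pct_graduates = np.array([4.2, 8.9, 10.9, 15.3, 37.5])
threshold = 10.9                       # hypothetical cut-off value

# 1 where the value is strictly above the threshold, 0 otherwise.
above_threshold = (pct_graduates > threshold).astype(int)
print(above_threshold)
```

Whether values exactly equal to the threshold are coded 1 or 0 is a modeling decision; the strict inequality used here codes them as 0.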

Both Poisson and Logistic GWR require model fitting using a technique known as iteratively reweighted least squares (IRLS). The analysis is carried out in much the same manner as previously described, but the computation of the Akaike Information Criterion (AIC) and AICc differs from the OLS expressions.
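A minimal sketch of IRLS for a geographically weighted Poisson model at a single target point is given below. It assumes a log link with log(Ei) as the offset, a Gaussian kernel for the geographic weights, and simulated data; the function name `gwpr_fit_at` and all parameter values are hypothetical, and the sketch is not the GWR package's algorithm.

```python
import numpy as np

def gwpr_fit_at(target, coords, X, observed, expected, h, n_iter=25, tol=1e-8):
    """Local Poisson coefficients at one target point via IRLS."""
    d = np.linalg.norm(coords - target, axis=1)
    geo_w = np.exp(-0.5 * (d / h) ** 2)        # assumed Gaussian kernel
    offset = np.log(expected)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta + offset                 # linear predictor including offset
        mu = np.exp(eta)                        # fitted counts
        z = (eta - offset) + (observed - mu) / mu   # working response
        XtW = X.T * (geo_w * mu)                # geographic weight x IRLS weight
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

# Simulated zones: observed counts depend on an expected count and one covariate.
rng = np.random.default_rng(1)
n = 200
coords = rng.uniform(0, 10, size=(n, 2))
expected = rng.uniform(20, 100, size=n)
x1 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1])
observed = rng.poisson(expected * np.exp(0.1 + 0.3 * x1))

beta_center = gwpr_fit_at(np.array([5.0, 5.0]), coords, X, observed, expected, h=5.0)
print(beta_center)
```

Each iteration is an ordinary weighted least squares solve in which the weights are the product of the kernel weight and the current fitted mean. Because the simulated relationship is the same everywhere, the local estimate at the center should lie close to the generating values of 0.1 and 0.3.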

The ready availability of GWR software, supporting Gaussian, Poisson and Logistic models, together with a companion book and materials, has resulted in an upsurge in interest in the technique. This includes its consideration by spatial econometricians, medical statisticians and ecologists amongst others (GWR support has recently been included in the R-spatial collection as spgwr and in the SAM ecology package, as well as in the latest versions of ESRI’s ArcGIS). It has the attraction of accepting the non-stationarity of most spatial datasets and proceeding to create models that have improved information characteristics and are amenable to further exploratory analysis. For large-scale problems the processing overheads of GWR may become prohibitive, but the technique is well-suited to parallel or grid-enabled processing.


[EFR1] Efron B (1982) The Jackknife, the Bootstrap and other resampling plans. Philadelphia: SIAM

[EFR2] Efron B, Tibshirani R J (1997) Improvements on cross-validation: The .632+ bootstrap method. J. American Stat. Assoc., 92, 548-560

[FOT1] Fotheringham A S, Brunsdon C, Charlton M (2002) Geographically Weighted Regression: The analysis of spatially varying relationships. Wiley, New York

[FOT2] Fotheringham A S, Charlton M, Brunsdon C, Nakaya T (2005) Model selection issues in Geographically Weighted Regression. Proc., Geocomputation 2005 Conference, Ann Arbor, MI, USA

[NAK1] Nakaya T, Fotheringham A S, Brunsdon C, Charlton M (2005) Geographically weighted Poisson regression for disease association mapping. Statist. in Med., 24, 2695-2717