Spatial autocorrelation

The procedures adopted for analyzing patterns of spatial autocorrelation depend on the type of data available. There is considerable difference between:

a set of real-valued measurements recorded for a 10x10 grid of 100 (100m x 100m) squares that together comprise a 1km x 1km (1000m x 1000m) region;
a set of real-valued measurements obtained for 100 contiguous but arbitrarily shaped polygons which again cover the same region;
a set of nominal values (classes) for 100 contiguous but arbitrarily shaped polygons which again cover the same region; and
a set of data values obtained from 100 arbitrarily distributed sample points in our study region

Each case warrants a slightly different approach, but each utilizes the notion of proximity bands as a means of imposing some form of serial behavior on the data. The idea is then to examine the correlation between areas or points at given levels of separation, in order to obtain a measure similar to that used within time series analysis. Initially we examine the case of nominal-valued data, which helps identify the key concepts involved and focuses our attention on a set of questions regarding such analysis that we then examine in more detail.

Join counts and the analysis of nominal-valued spatial data

Consider a regular grid of cells that completely covers a sampled region, with a value associated with each grid square. The set of values in this case could be binary, e.g. presence/absence; or they could be classified into one of k classes (see Dacey, 1968 [DAC1], for a full discussion of this case); or the k classes could be re-classified into two classes, one selected class and the remaining k-1 classes combined. Typically this kind of data is nominal-valued, i.e. the cell or zone values can be regarded as categories, such as woodland, grassland, sandy desert etc., rather than ordered values.

If we assume the data are binary, perhaps representing the presence or absence of a particular species of insect or plant variety in sample quadrats, a range of possible patterns might be observed (see examples A-D in the diagram below). In "A" all the observed values for presence are in one half of the 6x6 grid (strong positive autocorrelation), whilst in "B" they are perfectly evenly distributed (strong negative autocorrelation). Example "C" gives a particular case of a random pattern (of which there are many). In each case 50% of the cells show presence, but this value could easily be 10% or any other value depending on the data being studied. Finally, "D" provides an example of a real-world dataset showing the presence or absence of desert holly in a 16x16 cell sample region. Note that in each instance our comments refer to autocorrelation at the scale of the cell, i.e. between cells; within cells autocorrelation is typically strongly positive.

Join count patterns: A. Completely separated pattern (+ve); B. Evenly spaced pattern (-ve); C. Random pattern; D. Atriplex hymenelytra (desert holly)

One way of analyzing these patterns is to ask “what is the chance that a particular pattern could occur at random?” In each of the sample patterns shown we can look at the spatial equivalent of one time step or lag, i.e. the pattern observed between adjacent cells (a separation of one cell). If steps are restricted to rook’s moves on a chess board (i.e. up, down, left, right) we can count the number of instances of 1‑1, 0‑0, 1‑0 and 0‑1 occurring, and compare these to the number we might expect if the pattern were random. These numbers are referred to as join counts (sometimes erroneously described as ‘joint counts’). For smaller regions edge effects will be significant, so calculations need to be adjusted to reflect the fact that along the borders not all of the four directions are possible. For example, only two adjacent cells exist for the four corner positions, and only three adjacencies for other border cells.

Row and column totals for the adjacencies, or joins, are shown in the next diagram, with the overall total being 120/2=60 joins. For our patterns, with 50% occupancy, we might expect 15 of these to be 1‑1 joins, 15 to be 0‑0 joins and the remaining 30 to be 0‑1 or 1‑0 joins. We can count up the number of each type of join and compare this with our expected values to judge how special (significant) or otherwise our patterns are. In case "A" above there are 27 1‑1 joins, 27 0‑0 joins and only 6 0‑1 or 1‑0 joins, a split that is most unlikely to have arisen by chance. Similar calculations can be undertaken for cases "B" and "C": as expected, for case "B" all 60 joins are of type 1‑0 or 0‑1, which is again extremely unlikely to occur by chance; case "C" has 35 1‑0/0‑1 joins compared with perhaps 30 expected, with 13 1‑1 joins and 12 0‑0 joins, as against perhaps 15 of each.
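This counting procedure is easily scripted. The short Python sketch below (an illustrative fragment, not drawn from any of the packages mentioned in this section) builds pattern "A" as a 6x6 binary grid with presence confined to one half (here the top three rows) and counts the rook's-move joins, reproducing the 27/27/6 split described above.

```python
import numpy as np

def join_counts(grid):
    """Count 1-1, 0-0 and 1-0/0-1 rook's-move joins in a binary grid."""
    g = np.asarray(grid, dtype=int)
    # Horizontal and vertical neighbour pairs (each join counted once)
    pairs = [(g[:, :-1], g[:, 1:]), (g[:-1, :], g[1:, :])]
    bb = ww = bw = 0
    for a, b in pairs:
        bb += int(np.sum((a == 1) & (b == 1)))
        ww += int(np.sum((a == 0) & (b == 0)))
        bw += int(np.sum(a != b))
    return bb, ww, bw

# Pattern A: presence (1) in the top half of a 6x6 grid
pattern_a = np.zeros((6, 6), dtype=int)
pattern_a[:3, :] = 1

bb, ww, bw = join_counts(pattern_a)
print(bb, ww, bw, bb + ww + bw)   # 27 27 6 60, as in the text
```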

Join count computation

A test for the significance of the results we have observed can be produced using a Normal- or z-transform of the data. In practice three separate z-transforms are needed: one for the 1‑1 case, one for 0‑0, and one for 0‑1 and 1‑0. These transforms evaluate expressions of the form z=(O‑E)/SD, where O is the observed number of joins of a given type, E is the expected number based on a random model, and SD is the expected standard deviation. The procedure and formulas can be implemented in software scripts for use within mainstream Geographic Information System (GIS) packages, or the data may be analyzed externally using specialized software and the results then mapped within a GIS; for example, using the Rookcase Excel add-in produced by Sawada (1999, obtainable from the University of Ottawa, Laboratory for Paleoclimatology and Climatology [SAW1]). Results for the four examples given in A-D above, generated using the Rookcase add-in, are shown in the table below (“B” and “W” here refer to Black and White rather than 1 or 0; “Rand.” refers to the randomization model, which is discussed further below; and # means the number of joins).

Join count analysis results: A. Positive spatial autocorrelation; B. Negative spatial autocorrelation; C. Random model (no discernible spatial autocorrelation); D. Atriplex hymenelytra (positive BB spatial autocorrelation)

Two features of the results in this table should be noted: (i) the expected number of joins is not 30, 15 and 15 as we suggested earlier, but 30.86, 14.57 and 14.57, these being adjusted values that take account of what is known as non-free sampling (i.e. sampling without replacement); and (ii) the z-statistic in B shows a large positive value for BW joins, and large negative values for BB and WW joins (absolute values of >1.96 would be significant at the 5% probability level). This mixed pattern can be confusing, especially if one or two of the z-scores show a significant value whilst the others do not, as in D.

The join counts method has been utilized in a variety of application areas, including: ecological data analysis; the analysis of voting patterns (for example, voting for one of two political parties or individuals); the analysis of land-use (e.g. developed or undeveloped land); and the examination of the distribution of rural settlements. However, the complexity of computing the theoretical means and variances in more realistic spatial models, the difficulty of interpreting the multiple z-scores, coupled with the availability of alternative approaches to analysis, have meant that join count procedures are not widely used and are rarely implemented in mainstream GIS packages. Support for this kind of analysis is provided in the joincount() functions of the R‑Project spatial package, spdep, and within the PASSaGE software. Metrics based on these concepts are provided in Fragstats and similar packages, for example in the computation of connection and contagion indices.

Homogeneous and non-homogeneous probability images

The standard form of the join counts statistical analysis makes a number of assumptions regarding the observation data. These include (under free sampling) that the distribution of BB and WW joins is asymptotically Normal (Gaussian) and that the data are first-order homogeneous. To clarify this latter point, we use an example from Kabos and Csillag (2002 [KAB1]), who have investigated relaxing this assumption. They give the example of two simulated images with the same B/W distribution (50%B and 50%W), but with different first-order effects. Their first pattern was simulated with first-order homogeneity, i.e. pb, the probability of any cell being black, was 0.5 for all cells (see image below, left-hand image). Their second pattern was simulated by setting pb=0.6 for half of the image and pb=0.4 for the other half, so the overall probability pb remained at 0.5 but was no longer homogeneously distributed across the image (right-hand image). They found that the join count statistics (JCS) using pb=0.5 accepted the null hypothesis of spatial randomness for the first image, but rejected it for the second. They have thus demonstrated that, using unmodified JCS, we would erroneously (but understandably) conclude that there is significant spatial autocorrelation in the second image.
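A minimal simulation sketch of this effect is shown below (illustrative Python, not the authors' code; the 128x128 lattice size, the split pb values and the use of a permutation test in place of the closed-form variances are assumptions made purely for illustration).

```python
import numpy as np

rng = np.random.default_rng(42)

def bb_joins(g):
    """Number of rook's-move black-black (1-1) joins in a binary grid."""
    return int(np.sum(g[:, :-1] & g[:, 1:]) + np.sum(g[:-1, :] & g[1:, :]))

def perm_pvalue(g, n_perm=999):
    """Pseudo p-value for an excess of BB joins under randomization
    (random permutation of the observed cell values)."""
    observed = bb_joins(g)
    flat = g.ravel()
    sims = np.empty(n_perm)
    for k in range(n_perm):
        sims[k] = bb_joins(rng.permutation(flat).reshape(g.shape))
    return (1 + np.sum(sims >= observed)) / (n_perm + 1)

n = 128
# Image 1: first-order homogeneous, pb = 0.5 everywhere
img1 = (rng.random((n, n)) < 0.5).astype(int)
# Image 2: pb = 0.6 in one half, 0.4 in the other (overall proportion still ~0.5)
p = np.full((n, n), 0.4)
p[: n // 2, :] = 0.6
img2 = (rng.random((n, n)) < p).astype(int)

print(perm_pvalue(img1))  # usually well above 0.05: randomness not rejected
print(perm_pvalue(img2))  # usually small: the unmodified test flags 'autocorrelation'
```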

Join count mean and variance formulas

For binary weights, writing J=(1/2)ΣΣwij for the total number of joins, the expected (mean) numbers of joins under the two sampling hypotheses are:

Same color case (e.g. black-black or white-white):
F: Free sampling (resampling, or ‘with replacement’): E(BB)=J pb² (and similarly for WW with pw)
R: Non-free sampling (randomization, or ‘without replacement’): E(BB)=J nb(nb-1)/[n(n-1)]

Different color case (e.g. black-white):
f: Free sampling (resampling, or ‘with replacement’): E(BW)=2J pb pw
r: Non-free sampling (randomization, or ‘without replacement’): E(BW)=2J nb nw/[n(n-1)]

The corresponding variance expressions are considerably lengthier and are given in Cliff and Ord (1973 [CLI1]); for binary weights they involve both J and the number of joins meeting at each cell.

where: pb, pw = the probability of the category of interest occurring in a cell or lattice zone (e.g. the probability of black or white zones in the case of binary datasets); note that where probabilities are estimated from the lattice itself then pb=nb/n and pw=nw/n, but probabilities might instead be estimated from a related variable (e.g. proportion of votes cast) or from a broader sample (regional or national data); nb, nw = the number (count) of zones of the category of interest (e.g. the number of black or white zones in the case of binary datasets); n = the total number of zones; and W={wij} is a spatial weights matrix, the basic version being a binary weights matrix in which wii=0, wij=1 if zones i and j are adjacent, and wij=0 otherwise. The formulas simplify considerably with binary weights.

Additional questions might reasonably be asked about this analytical procedure that point the way to more powerful extensions and developments of these ideas. Key questions, which we discuss below, include:

how sensitive is this technique (and, of course, many other techniques) to the particular size of grid cell used and the number of cells?
if there are k classes rather than just two, how might this kind of data be analyzed?
why choose rook’s moves — why not permit queen’s moves, which would extend the idea of contiguity but increase the complexity of computation?
why restrict the notion of contiguity to directly adjacent cells (a lag of 1) — why not examine longer range effects, i.e. higher order spatial lags? Indeed, with separate analyses for different lags or distance bands one could produce a form of correlogram of autocorrelation effects
why should every cell have the same level of importance or weight — why not permit cells to have differing weights, depending on the kind of model of spatial association we are considering?
should the analysis be restricted to regular grids, or could it be extended to irregular grids, polygonal regions and even pure point data?
is a single statistic for a whole area meaningful, or should separate statistics be computed for sub-regions or individual cells and then plotted, giving a Local Indicator of Spatial Autocorrelation (LISA) measure, which would highlight clustering effects?
if the data in the study cells are integer or real-valued, why restrict the analysis (and lose valuable information) by ignoring the magnitude of the values in each region?
how should we determine the cell probabilities, e.g. pb? Is it sufficient to always derive the probability estimate from the observed proportions of cells of type B, or are there other approaches that might be more appropriate?

We briefly comment on the first two of these questions in the discussion below. The remaining questions in the list, notably those relating to cases where the data values in the grid, lattice or point-set are integer or real-valued, are discussed later in this topic.

Grouping and size effects

A. 2x2 grouping of Atriplex hymenelytra grid; B. 128x128 grid of Calluna vulgaris presence

The first point raised above can be examined directly by reviewing our example "D" above. If the grid had been sampled at half the frequency in each direction there would be 8x8=64 cells rather than 16x16=256 cells, and in this instance only one of the new larger cells would record absence, i.e. a 0, with all the others showing a desert holly plant present (the grouping is illustrated in "A", above left, where the empty 2x2 cell is shown in white). So we immediately see that such techniques may be highly sensitive to grid resolution, or, with irregular lattices, to the particular density, shape and connectivity of individual areas.

Furthermore, with large grids, e.g. 128x128 cells, the z-scores may become huge, raising doubts over the sensitivity and interpretation of the technique as the resolution of sampling or the size of the area covered is increased; in "B" above there are 16384 cells and all three z-scores are over 160. A critical question here is whether the data in such cases reflect the result of direct observation, or whether they represent results from re-sampling or similar computational procedures. In the latter instance test scores may not be valid. At the other extreme, scores may be suspect where the number of cells or regions is small (less than 30) or the category selected only occurs in a small percentage of areas.

The second question deals with the k-class case. There are several ways of dealing with this situation. Perhaps the first question to ask is whether the classes are genuinely nominal-valued (e.g. distinct species of tree), or whether the classes have been created from a set of underlying data that are themselves integer- or real-valued. In the latter instance it is sensible to examine the underlying data separately using more powerful techniques, such as Moran’s I statistic. It is also possible to tackle each of the k classes in turn, treating the selected class as B=Black, say, regarding all other classes as W=White, and proceeding as described earlier. Alternatively one can examine all the k classes simultaneously by identifying all instances of the same class being found in adjacent areas. This pattern of occurrences can then be compared with the expected pattern assuming the zones are labeled in a spatially random manner, and/or using a pseudo-probability distribution generated using random permutations of the known values.
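A sketch of this simultaneous, multi-class form of counting is given below (illustrative Python, not the spdep or PASSaGE implementation; the 12x12 grid, the three classes and the imposed block of class 0 are arbitrary choices for illustration).

```python
import numpy as np

rng = np.random.default_rng(1)

def same_class_joins(grid):
    """Count rook's-move joins linking two cells of the same class."""
    g = np.asarray(grid)
    return int(np.sum(g[:, :-1] == g[:, 1:]) + np.sum(g[:-1, :] == g[1:, :]))

def permutation_test(grid, n_perm=999):
    """Pseudo p-value for an excess of same-class joins under random labelling."""
    observed = same_class_joins(grid)
    flat = np.asarray(grid).ravel()
    sims = [same_class_joins(rng.permutation(flat).reshape(np.shape(grid)))
            for _ in range(n_perm)]
    return observed, (1 + np.sum(np.array(sims) >= observed)) / (n_perm + 1)

# Illustrative 12x12 grid of k=3 land-cover classes with some imposed clustering
classes = rng.integers(0, 3, size=(12, 12))
classes[:6, :6] = 0                      # a block of class 0 in one corner
print(permutation_test(classes))         # (observed same-class joins, pseudo p-value)
```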

Support for this form of ‘multi-colored map’ analysis is to be found in the joincount functions of the R-Project package spdep and in the join counts option of the PASSaGE software. Note that for some of the analyses performed it is assumed that the spatial weights matrix, W, is symmetric, or can be adjusted to be symmetric; this excludes the use of some more complex neighborhood relations.

Moran I

Many of the remaining questions from the preceding subsection relating to join count procedures can be addressed by considering a more general case than presence/absence data, with a region that is not a regular lattice.

Lattice data: we start with a simple incomplete rectangular lattice, shown in the next diagram as an arrangement of 10 cells with real-valued entries. Note that although this is shown with rectangular edges it is topologically equivalent to any set of planar regions that exhibit the same adjacency pattern, however irregular in form.

Irregular lattice dataset (4 rows x 3 columns; blank positions are not part of the lattice)

        +4.55   +5.54
+2.24   -5.15   +9.02
+3.10   -4.39   -2.09
        +0.46   -3.06
Instead of displaying this data as an irregular lattice, it can be recast as a table of (x,y,z) triples, where rows are numbered 1,2… from the top and columns are likewise numbered 1,2… from the left, as shown in the table below. This coordinate based description of the data immediately suggests a generalization that may be applied to point datasets, with the (x,y) pairs providing the coordinate information. These (x,y) pairs could be well-defined points, or grid cells, or even polygon centroids.

Tabulated lattice data

x    y    z
1    2    +4.55
1    3    +5.54
2    1    +2.24
2    2    -5.15
2    3    +9.02
3    1    +3.10
3    2    -4.39
3    3    -2.09
4    2    +0.46
4    3    -3.06

This tabular form loses information, in that cell adjacencies are not explicitly captured. The next diagram provides this missing adjacency information for the sample lattice as a binary matrix using rook’s move definitions. In this example cells in the original dataset are numbered from 1 to 10, starting with the first cell being in row 2 column 1 of the source data and continuing down the rows and across the columns.                

Adjacency matrix, W, for irregular lattice dataset

This matrix can be thought of as a specific example of a spatial weights matrix, W={wij}, indicating which elements of the computations are to be included or excluded. In this example wij=1 if two cells are adjacent (joined) and wij=0 otherwise. This matrix is symmetric and binary, but these features are not prerequisites. For example, one might choose to create a matrix W in which rook’s move adjacencies were given a weight of 0.9 and bishop’s moves (the diagonal connections) a weight of 0.1. More generally, if the source dataset were polygonal, e.g. well-defined agricultural tracts, then adjacency weights might be chosen to reflect the relative lengths of shared boundaries.

We now have a set of values {zi} and a set of weights {wij}. We are looking for some way of combining this information using a function, f(), which satisfies the following criteria:

If the pair (zi,zj) are both +ve or both -ve, then f(zi,zj)>0
If the pair (zi,zj) are a mix of +ve and -ve, then f(zi,zj)<0
If the pair (zi,zj) are both large values, then f(zi,zj) is large
Patterns of adjacency reflected in the matrix W must be accounted for in the computation

One of the simplest ways to fulfill these criteria is to multiply the zone values together, optionally adjusting for the overall mean value for all zones and including the adjacency information, which gives:

ΣΣ zi zj

or with mean adjustment and weights included:

ΣΣ wij(zi - m)(zj - m)

where m is the overall mean of the zi values.

As noted above, assuming the weights wij are binary, they simply identify which elements of the computation are to be included or excluded in the calculation. This expression looks quite like the covariance of the selected data. To adjust for the number of items included in this sum, and produce a covariance value, we need to divide through by the sum of the weights used, i.e.

ΣΣ wij(zi - m)(zj - m) / ΣΣ wij

To standardize this covariance expression we divide it by the variance of the data, which is simply:

(1/n) Σ(zi - m)²

The ratio of these two expressions gives us an index, known as Moran’s I, which has values that typically range from approximately +1, representing complete positive spatial autocorrelation, to approximately -1, representing complete negative spatial autocorrelation:

I = n ΣΣ wij(zi - m)(zj - m) / [ΣΣ wij × Σ(zi - m)²]

The upper and lower limits of this index may vary depending on the configuration of the weights used. It is the spatial equivalent of the correlation coefficient, r, and very similar to the (unweighted) time series autocorrelation coefficient at lag 1, r1, derived in the previous subsection:

r1 = Σ(zt - m)(zt+1 - m) / Σ(zt - m)²

where the numerator is summed over t=1,…,n-1 and the denominator over t=1,…,n.

If the observations, zi, are spatially independent Normal random variables, the expected value of Moran’s I is:

E(I) = -1/(n-1)

Hence for large n the expected value is approximately 0. The variance of Moran’s I under these conditions is shown below (compare this to the formulas for join counts provided earlier):

The variance of I under the assumption of randomization (i.e. the observed values randomly re-distributed over the support, without replacement) is yet more complicated, but is provided in a number of software packages. From the expected values and variances, z‑scores and significance levels for Moran’s I can then be computed under the alternative sampling assumptions.

The computation of Moran’s I for the sample dataset, provided in the table below, illustrates the procedure. First we compute the matrix of unadjusted variance- and covariance-like quantities (see computation table below, upper section A). There are 10 cells, and values for each cell are shown together with their deviations from the overall mean value, 1.022, in the column and row bounding the 10x10 matrix of computed values, C. The diagonal of the 10x10 matrix gives the variance-like quantities whilst the off-diagonal components give the covariance-like quantities. The weight-adjusted matrix (shown in the lower part of this table, B) includes only the off-diagonal elements, and only those for which adjacencies are present. The total of these values is 16.19, compared with the sum of the diagonal values, which is 196.68. The computation of Moran’s I requires one final element: multiplication by n/ΣΣwij, i.e. the number of cells (10 in our case) divided by the sum of the elements in W (26), giving I=10*16.19/(26*196.68)=0.0317. A value this close to 0 indicates very little spatial autocorrelation in this example.

Moran's I computation: A. Computation of variance/covariance-like quantities, matrix C; B. C*W: adjustment by multiplication by the weighting matrix, W
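This worked example can be reproduced with a few lines of code. The sketch below (illustrative Python, independent of any particular GIS package) rebuilds the binary rook's-move weights matrix from the (row, column) coordinates tabulated earlier and evaluates Moran's I, returning the value of approximately 0.032 quoted above.

```python
import numpy as np

# (row, col, value) triples from the tabulated lattice data
cells = [(1, 2, 4.55), (1, 3, 5.54), (2, 1, 2.24), (2, 2, -5.15), (2, 3, 9.02),
         (3, 1, 3.10), (3, 2, -4.39), (3, 3, -2.09), (4, 2, 0.46), (4, 3, -3.06)]
coords = [(r, c) for r, c, _ in cells]
z = np.array([v for _, _, v in cells])
n = len(z)

# Binary rook's-move spatial weights matrix W
W = np.zeros((n, n))
for i, (ri, ci) in enumerate(coords):
    for j, (rj, cj) in enumerate(coords):
        if i != j and abs(ri - rj) + abs(ci - cj) == 1:
            W[i, j] = 1

d = z - z.mean()                                  # deviations from the mean (1.022)
S0 = W.sum()                                      # sum of weights = 26
I = n * (W * np.outer(d, d)).sum() / (S0 * (d ** 2).sum())
print(round(I, 4))                                # approx. 0.0317
```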

If we rearrange the cell values the index value will change accordingly. For example, if the negative values in column 2 of our example are swapped with the positive values in column 3, so that column 2 is entirely positive and column 3 entirely negative (see the revised source data table below), the index value computed is I=0.26.

Revised source data (4 rows x 3 columns)

        +4.55   -5.15
+2.24   +5.54   -4.39
+3.10   +9.02   -2.09
        +0.46   -3.06

Moran’s I is one of the most frequently implemented statistics within GIS packages, including the ArcGIS Spatial Statistics Toolbox and the AUTOCORR function in Idrisi, as well as most other spatial analytical tools. This is in part due to its ease of computation across a range of datasets and weighting options, and because of its relationship to well-understood measures used in univariate statistics and time series analysis. It is also readily extended in a number of ways that increase its application and usefulness. These extensions include: adjustment for differences in underlying population size; treatment of spatial point data utilizing distance bands rather like time lags; and computation of local Moran I values rather than a single whole-map or global Moran I value. We examine each of these in turn.

Oden (1995 [ODE1]) proposed an adjustment to Moran’s I for zonal data where population size data (e.g. population at risk) is available. This statistic is supported within Clusterseer, but is not generally provided in other packages. Under this adjusted version the null hypothesis is that measured rates (e.g. disease rates, crime rates) in adjacent areas are independent, with the spatial variation observed simply reflecting the spatial variation in the underlying population. The computation of this statistic is quite involved, but essentially utilizes the difference between the proportion of cases or incidents in a given zone (i.e. as a proportion of the total cases in the study region) and the expected proportion in the population at large (again, computed as the population in zone i/total population at risk). The expected value for the adjusted statistic is similar to that for the standard statistic, being E(Ipop)=‑1/(N‑1) where N is now the total population at risk. This statistic is not independent of population size, but can be readily standardized by dividing the computed value by the overall incidence rate for the study region.

Point data: We now turn to treatment of spatial point data rather than zoned or lattice data. Suppose that instead of zones we had a set of points (x,y) with associated data values, z (which could be values assigned to zone centroids). We could then compute:

where N(h) is the number of point pairs in a distance band of width d and mean distance h; zi is the (standardized) value at point i; and zj is the (standardized) value at a point j at distance h from i (in practice within a distance band from h‑0.5d to h+0.5d). The summations apply to all point pairs in the selected band. Hence the adjacency matrix W is replaced by distance, h, and by using a number of bands a set of I(h) values can be computed and plotted against h. This form of graph or diagram showing correlation information, as noted earlier, is known as a correlogram. To be meaningful the width of the distance bands, d, should be greater than the minimum inter-point distance in the dataset, and the initial band should commence at 0.5d in order to give a better representation to values near the origin.
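A sketch of how such a distance-band (isotropic) correlogram might be computed is shown below (illustrative Python; the contiguous equal-width bands and the particular normalization of I(h) used here are common choices rather than the specific GS+ formulation, and the coordinates and values are synthetic).

```python
import numpy as np

rng = np.random.default_rng(0)

def moran_correlogram(xy, z, band_width, n_bands):
    """Moran I by distance band: pairs whose separation falls in each band."""
    z = np.asarray(z, dtype=float)
    d = z - z.mean()
    n = len(z)
    dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    denom = (d ** 2).sum()
    results = []
    for k in range(n_bands):
        lo, hi = k * band_width, (k + 1) * band_width
        in_band = (dist > lo) & (dist <= hi)        # excludes i == j (distance 0)
        pairs = in_band.sum()                       # directed pair count in this band
        if pairs == 0:
            continue
        num = np.outer(d, d)[in_band].sum()
        results.append((hi - band_width / 2, pairs // 2, n * num / (pairs * denom)))
    return results  # (mean band distance, undirected pair count, I(h))

# Illustrative smoothly varying surface sampled at 150 random points
xy = rng.uniform(0, 100, size=(150, 2))
z = np.sin(xy[:, 0] / 20) + np.sin(xy[:, 1] / 20) + rng.normal(0, 0.3, 150)
for h, npairs, I in moran_correlogram(xy, z, band_width=10, n_bands=8):
    print(f"lag centre {h:5.1f}  pairs {npairs:4d}  I(h) {I:6.3f}")
```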

For example, the GS+ geostatistics software demonstration dataset comprises 132 separate spatial measurements of lead and aluminum levels in soil samples. An extract of the first 10 records is shown in the diagram below (section A), with missing values being blank entries (coded on input in this case as ‑99). Note that missing values are not the same as zero values, and recorded zero values may actually be a reflection of the limitations of the measuring device rather than a true zero value. In such instances it may be appropriate to adjust zero values to the minimum recordable value. Section B of this diagram shows a map of the point set for the selected z-variable (Pb, lead), with separate symbols indicating the four quartiles of the overall dataset that each point lies within (the triangular symbol is the upper quartile, the square is the lower quartile). There are well over 10,000 possible point pairings in this sample dataset, but when grouped into distance bands only a subset of pairings is considered (section C of the diagram). For example, in the first band, lag class 1, there are 226 point pairs, the average distance between pairs is 5.77 units, the bandwidth, d, is fixed for all lags at 8.21 units (defaulted to the maximum pairwise distance/10), and the Moran I value is 0.4179. This value decreases for the first few lags and then becomes closer to 0 and somewhat random. These values may be plotted, ignoring any possible directional bias (hence the use of the term isotropic), as shown in D. Spatial autocorrelation (positive), as measured by this function, appears to be present for separation distances up to around 20‑25 meters.

Sample dataset and Moran I analysis: A. Dataset, first 10 records; B. Coordinates of data points; C. Moran I statistics, by lag; D. Isotropic correlogram

Each square on the graph in part D of this diagram represents an index value generated from a large number of point pairs, and the individual contributions to the covariance can be plotted as a cloud of (in this case) 226 values (lag 1, as shown below):

Moran I (co)variance cloud, lag 1


For example, the largest value in this cloud is 0.4276 and is derived from points 100 and 115, whose separation distance is 6.44 units, and whose data values are 0.17 and 1.25 respectively. Now (0.17‑1.25)² is not 0.4276, so we must look rather more carefully at what this software package (GS+) is doing with the data. The input dataset, like many such samples, is strongly skewed - there are many small values recorded and only a few larger ones, with 1.25 being the maximum z-value. By default GS+ automatically transforms the data to make it more Normal, using the log transform z*=ln(z+1). This gives the values:

z*100=0.157 and z*115=0.811, and thus (0.157‑0.811)²=0.4276

The correlogram shown is isotropic, i.e. it has been calculated for all points, irrespective of direction. Spatial patterns may well exhibit directional bias and hence it may be of interest to divide the circular lag regions into sectors and investigate the correlograms for each sector. Due to symmetry only half the possible bearings need to be considered. Taking a sector window of, say, 45 degrees (i.e. θ +/‑ 22.5 degrees) results in a set of 4 possible sectors to be analyzed. The choice of initial bearing, θ, can be selected by reference to the problem at hand, or by examining the dataset to locate the orientation of the principal axis of the standard deviational ellipse. Software packages often identify this for the user and then generate a set of anisotropic correlograms to enable the data to be examined sector-by-sector. If all are broadly similar it is likely that directional effects are small. Again, looking at this sample dataset, the principal axis is at 130 degrees, and the four anisotropic correlograms do not differ greatly.

Contiguity ratio or Geary’s C

One alternative to the Moran I form of computation involves returning to the original set of criteria for a suitable function. Instead of computing the product sum:

S1 = ΣΣ wij(zi - m)(zj - m)

we could equally well have computed the sum of squared differences:

S2 = ΣΣ wij(zi - zj)²

or the absolute differences:

S3 = ΣΣ wij|zi - zj|

since these have similar desirable properties. These expressions can once again be standardized by dividing by the variance and adjusted according to the number of zones or point pairs sampled. If the expression S2 above is used the resulting measure is known as the Contiguity ratio or Geary’s C ratio:

C = (n-1) ΣΣ wij(zi - zj)² / [2 ΣΣ wij × Σ(zi - m)²]

This expression may be compared with the formula for Moran I provided earlier. Unfortunately this ratio is not particularly easy to interpret: its values vary around 1, which indicates no spatial autocorrelation, falling towards 0 as positive spatial autocorrelation increases, and rising above 1 where spatial autocorrelation is negative. Perhaps for this reason it is rarely implemented in this form, but expression S2 turns out to be extremely useful in what is known as semivariance analysis in the field of geostatistics.
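A sketch of the Geary C computation for the 10-cell lattice used earlier is given below (illustrative Python; the joins list simply enumerates the 13 rook's-move adjacencies, with cells numbered down the rows and across the columns). For this layout C comes out at roughly 1.03, consistent with the near-zero Moran I found above.

```python
import numpy as np

# Values for the 10 lattice cells, numbered down the rows and across the columns
z = np.array([2.24, 3.10, 4.55, -5.15, -4.39, 0.46, 5.54, 9.02, -2.09, -3.06])
# The 13 rook's-move joins (0-based cell indices)
joins = [(0, 1), (0, 3), (1, 4), (2, 3), (2, 6), (3, 4), (3, 7),
         (4, 5), (4, 8), (5, 9), (6, 7), (7, 8), (8, 9)]

n = len(z)
S0 = 2 * len(joins)                                   # sum of binary, symmetric weights
num = sum((z[i] - z[j]) ** 2 for i, j in joins) * 2   # directed sum of squared differences
C = (n - 1) * num / (2 * S0 * ((z - z.mean()) ** 2).sum())
print(round(C, 3))                                    # approx. 1.03: negligible autocorrelation
```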

Due to the inconvenience of working with absolute values, expression S3 is rarely used, but it is applied in semivariance analysis and is less sensitive to extreme values than computations based on S2. In this case the correlograms are known as Madograms - for more details see Deutsch and Journel (1992, [DEU1]) and Goovaerts (1997,[GOO1]).

Weighting models and lags

Our discussion on correlograms highlights the way in which distance bands can be used to generate data series in a manner similar to the lags used in time series analysis. Weighting matrices that incorporate values other than 0 and 1 were introduced by Cliff and Ord (1973 [CLI1]). One way in which distance-based weighting is applied is to use the actual distances, rather than distance bands, as the means by which data pairs are “ordered”. The usual approach (for example that adopted within the Crimestat software package) is to use inverse distance, so that pairs that are further apart contribute less to the overall index value than those that are close together. The distance weighted Moran’s I is then given by:

where

The ε here represents an increment applied to the distance to ensure that when dij is small the results are not overly distorted and numerical overflow is avoided. Other software packages implement a range of such distance weights, for example: inverse distance; inverse distance squared; and user-specified distance bands.
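For example, a minimal sketch of constructing such distance-decay weights in Python is shown below (the form 1/(d+ε)^power is a generic inverse-distance scheme, not necessarily the exact expression used by Crimestat or any other package):

```python
import numpy as np

def inverse_distance_weights(xy, eps=1e-6, power=1.0):
    """Weights w_ij = 1/(d_ij + eps)**power, with the diagonal w_ii set to zero."""
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    W = 1.0 / (d + eps) ** power
    np.fill_diagonal(W, 0.0)
    return W

xy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])
W = inverse_distance_weights(xy)          # power=2 would give inverse distance squared
```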

Local indicators of spatial association (LISA)

If we look more closely at the Moran I index we see that it can be disaggregated to provide a series of local indices. We start with the standard formulation provided earlier:

I = n ΣΣ wij(zi - m)(zj - m) / [ΣΣ wij × Σ(zi - m)²]

Examining the computation of this “global” index provided earlier, we can see that each row in the computation contributes to the overall sum (see the Local Moran computation table, below), with the covariance component being 16.19 and the overall Moran I value 0.03167.

Local Moran I computation: A. C*W: adjustment by multiplication by the weighting matrix, W; B. C*W1: adjustment by multiplication by the row-adjusted weighting matrix, W1

Each row sum in this table, S, could be standardized by the overall sum of squared deviations (R/SSD) and adjusted by the number of cells, n, as shown. The total of these local contributions divided by the total number of joins, ΣΣwij, gives the overall or Global Moran I value. These individual components, or Local Indicators of Spatial Association (LISA), can be mapped and tested for significance to provide an indication of clustering patterns within the study region (as illustrated below).

LISA map, Moran I

The calculations require some adjustment in order to provide Global Moran I and LISA values in their commonly used form. The first adjustment is to the weights matrix. It is usual to standardize the row totals in the weights matrix to sum to 1. One advantage of row standardization is that for each row (location) the set of weights corresponds to a form of weighted average of the neighbors that are included, which also enables different models to be readily compared. Row standardization may also provide a computationally more suitable weights matrix. This procedure alters the Global Moran I value when rows have differing totals, which is the typical situation. For the example above the effect is to increase the computed Moran I to 0.0736. The second adjustment involves using (n‑1) rather than n as the row multiplier, and then the sum is re-adjusted by this factor to produce the Global index value. These changes are illustrated in the computational table "B" above and correspond to the computations performed within GeoDa, R and SpaceStat (Anselin, personal communication). ArcGIS offers the option to have no weights matrix standardization, row standardization (dividing rows by row totals) or total weights standardization (dividing all cells by the overall cell total). The Geary C contiguity index may also be disaggregated to produce a LISA measure, in the same manner as for the Moran I.
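The following Python sketch uses a common formulation of the local Moran statistic with row-standardized binary rook weights (it is not a reproduction of the GeoDa or ArcGIS calculations, which apply the further adjustments just described). It computes the ten local values for the example lattice; their mean equals the row-standardized global I of approximately 0.074 noted above.

```python
import numpy as np

# Cell values and rook's-move neighbour lists for the 10-cell lattice (0-based indices)
z = np.array([2.24, 3.10, 4.55, -5.15, -4.39, 0.46, 5.54, 9.02, -2.09, -3.06])
neighbours = {0: [1, 3], 1: [0, 4], 2: [3, 6], 3: [0, 2, 4, 7], 4: [1, 3, 5, 8],
              5: [4, 9], 6: [2, 7], 7: [3, 6, 8], 8: [4, 7, 9], 9: [5, 8]}

n = len(z)
d = z - z.mean()
m2 = (d ** 2).sum() / n                       # variance-like scaling term

# Local Moran I_i with row-standardized weights (each neighbour weighted 1/k_i)
I_local = np.array([d[i] * np.mean(d[neighbours[i]]) / m2 for i in range(n)])

print(np.round(I_local, 3))                   # one local value per cell, mappable
print(round(I_local.mean(), 4))               # approx. 0.074, the row-standardized global I
```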

Getis and Ord G statistic

Another measure, known as the "G" statistic, due to Getis and Ord (1992, 1995 [GET1],[GET2]), may also be used in this way. The latter is supported within ArcGIS as both a global index and in disaggregated form for identifying local clustering patterns, which ArcGIS treats as a form of hot-spot analysis. The ArcGIS implementation of the G statistic facilitates the use of simple fixed Euclidean distance bands (threshold distances) with binary weights (as described below), but also permits the use of the Manhattan metric and a variety of alternative models of contiguity such as: inverse distance; inverse distance squared; so-called Zones of Indifference (a combination of inverse distance and threshold distance); and user-defined spatial weights matrices. Unlike the Moran I and Geary C statistics, the Getis and Ord G statistic identifies the degree to which high or low values cluster together.

The standard global form of the Getis and Ord G statistic is:

G(d) = ΣΣ wij(d) zi zj / ΣΣ zi zj, for j≠i

whilst the local variant is:

Gi(d) = Σj wij(d) zj / Σj zj, for j≠i

The weights matrix, W, is typically defined as a symmetric binary contiguity matrix, with contiguity determined by a (generalized) distance threshold, d. A variant of this latter statistic, which is regarded as of greater use in hot-spot analysis, permits i=j in the expression (so includes i in the summations, with wii=1 rather than 0 in the binary weights contiguity model) and is known as the Gi* statistic rather than Gi. Note that these local statistics are not meaningful if there are no events or values in the set of cells or zones under examination.
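A minimal sketch of the local Gi* form in Python is given below (generic, using a fixed Euclidean distance threshold and binary weights with wii=1; the sample coordinates and values are purely illustrative, and as noted above the statistic is only meaningful for non-negative data such as counts or rates).

```python
import numpy as np

def local_g_star(xy, x, threshold):
    """Local Gi*: share of the total value found within `threshold` of each point,
    including the point itself (w_ii = 1)."""
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    W = (d <= threshold).astype(float)        # binary weights; diagonal is 1 since d_ii = 0
    return W @ x / x.sum()

rng = np.random.default_rng(7)
xy = rng.uniform(0, 10, size=(50, 2))         # illustrative point locations
x = rng.poisson(5, size=50).astype(float)     # illustrative non-negative values (e.g. counts)
gi_star = local_g_star(xy, x, threshold=2.0)
print(np.round(gi_star[:5], 3))
```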

The expected value of G under the hypothesis of Complete Spatial Randomness (CSR) is:

E(G) = W/[n(n-1)]

where n is the number of points or zones in the study area and W is the sum of the weights, which for binary weights is simply the number of pairs within the distance threshold, d. The equivalent expected value for the local variant is:

where n is the number of points or zones within the threshold distance, d, for point/region i. Closed formulas for the variances have been obtained but are very lengthy. They facilitate the construction of z-scores and hence significance testing of the global and local versions of the G statistics. For the local measure both computed values and z-scores may be useful to map.

The Rookcase Excel add-in (Sawada, [SAW1]) provides support for all three LISA measures with variable lag distance and number of lags.

Significance tests for spatial autocorrelation statistics

As noted in the preceding sections, the various global and local spatial autocorrelation coefficients discussed can be tested for statistical significance under two, rather different, model assumptions. The first is the classical statistical assumption of Normality, whereby it is assumed that the observed value of the coefficient is the result of the set {zi} of values being independent and identically distributed drawings from a Normal distribution, implying that variances are constant across the region. The second model is one of randomization, whereby the observed pattern of the set {zi} of values is assumed to be just one realization from all possible random permutations of the observed values across all the zones.

Both models have important weaknesses, for example as a result of underlying population size variation and lack of homogeneity of probabilities, but both are widely implemented in software packages to provide estimates of the significance of observed results. In the case of the randomization model many software packages generate a set of N random permutations of the input data, where N is user specified. For each simulation run index values are computed, and the set of such values is used to provide a pseudo-probability distribution for the given problem, against which the observed value can be compared. A z-transform of the coefficients under the Normality or Randomization assumptions is distributed approximately N(0,1), hence this may be compared to percentage points of the Normal distribution to identify particularly high or low values.
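A sketch of this randomization procedure for the global Moran I is given below (illustrative Python; the number of permutations and the pseudo p-value convention follow common practice rather than any specific package).

```python
import numpy as np

def moran_i(z, W):
    """Global Moran I for values z and spatial weights matrix W."""
    z = np.asarray(z, dtype=float)
    d = z - z.mean()
    return len(z) * (W * np.outer(d, d)).sum() / (W.sum() * (d ** 2).sum())

def moran_permutation_test(z, W, n_perm=999, seed=0):
    """Pseudo p-value and z-score for Moran I under the randomization model."""
    rng = np.random.default_rng(seed)
    observed = moran_i(z, W)
    sims = np.array([moran_i(rng.permutation(z), W) for _ in range(n_perm)])
    p = (1 + np.sum(sims >= observed)) / (n_perm + 1)   # one-sided (positive autocorrelation)
    z_score = (observed - sims.mean()) / sims.std(ddof=1)
    return observed, p, z_score

# Example: a random pattern on a 5x5 rook-adjacency lattice should not be significant
n = 5
W = np.zeros((n * n, n * n))
for r in range(n):
    for c in range(n):
        for dr, dc in ((1, 0), (0, 1)):
            if r + dr < n and c + dc < n:
                i, j = r * n + c, (r + dr) * n + (c + dc)
                W[i, j] = W[j, i] = 1
values = np.random.default_rng(3).normal(size=n * n)
print(moran_permutation_test(values, W))
```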

The Rookcase Excel add-in computes both sets of values, for the global Moran I and Geary C, and the results in the table below illustrate this. Note that these results are calculated using overall weight adjustment rather than row-wise adjustment. The results suggest that the observed Moran index value of 0.26 could reasonably be expected to have occurred by chance. With the local variant of Moran’s I the same procedures as for the Global statistic can be adopted.

Significance tests for revised sample dataset

Crimestat provides the option to compute z-scores for this statistic but notes that it may be very slow to calculate, depending on the number of zones. GeoDa uses random permutations to obtain an estimated mean and standard deviation, which may then be used to create an estimated z-score.

The local variants of these statistics utilize values in individual cells or zones over and over again. This means that the values obtained for adjacent neighbors will not be independent, which breaches the requirements for significance testing, a problem exacerbated by the fact that such cells are also likely to exhibit spatial autocorrelation. There are no generally agreed solutions to these problems, but one approach is to regard each test as one of a multiple set. In classical statistics multiple tests of independent sets (the members of which are not correlated) are made using a reduced or corrected significance level, α. The most basic of these adjustments (known as a Bonferroni correction, after its original proponent) is to use a significance level defined by α/n, where n is the number of tests carried out, or in the present case, the number of zones or features being tested. If n is very large the use of such corrections results in extremely conservative confidence levels that may be inappropriate. Monte Carlo randomization techniques that seek to estimate the sampling distribution may be more appropriate in such cases.

References

[CLI1] Cliff A, Ord J (1973) Spatial autocorrelation. Pion, London

[DAC1] Dacey M (1968) A review of measures of contiguity for two and k-color maps. pp 479-95 in Berry B J L, Marble D F eds. Spatial analysis. Prentice Hall, New Jersey

[DEU1] Deutsch C V, Journel A G (1992, 1998) GSLIB geostatistical software library and user’s guide. Oxford University Press, New York

[GET1] Getis A D, Ord J K (1992) The analysis of spatial association using distance statistics. Geog. Anal., 24, 189-206

[GET2] Getis A D, Ord J K (1995) Local spatial autocorrelation statistics: Distributional issues and an application. Geog. Anal., 27, 287-306

[GOO1] Goovaerts P (1997) Geostatistics for natural resources evaluation. Oxford University Press, New York

[KAB1] Kabos S, Csillag F (2002) The analysis of spatial association on a regular lattice by join-count statistics without the assumption of first-order homogeneity. Computers & Geosciences, 28, 901-910

[ODE1] Oden N (1995) Adjusting Moran's I for population density. Statistics in Medicine 14, 17-26

[SAW1] Sawada M (1999) ROOKCASE: An Excel 97/2000 Visual Basic (VB) Add-in for exploring global and local spatial autocorrelation. Bull. Ecological Soc. America, 80, 231-4