With conventional datasets the mean value is often the first, and one of the most useful, summary statistics to calculate. When data take the form of a time series, the series mean is a useful measure, but it does not reflect the dynamic nature of the data. Mean values computed over shorter periods, either preceding the current period or centered on it, are often more useful. Because such mean values vary, or move, as the current period moves from time t=2 to t=3, and so on, they are known as moving averages (MAs). A simple moving average is (typically) the unweighted average of the k prior values. An exponentially weighted moving average is essentially the same, but with contributions to the mean weighted by their proximity to the current time. Because there is not one but a whole series of moving averages for any given series, the set of MAs can itself be plotted on graphs, analyzed as a series, and used in modeling and forecasting. A range of models can be constructed using moving averages, and these are known as MA models. If such models are combined with autoregressive (AR) models the resulting composite models are known as ARMA or ARIMA models (the I is for integrated).

Since a time series can be regarded as a set of values, {xt}, t=1,2,3,…,n, the average of these values can be computed. If we assume that n is quite large and we select an integer k that is much smaller than n, we can compute a set of block averages, or simple moving averages (of order k):
Mk,t = (xt + xt-1 + … + xt-k+1)/k,  t = k, k+1, …, n
Each measure represents the average of the data values over an interval of k observations. Note that the first possible MA of order k>0 is that for t=k. More generally we can drop the extra subscript in the expressions above and write:
Mt = (xt + xt-1 + … + xt-k+1)/k
This states that the estimated mean at time t is the simple average of the observed value at time t and the preceding k-1 time steps. If weights are applied that diminish the contribution of observations further away in time, the moving average is said to be weighted, and if the weights decline geometrically it is described as exponentially smoothed. Moving averages are often used as a form of forecasting, whereby the estimated value for a series at time t+1, St+1, is taken as the MA for the period up to and including time t, e.g. today's estimate is based on an average of prior recorded values up to and including yesterday's (for daily data).
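As a minimal sketch of the trailing calculation (in Python, with an illustrative series and order rather than any dataset from this section), the order-k moving average and its use as a next-period forecast can be written as:

```python
# Simple (trailing) moving average of order k - a sketch with made-up data.

def moving_average(x, k):
    """Return the order-k trailing moving averages of x.

    The first value corresponds to t = k (1-based), i.e. the first
    k-1 positions have no MA value, as noted in the text.
    """
    return [sum(x[t - k:t]) / k for t in range(k, len(x) + 1)]

series = [3.0, 5.0, 4.0, 6.0, 8.0, 7.0]
ma3 = moving_average(series, 3)
print(ma3)        # [4.0, 5.0, 6.0, 7.0]

# Forecast use: the estimate for time t+1 is the MA up to and including t.
forecast_next = ma3[-1]
print(forecast_next)  # 7.0
```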

Simple moving averages can be seen as a form of smoothing. In the example illustrated below, the air pollution dataset shown in the introduction to this topic has been augmented by a 7-day moving average (MA) line, shown here in red. As can be seen, the MA line smooths out the peaks and troughs in the data and can be very helpful in identifying trends. The standard forward-calculation formula means that the first k-1 data points have no MA value, but thereafter computations extend to the final data point in the series.

PM10 daily mean values, Greenwich

source: London Air Quality Network, www.londonair.org.uk

One reason for computing simple moving averages in the manner described is that it enables values to be computed for all time slots from time t=k up to the present, and as a new measurement is obtained for time t+1, the MA for time t+1 can be added to the set already calculated. This provides a simple procedure for dynamic datasets. However, there are some issues with this approach. It is reasonable to argue that the mean value over the last 3 periods, say, should be located at time t-1, not time t, and for an MA over an even number of periods perhaps it should be located at the mid-point between two time intervals. A solution to this issue is to use centered MA calculations, in which the MA at time t is the mean of a symmetric set of values around t. Despite its obvious merits, this approach is not generally used because it requires that data be available for future events, which may not be the case. In instances where analysis is entirely of an existing series, the use of centered MAs may be preferable.
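The centered calculation can be sketched as follows (illustrative data; an odd order is assumed so the window is symmetric about t). Note that for the same data the centered values equal the trailing values of the same order, only attached to the midpoint of each window rather than its end:

```python
# Centered moving average of odd order k - a sketch.
# The MA at time t averages x_t together with the (k-1)/2 values on
# either side, so it cannot be computed for the first and last
# (k-1)/2 positions of the series.

def centered_ma(x, k):
    assert k % 2 == 1, "use an odd order for a symmetric window"
    h = k // 2
    return [sum(x[t - h:t + h + 1]) / k for t in range(h, len(x) - h)]

series = [3.0, 5.0, 4.0, 6.0, 8.0, 7.0]
print(centered_ma(series, 3))  # [4.0, 5.0, 6.0, 7.0]
```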

Simple moving averages can be considered as a form of smoothing, removing some high frequency components of a time series and highlighting (but not removing) trends in a similar manner to the general notion of digital filtering. Indeed, moving averages are a form of linear filter. It is possible to apply a moving average computation to a series that has already been smoothed, i.e. smoothing or filtering an already smoothed series. For example, a moving average of order 2 can be regarded as being computed using weights {1/2,1/2}, so the MA at t=2 is M2=0.5x1+0.5x2, and likewise the MA at t=3 is M3=0.5x2+0.5x3. If we apply a second level of smoothing or filtering, we have 0.5M2+0.5M3=0.5(0.5x1+0.5x2)+0.5(0.5x2+0.5x3)=0.25x1+0.5x2+0.25x3, i.e. the 2-stage filtering process (or convolution) has produced a variably weighted symmetric moving average, with weights {1/4,1/2,1/4}. Multiple convolutions can produce quite complex weighted moving averages, some of which have been found of particular use in specialized fields, such as in life insurance calculations.
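The convolution step can be verified numerically. The sketch below convolves the order-2 weights {1/2,1/2} with themselves, reproducing the {1/4,1/2,1/4} result, and shows how a further pass produces a longer weight set:

```python
# Convolution of filter weights - a sketch. Two passes of the 2-term
# filter {1/2, 1/2} yield the symmetric weights {1/4, 1/2, 1/4}.

def convolve(a, b):
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

w = [0.5, 0.5]
print(convolve(w, w))               # [0.25, 0.5, 0.25]
print(convolve(convolve(w, w), w))  # third pass: [0.125, 0.375, 0.375, 0.125]
```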

Moving averages can also be used to remove periodic effects if computed using the known length of the periodicity. For example, with monthly data, seasonal variations can often be removed (if this is the objective) by applying a symmetric 12-month moving average with all months weighted equally, except the first and last, which are weighted by 1/2. This is because there will be 13 months in the symmetric model (current time, t, +/- 6 months). The total is divided by 12. Similar procedures can be adopted for any well-defined periodicity.
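A sketch of this 13-coefficient filter, applied to a hypothetical purely seasonal monthly series (not data from this section), shows the seasonal component being removed entirely:

```python
# Symmetric 12-month MA with half-weights on the end months - a sketch.
# 13 coefficients {1/2, 1, ..., 1, 1/2}, with the total divided by 12,
# so the weights sum to 1.

weights = [0.5] + [1.0] * 11 + [0.5]
weights = [w / 12.0 for w in weights]
assert abs(sum(weights) - 1.0) < 1e-12

def seasonal_ma(x, w):
    h = len(w) // 2  # 6 months either side of the current month
    return [sum(wi * x[t - h + i] for i, wi in enumerate(w))
            for t in range(h, len(x) - h)]

# Two years of a repeating 12-month pattern (hypothetical data):
monthly = [10 + m % 12 for m in range(24)]
print(seasonal_ma(monthly, weights))  # twelve values, all 15.5
```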

Exponentially weighted moving averages (EWMA)

With the simple moving average formula:
Mt = (xt + xt-1 + … + xt-k+1)/k
all observations are equally weighted. If we call these equal weights αi, each of the k weights would equal 1/k, so the sum of the weights would be 1, and the formula would be:
Mt = α0xt + α1xt-1 + … + αk-1xt-k+1,  where each αi = 1/k
We have already seen that multiple applications of this process result in the weights varying. With exponentially weighted moving averages the contribution to the mean value from observations that are more removed in time is deliberately reduced, thereby emphasizing more recent (local) events. Essentially a smoothing parameter, 0<α<1, is introduced, and the formula revised to:
Mt = α[xt + (1-α)xt-1 + (1-α)2xt-2 + …]
A symmetric version of this formula would be of the form:
Mt = α-qxt-q + … + α0xt + … + αqxt+q
If the weights in the symmetric model are selected as the terms of the binomial expansion, (1/2+1/2)2q, they will sum to 1 and, as q becomes large, will approximate the Normal distribution. This is a form of kernel weighting, with the Binomial acting as the kernel function. The two-stage convolution described in the previous subsection is precisely this arrangement, with q=1, yielding the weights {1/4,1/2,1/4}.
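These binomial kernel weights can be generated directly (a sketch using the standard library's binomial coefficient function):

```python
# Binomial weights from the expansion of (1/2 + 1/2)^(2q) - a sketch.
# For q = 1 these are {1/4, 1/2, 1/4}, matching the two-stage convolution.
from math import comb

def binomial_weights(q):
    n = 2 * q
    return [comb(n, i) / 2 ** n for i in range(n + 1)]

print(binomial_weights(1))  # [0.25, 0.5, 0.25]
print(binomial_weights(2))  # [0.0625, 0.25, 0.375, 0.25, 0.0625]
```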

In exponential smoothing it is necessary to use a set of weights that sum to 1 and which reduce in size geometrically. The weights used are typically of the form:
αi = α(1-α)i,  i = 0, 1, 2, …
To show that these weights sum to 1, consider the expansion of 1/α as a series. We can write
1/α = 1/[1-(1-α)]
and expand the expression in brackets using the binomial formula (1-x)p, where x=(1-α) and p=-1, which gives:
1/α = 1 + (1-α) + (1-α)2 + (1-α)3 + …
thus
α + α(1-α) + α(1-α)2 + α(1-α)3 + … = 1
This then provides a form of weighted moving average of the form:
Mt = α[xt + (1-α)xt-1 + (1-α)2xt-2 + …]
or
Mt = αxt + α(1-α)xt-1 + α(1-α)2xt-2 + …
This summation can be written as a recurrence relation:
Mt = αxt + (1-α)Mt-1
which simplifies computation greatly, and avoids the problem that the weighting regime should strictly be infinite for the weights to sum to 1 (for small values of α, a truncated set of weights will sum to substantially less than 1). The notation used by different authors varies. Some use the letter S to indicate that the formula is essentially a smoothed variable, and write:
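The recurrence form is straightforward to implement. In the sketch below the series is illustrative, and the first smoothed value is initialized to the first observation, which is one common convention (an assumption here, not prescribed by the text above):

```python
# EWMA via the recurrence M_t = alpha*x_t + (1 - alpha)*M_{t-1} - a sketch.
# Initialization with the first observation is an assumed convention.

def ewma(x, alpha):
    s = [x[0]]
    for value in x[1:]:
        s.append(alpha * value + (1 - alpha) * s[-1])
    return s

series = [3.0, 5.0, 4.0, 6.0]
print(ewma(series, 0.5))  # [3.0, 4.0, 4.0, 5.0]
```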
St = αxt + (1-α)St-1
whereas the control theory literature often uses Z rather than S for the exponentially weighted or smoothed values (see, for example, Lucas and Saccucci, 1990, [LUC1], and the NIST website for more details and worked examples). The formulas cited above derive from the work of Roberts (1959, [ROB1]), but Hunter (1986, [HUN1]) uses an expression of the form:
St = αxt-1 + (1-α)St-1
which may be more appropriate for use in some control procedures. With α=1 the mean estimate is simply its measured value (or the value of the previous data item). With α=0.5 the estimate is the simple moving average of the current and previous measurements. In forecasting models the value, St, is often used as the estimate or forecast value for the next time period, i.e. as the estimate for x at time t+1. Thus we have:
St+1 = αxt + (1-α)St
hence
St+1 = St + α(xt - St) = St + αεt
This shows that the forecast value at time t+1 is a combination of the previous exponentially weighted moving average plus a component that represents the weighted prediction error, ε, at time t.
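The algebraic equivalence of the two forecast forms can be checked numerically (the values of α, St and xt below are arbitrary illustrations):

```python
# Equivalence of S_{t+1} = alpha*x_t + (1-alpha)*S_t and
# S_{t+1} = S_t + alpha*(x_t - S_t), where x_t - S_t is the
# prediction error at time t - a numerical sketch.

alpha = 0.3
s_t, x_t = 10.0, 12.0  # illustrative current smoothed value and observation

direct = alpha * x_t + (1 - alpha) * s_t
via_error = s_t + alpha * (x_t - s_t)

print(direct, via_error)  # both 10.6
```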

Assuming a time series is given and a forecast is required, a value for α is needed. This can be estimated from the existing data by evaluating the sum of squared prediction errors obtained with varying values of α for each t=2,3,…, setting the first estimate to be the first observed data value, x1. In control applications the value of α is important in that it is used in the determination of the upper and lower control limits, and affects the average run length (ARL) expected before these control limits are broken (under the assumption that the time series represents a set of random, identically distributed independent variables with common variance). Under these circumstances the variance of the control statistic:
St = αxt + (1-α)St-1
is (Lucas and Saccucci, 1990):
σS2 = σ2[α/(2-α)]
Control limits are usually set as fixed multiples of the corresponding asymptotic standard deviation, e.g. +/- 3 standard deviations. If α=0.25, for example, and the data being monitored is assumed to have a Normal distribution, N(0,1), when 'in control', the control limits will be +/- 1.134 and the process will reach one or other limit in 500 steps on average. Lucas and Saccucci (1990, [LUC1]) derive the ARLs for a wide range of α values and under various assumptions using Markov Chain procedures. They tabulate the results, including providing ARLs when the mean of the control process has been shifted by some multiple of the standard deviation. For example, with a 0.5 shift and α=0.25 the ARL is less than 50 time steps.
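Both ideas can be sketched briefly: choosing α by minimizing the sum of squared one-step prediction errors over a grid (the series below is illustrative, and a simple grid search is an assumed approach, not a prescribed one), and computing the 3-sigma control limit from the asymptotic variance, which reproduces the +/- 1.134 figure quoted above for α=0.25:

```python
# (i) Estimate alpha by minimizing the sum of squared one-step prediction
#     errors over a grid - a sketch with illustrative data.
# (ii) 3-sigma EWMA control limit from the asymptotic variance
#      sigma_S^2 = sigma^2 * alpha / (2 - alpha), for N(0,1) data.

def sse(x, alpha):
    s, total = x[0], 0.0          # first estimate = first observed value
    for value in x[1:]:
        total += (value - s) ** 2  # one-step prediction error at time t
        s = alpha * value + (1 - alpha) * s
    return total

series = [3.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0]
best = min((a / 20 for a in range(1, 20)), key=lambda a: sse(series, a))
print(round(best, 2))

limit = 3 * (0.25 / (2 - 0.25)) ** 0.5
print(round(limit, 3))  # 1.134
```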

The approach described above is known as single exponential smoothing, as the procedure is applied once to the time series and analyses or control processes are then carried out on the resulting smoothed dataset. If the dataset includes a trend and/or seasonal components, two- or three-stage exponential smoothing can be applied as a means of removing (explicitly modeling) these effects (see further, the section on Forecasting, below, and the NIST worked example).

References

[CHA1] Chatfield C (1975) The Analysis of Time Series: Theory and Practice. Chapman and Hall, London

[HUN1] Hunter J S (1986) The exponentially weighted moving average. J of Quality Technology, 18, 203-210

[LUC1] Lucas J M, Saccucci M S (1990) Exponentially Weighted Moving Average Control Schemes: Properties and Enhancements. Technometrics, 32(1),1-12

[ROB1] Roberts S W (1959) Control Chart Tests Based on Geometric Moving Averages. Technometrics, 1, 239-250