ARMA and ARIMA (Box-Jenkins) models

<< Click to Display Table of Contents >>

Navigation:  Time series analysis and temporal autoregression >

ARMA and ARIMA (Box-Jenkins) models

In the preceding sections we have seen how the value of a univariate time series at time t, xt, can be modeled using a variety of moving average expressions. We have also shown that components such as trends and periodicity in the time series can be explicitly modeled and/or separated out, with the data being decomposed into trend, seasonal and residual components. We also showed, in the earlier discussions on autocorrelation, that the full and partial autocorrelation coefficients are extremely useful in identifying and modeling patterns in time series. These two aspects of time series analysis and modeling can be combined in a more general, and often very effective, overall modeling framework. In its basic form this approach is known as ARMA modeling (autoregressive moving average), or when differencing is included in the procedure, ARIMA or Box-Jenkins modeling, after the two authors who were central to its development (see Box & Jenkins, 1968 [BOX1], and Box, Jenkins & Reinsel, 1994 [BOX2]). There is no fixed rule as to the number of time periods required for a successful modeling exercise, but for more complex models, and for greater confidence in fit and validation procedures, series with 50+ time steps are often recommended.

ARMA models combine autocorrelation methods (AR) and moving averages (MA) into a composite model of the time series. Before considering how these models can be combined, we examine each separately.

MA models

We have already seen that moving average (MA) models can be used to provide a good fit to some datasets, and variations on these models that involve double or triple exponential smoothing can handle trend and periodic components in the data. Furthermore, such models can be used to create forecasts that mimic the behavior of earlier periods. A simple form of such models, based on prior data, can be written as:

Where the βi terms are the weights applied to prior values in the time series, and it is usual to define βi=1, without loss of generality. So for a first order process, q=1 and we have the model:

i.e. the moving average value is estimated as a weighted average of the current and immediate past values. This averaging process is, in some sense, a pragmatic smoothing mechanism without a direct link to a statistical model. However, we can specify a statistical (or stochastic) model that embraces the procedures of moving averages in conjunction with random processes. If we let {zt} be a set of independent and identically distributed random variables (a random process) with zero mean and known fixed variance, then we can write the process {xt} as a moving average of order q in terms of {zt}:

Clearly the expected value of xt under this model is 0, so the model is only valid if the xt have already been adjusted to have a zero mean or if a fixed constant (the mean of the xt) is added to the summation. It is also apparent that the variance of xt is simply:

The above analysis can be extended to evaluate the covariance, cov(xt,xt+k), which we find yields:

Note that neither the mean value, nor the covariance (or autocovariance) at lag k is a function of the time, t, so the process is second order stationary. The above expression enables us to obtain an expression for the autocorrelation function (acf):

If k=0 ρk=1, and for k>q ρk=0. Furthermore, the acf is symmetric and ρk=ρ-k. The acf can be computed for a first order MA process:

and is simply:

AR models

The autoregressive or AR component of an ARMA model can be written in the form:

where the terms in α are autocorrelation coefficients at lags 1,2...p and zt is a residual error term. Note that this error term specifically relates to the current time period, t. So for a first order process, p=1 and we have the model:


These expressions state that the estimated value of x at time=t is determined by the immediately previous value of x (i.e. at time=t-1) multiplied by a measure, α, of the extent to which the values for all pairs of values at time periods lag 1 apart are correlated (i.e. their autocorrelation), plus a residual error term, z, at time t. But this is precisely the definition of a Markov Process, so a Markov Process is a first order autoregressive process. If α=1 the model states that the next value of x is simply the previous value plus a random error term, and hence is a simple 1D random walk. If more terms are included the model estimates the value of x at time=t by a weighted sum of these terms plus a random error component.

If we substitute the second expression above into the first, we have:

and repeated application of this substitution yields:

Now if |α|<1 and k is large, this expression can be written in the reverse order, with diminishing terms and with contribution from the term in x on the right hand side of the expression becoming vanishingly small, so we have:

Since the right hand side of this expression models xt as the sum of a weighted set of prior values, in this case random error terms, it is clear that this AR model is, in fact, a form of MA model. And if we assume that the error terms have zero mean and constant variance, then as in the MA model we have the expected value of the model as also 0, assuming the xt have been adjusted to provide a zero mean, with variance:

Now as long as |α|<1 this summation is finite and is simply 1/(1-α), so we have:

As with the MA model above, this analysis can be extended to evaluate the covariance, cov(xt,xt+k) of a first order AR process, which we find yields:

For |α|<1 this summation is finite and is simply αk/(1-α2), so we have:

This demonstrates that for a first order autoregressive model the autocorrelation function (acf) is simply defined by successive powers of the first order autocorrelation, α, with the condition |α|<1. For α>0 this is simply a rapidly diminishing power or exponential-like curve, tending to zero, or for α<0 it is a dampening oscillatory curve, again tending to zero. If an assumption is made that the time series is stationary the above analysis can be extended to second and higher order autocorrelations.

In order to fit an AR model to an observed dataset, we seek to minimize the sum of squared errors (a least squares fit) using the smallest number of terms that provide a satisfactory fit to the data. Models of this type are described as autoregressive, and may be applied to both time series and spatial datasets (see further, spatial autoregression models). Although in theory an autoregressive model might provide a good fit to an observed dataset, it would generally require prior removal of any trend and periodic components, and even then might need a large number of terms in order to provide a good fit to the data. However, by combining the AR models with MA models, we can produce a family of mixed models that can be applied in a wide range of situations. These models are known as ARMA and ARIMA models, and are described in the following subsections.

ARMA Models

In the previous two subsections we introduced the MA mode of order q:

and the AR model of order p:

We can combine these two models by simply adding them together as a model of order (p,q), where we have p AR terms and q MA terms:

In general, this form of combined ARMA model can be used to model a time series with fewer terms overall than either an MA or an AR model by themselves. It expresses the estimated value at time t as the sum of q terms that represent the average variation of random variation over q previous periods (the MA component), plus the sum of p AR terms that compute the current value of x as the weighted sum of the p most recent values.

However, this form of model assumes that the time series is stationary, which is rarely the case. In practice, trends and periodicity exists in many datasets, so there is a need to remove these effects before applying such models. Removal is typically carried out by including in the model an initial differencing stage, typically once, twice or three times, until the series is at least approximately stationary — exhibiting no obvious trends or periodicities. As with the MA and AR processes, the differencing process is described by the order of differencing, for example 1, 2, 3.... Collectively these three elements make up a triple: (p,d,q) that defines the type of model applied. In this form, the model is described as an ARIMA model. The letter I in ARIMA refers to the fact that the dataset has been initially differenced (cf. differentiation) and when the modeling is complete the results then have to be summed or integrated to produce the final estimations and forecasts. ARIMA modeling is discussed below.

ARIMA Models

As noted in the previous subsection, combining differencing of a non-stationary time series with the ARMA model provides a powerful family of models that can be applied in a wide range of situations. Development of this extended form of model is largely due to G E P Box and G M Jenkins, and as a result ARIMA models are also known as Box-Jenkins models.

The first step in the Box-Jenkins procedure is to difference the time series until it is stationary, thereby ensuring that trend and seasonal components are removed. In many instances one or two stage differencing is sufficient. The differenced series will be shorter than the source series by c time steps, where c is the range of the differencing. An ARMA model is then fitted to the resulting time series. Because ARIMA models have three parameters there are many variations to the possible models that could be fitted. However, the decision on what these parameters should be can be guided by a number of basic principles: (i) the model should be as simple as possible, i.e. contain as few terms as possible, which in turn means the values of p and q should be small; (ii) the fit to historic data should be as good as possible, i.e. the size of the squared differences between the estimated value at any past time period and the actual value, should be minimized (least squares principle) — the residuals from the selected model can then be examined to see if any remaining residuals are significantly different from 0 (see further, below); (iii) the measured partial autocorrelation at lags 1,2,3.... should provide an indication of the order of the AR component, i.e. the value chosen for q; (iv) the shape of autocorrelation function (acf) plot can suggest the type of ARIMA model required — the table below (from the NIST) provides guidance on interpreting the form of the acf in terms of model selection.

ARIMA Model type selection using acf shape



Exponential, decaying to zero

Autoregressive model. Use the partial autocorrelation plot to identify the order of the autoregressive model

Alternating positive and negative, decaying to zero

Autoregressive model. Use the partial autocorrelation plot to help identify the order

One or more spikes, rest are essentially zero

Moving average model, order identified by where plot becomes zero

Decay, starting after a few lags

Mixed autoregressive and moving average model

All zero or close to zero

Data is essentially random

High values at fixed intervals

Include seasonal autoregressive term

No decay to zero

Series is not stationary

Standard ARIMA models are often described by the triple: (p,d,q) as noted above. These define the structure of the model in terms of the order of AR, differencing and MA models to be used. It is also possible to include similar parameters for seasonality in the data, although such models are more complex to fit and to interpret — the triple (P,D,Q) is generally used to identify such model components. In the screenshot from SPSS shown below, the dialog for manually selecting non-seasonal and seasonal structural elements is displayed (similar facilities are available in other integrated packages, such as SAS/ETS). As can be seen, the dialog also enables the data to be transformed (typically to assist with variance stabilization) and to enable users to include a constant in the model (the default). This particular software tool allows outliers to be detected if necessary, according to a range of detection procedures, but in many instances outliers will have been investigated and adjusted or removed and substitute values estimated, prior to any such analysis.

A number of ARIMA models can be fitted to the data, manually or via an automated process (e.g. a stepwise process), and one or more measures used to judge which is the 'best' in terms of fit and parsimony. Model comparison typically makes use of one or more of the information theoretic measures described earlier in this handbook — AIC, BIC and/or MDL (the R function, arima(), provides the AIC measure, whereas SPSS provides a range of fit measures, included a version of the BIC statistic; other tools vary in the measures provided — Minitab, which provides a range of TSA methods, does not include AIC/BIC type statistics). In practice a wide range of measures (i.e. other than/in addition to the least squares based measures, can be used to evaluate the model quality. For example, the mean absolute error and the maximum absolute error may be useful measures, since even a good least squares fit may still be poor in places.

SPSS Time Series Modeler: ARIMA modeling, expert mode


A number of software packages may also provide an overall measure of the autocorrelation that may remain in the residuals after fitting the model. A statistic frequently applied is due to Ljung and Box (1978 [LJU1]), and is of the form:

where n is the number of samples (data values), ri is the sample autocorrelation at lag i, and k is the total number of lags over which the computation is carried out. Qk is approximately distributed as a chi-square distribution with k-m degrees of freedom, where m is the number of parameters used in fitting the model, excluding any constant term or predictor variables (i.e. just including the p,d,q triples). If the measure is statistically significant it indicates that the residuals still contain significant autocorrelation after the model has been fitted, suggesting that an improved model should be sought.

Example: Modeling the growth of airline passenger numbers

The following is an example of automated fitting, using SPSS, to the Box-Jenkins-Reinsel test data of airline passenger numbers [REI1] provided earlier in this Handbook. Initially no specification of the dates being months within years was specified. The model selected by the automated process was an ARIMA model (0,1,12), i.e. the process correctly identified that the series required one level of differencing and applied a moving average model with a periodicity of 12 and no autocorrelation component to fit the data. The model fit produced an R2 value of 0.966, which is very high, and a maximum absolute error (MAE) of 75. The visual fit of the model to the data looks excellent, but the plot of the residual autocorrelation after fitting and Ljung-Box test shows that significant autocorrelation remains, indicating that an improved model is possible.

Automated ARIMA fit to International Airline Passengers: Monthly Totals, 1949-1960


To investigate this further a revised model was fitted, based on the discussion of this dataset by Box and Jenkins (1968) and the updated edition of Chatfield's (1975 [CHA1]) book in which he uses Minitab to illustrate his analysis (6th edition, 2003). The time series was defined as having a periodicity of 12 months and an ARIMA model with components (0,1,1),(0,1,1). Graphically the results look very similar to the chart above, but with this model the R-squared is 0.991, the MAE=41 and the Ljung-Box statistic is no longer significant (12.6, with 16 degrees of freedom). The model is thus an improvement on the original (automatically generated) version, being comprised of a non-seasonal MA and a seasonal MA component, no autoregressive component, and one level of differencing for the seasonal and non-seasonal structures.

Whether fitting is manual or automated, an ARIMA model may provide a good framework for modeling a time series, or it may be that alternative models or approaches provide a more satisfactory result. Often it is difficult to know in advance how good any given forecasting model is likely to be, since it is only in the light of its ability to predict future values of the data series that it can be truly judged. Often this process is approximated by fitting the model to past data excluding recent time periods (also known as hold-out samples), and then using the model to predict these known 'future' events, but even this offers only limited confidence in its future validity. Longer-term forecasting can be extremely unreliable using such methods. Clearly the international air traffic statistics model described above is not able to correctly predict passengers numbers through into the 1990s and beyond, nor the 5-year drop in US international airline passenger numbers post 9/11/2001 or the huge impact of COVID19 on passenger air traffic in 2020-2022. Likewise, an ARIMA model can be fitted to historic values of stock exchange prices or index values (e.g. the NYSE or FTSE indices) and will typically provide an excellent fit to the data (yielding an R-squared value of better than 0.99) but are often of little use for forecasting future values of these prices or indices.

Typically ARIMA models are used for forecasting, particularly in the field of macro- and micro-economic modeling. However, they can be applied in a wide range of disciplines, either in the form described here, or augmented with additional 'predictor' variables that are believed to improve the reliability of the forecasts made. The latter are important because the entire structure of the ARMA models discussed above depends on prior values and independent random events over time, not on any explanatory or causative factors. Hence ARIMA models will only reflect and extend past patterns, which might need to be modified in forecasts by factors such as the macro-economic environment, technology shifts, or longer term resource and/or environmental changes.


[BOX1] Box G E P, Jenkins G M (1968). Some recent advances in forecasting and control. Applied Statistics, 17(2), 91-109

[BOX2] Box, G E P, Jenkins, G M, Reinsel G C (1994) Time Series Analysis, Forecasting and Control. 3rd ed. Prentice Hall, Englewood Cliffs, NJ

[CHA1] Chatfield C (1975) The Analysis of Times Series: Theory and Practice. Chapman and Hall, London (see also, 6th ed., 2003)

[LJU1] Ljung G M, Box G E P (1978) On a Measure of a Lack of Fit in Time Series Models. Biometrika, 65, 297–303

NIST/SEMATECH e-Handbook of Statistical Methods, Section 6.4: Introduction to time series. 2010

SPSS/PASW 17 (2008) Analyze|Forecasting (Time Series Models)

[REI1] Reinsel G C Datasets for Box-Jenkins models: see