This post describes the essentials of how ARIMAX models work and shows how to interpret their components. The intention is to help analysts better understand the models their projects generate so they can communicate results effectively and make informed choices when setting forecast model options. This post is the second in a series. The first post, linked below, defined what a transfer function is, described how numerator orders are specified and interpreted, and introduced the error series component of the model via autoregressive orders.
In this post, we’ll describe what orders of integration mean, how denominator orders work in a transfer function, and the role of moving average terms in the error series model. Subsequent posts in this series focus on additional diagnostics that augment and extend the interpretability of models generated in a SAS Visual Forecasting project.
Interpreting ARIMAX Models, Part 1
Denominator orders in a transfer function
As we described in the previous post, ARIMAX models quantify the relationship between an input and the dependent variable through a mechanism called a transfer function. Transfer functions consist of numerator and denominator orders. The numerator order approach for accommodating a relationship between an input and the target is to add some combination of current and lagged values of the input variable to the specification. Denominator orders in a transfer function do effectively the same thing, but they do it in a more parsimonious and restrictive way.
Let’s start by considering a transfer function with a numerator order 0 and denominator order 1. The y variable is the target at time t, and x represents an input at time t. At first glance, it looks like we’ve specified a simple, but weird looking, contemporaneous relationship between y and x.
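The equation image is not reproduced in this text. A reconstruction, assuming the parameter values quoted later in this post (a numerator order 0 parameter of 8 and a denominator order 1 parameter of 0.5) and the usual minus sign in the denominator polynomial, would look like this. The symbols \omega_0 (for NUM0) and \delta_1 (for DEN1) are labels chosen here for illustration.

```latex
y_t = \frac{\omega_0}{1 - \delta_1 B}\, x_t = \frac{8}{1 - 0.5B}\, x_t
```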
Denominator orders capture long-lag, ‘dynamic’ relationships between the target and input, and the backshift operator is the key to understanding how they work. In the equation, B denotes the backshift operator: applied to a variable at time t, it shifts that variable back one time interval. For example, B x_t = x_{t-1}.
To make things more straightforward, multiply both sides of the equation above by the denominator and rearrange, which gives us the following.
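As a worked version of that rearrangement, assume the transfer function is y_t = 8 / (1 - 0.5B) x_t, using the parameter values quoted later in this post. Multiplying through by the denominator and applying the backshift operator gives:

```latex
\begin{aligned}
(1 - 0.5B)\, y_t &= 8\, x_t \\
y_t - 0.5\, y_{t-1} &= 8\, x_t \\
y_t &= 0.5\, y_{t-1} + 8\, x_t
\end{aligned}
```

In words, the current value of y is half of the previous value of y plus eight times the current value of x.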
To illustrate how denominator orders work, we’ll assume the relationship between x and y is a pulse/response. As shown below, the steady state value for x is 0. It switches or pulses to one for one time interval, t=3. The equation above determines the response of y.
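The pulse/response can be traced numerically. This is a minimal sketch, assuming the recursion y_t = 0.5 y_{t-1} + 8 x_t (parameter values taken from later in this post) and assuming y starts at a steady state of 0. The function name `pulse_response` and the ten-period horizon are choices made for this illustration.

```python
def pulse_response(omega0=8.0, delta1=0.5, n=10, pulse_at=3):
    """Simulate y_t = delta1 * y_{t-1} + omega0 * x_t for a one-period pulse in x."""
    # x is 0 everywhere except a single pulse of 1 at t = pulse_at (t starts at 1)
    x = [1.0 if t == pulse_at else 0.0 for t in range(1, n + 1)]
    y = []
    prev = 0.0  # assumed steady-state starting value for y
    for xt in x:
        yt = delta1 * prev + omega0 * xt
        y.append(yt)
        prev = yt
    return x, y


x, y = pulse_response()
for t, (xt, yt) in enumerate(zip(x, y), start=1):
    print(t, xt, yt)
```

Under these assumptions, y jumps to 8 at t=3 and then halves each interval (4, 2, 1, 0.5, and so on) as it decays back toward 0.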
Note that we could write the denominator order relationship equivalently as a high-order numerator component, as follows.
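The equivalence comes from expanding the denominator as a geometric series in B. Assuming the same illustrative values used in this post (8 and 0.5), the expansion would read:

```latex
\begin{aligned}
y_t &= \frac{8}{1 - 0.5B}\, x_t
     = 8 \left(1 + 0.5B + 0.5^2 B^2 + 0.5^3 B^3 + \cdots \right) x_t \\
    &= 8\, x_t + 4\, x_{t-1} + 2\, x_{t-2} + 1\, x_{t-3} + 0.5\, x_{t-4} + \cdots
\end{aligned}
```

Each lag of x gets its own coefficient, and every coefficient is half of the one before it.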
If numerator and denominator orders are equivalent, why do we bother with having both? Note that in the denominator order specification, there are only two values that would need to be derived or estimated: the numerator order 0 parameter (here, 8), and the denominator order 1 parameter (here, 0.5). The numerator order representation of the same relationship has a lot more parameters. So, denominator orders can represent long lag relationships between an x and y relatively parsimoniously. However, denominator orders represent the relationship between x and y in a restrictive way. The relationship is plotted below.
Denominator orders are primarily useful when the response of the target variable, y, looks like a jump with decay back to a steady state level. Modeling hurricane effects on oil production in the Gulf of Mexico is one example of where denominator orders are useful. In the month the hurricane hits, oil production jumps down. In the months following, repairs are made, rigs come back online, and oil production gradually converges back to its steady state, pre-event level. Larger hurricanes tend to have longer lasting effects on oil production than smaller ones, and the length of the effect is regulated by both the magnitude of the initial impact, quantified by the NUM0 parameter, and the value of the DEN1 parameter. Note that the value of the denominator order parameter must be less than one in absolute value.
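To see how both parameters regulate the length of the effect, here is a small sketch. It assumes the response k intervals after a one-time pulse is NUM0 × DEN1^k, which follows from the jump-with-decay recursion, and it counts the intervals until the response falls below a tolerance. The function name `periods_until_below`, the tolerance of 1.0, and the parameter values are all hypothetical.

```python
def periods_until_below(num0, den1, tol=1.0):
    """Count intervals after the pulse until |num0 * den1**k| drops below tol."""
    if not abs(den1) < 1:
        raise ValueError("DEN1 must be less than one in absolute value")
    k = 0
    effect = abs(num0)
    while effect >= tol:
        effect *= abs(den1)  # one more interval of decay
        k += 1
    return k


print(periods_until_below(8, 0.5))    # baseline impact and decay rate
print(periods_until_below(16, 0.5))   # larger initial impact
print(periods_until_below(8, 0.8))    # slower decay
```

Doubling the initial impact adds one interval in this example, while raising DEN1 from 0.5 to 0.8 stretches the effect much further.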
When would you not want to use a denominator order in a transfer function to model a relationship between y and x? Consider the following response pattern in y: there’s a build-up in the correlation pattern as well as a gap. A jump with decay would not be a good approximation for this relationship. Since numerator orders in a transfer function have a separate parameter for each lag of correlation persistence, they are more flexible and would be preferred for modeling the relationship pictured in the plot.
The highlighted row below shows how a model with the Price input entering as NUM=0 and DEN=1 is represented in the software.
The parameter estimates table shows both estimated parameters for the Price effect. Note, the NUM0 term with parameter estimate 12.1 is listed as SCALE.
Orders of integration, or the I in ARIMA
Non-stationarity means that the parameters that describe the time series are a function of time. Non-stationary variation causes problems with the correct specification and estimation of the ARMA and transfer function parts of the model if it is not handled appropriately. We’ll begin with an informal definition of what non-stationary variation is, and then describe a standard approach for handling it. To introduce this concept, we’ll break the data up into chunks. For the Toothpaste series, averages calculated in the two bracketed chunks are not substantially different from each other or from the overall series average. It looks like the series mean is not changing much as time increases, and that the data is probably stationary.
On the other hand, the series average for Passengers in the most recent chunk of data is substantially different from the average in the first chunk. It’s likely that the series mean is a function of time, and that the data is non-stationary.
Differencing is the most widely used approach, in the context of ARIMAX models, for handling non-stationary variation in time series data. Differencing can transform non-stationary data into stationary data. A first difference, denoted d_t below, usually suffices to remove a non-stationary trend from the data.
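Written out, a first difference is the change from one time interval to the next. Using B for the backshift operator and y_t for the series being differenced (notation assumed here, since the original equation image is not reproduced):

```latex
d_t = y_t - y_{t-1} = (1 - B)\, y_t
```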
The plot below shows the first differenced, de-trended passengers series.
If a first difference is used to transform the data from non-stationary to stationary, then the order of integration (the I in ARIMAX) of the model is 1. Integration is just a refined way to say ‘add it back up’, and it describes what happens after the data is differenced and the ARIMAX model is fit. If the data has a trend, we want that trend to be represented in our forecast. The AR, MA and X components of the model are fit on the differenced, stationary data, so, initially, the predictions are on the differenced scale. The difference is then undone on the predictions; they are ‘added up’, or integrated, to get the trend component into the final forecast.
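The ‘add it back up’ step can be sketched in a few lines. This is an illustration only: the series, the predictions on the differenced scale, and the function names `first_difference` and `integrate` are all hypothetical, and no model is actually fit.

```python
from itertools import accumulate


def first_difference(y):
    """Return d_t = y_t - y_{t-1}; the result is one element shorter than y."""
    return [curr - prev for prev, curr in zip(y, y[1:])]


def integrate(diff_forecasts, last_observed):
    """Undo a first difference by cumulatively adding predicted changes to the last level."""
    return list(accumulate(diff_forecasts, initial=last_observed))[1:]


y = [10, 12, 15, 19, 24]            # hypothetical trending series
d = first_difference(y)             # values on the differenced scale
d_forecast = [5, 5, 5]              # hypothetical predictions of d_t
y_forecast = integrate(d_forecast, y[-1])
print(d)
print(y_forecast)
```

The predicted changes are accumulated starting from the last observed level (24), so the forecast continues the upward trend on the original scale.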
Readers with a time series background may be thinking that the first difference removed the trend from the data, but there’s still a pretty obvious seasonal pattern: the first differenced data is still non-stationary! Seasonal patterns can be handled with a difference too, but in the seasonal case we’ll use a seasonal span difference. For monthly data, this usually implies a 12-span difference, shown below.
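For monthly data, a 12-span difference compares each value with the value from the same month one year earlier. Using B for the backshift operator (the label on the left-hand side is chosen here for illustration):

```latex
d^{(12)}_t = y_t - y_{t-12} = (1 - B^{12})\, y_t
```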
For data with trend and seasonality, we can apply a first and a seasonal span difference to transform it to stationarity. In fact, there’s a classic Box-Jenkins Airline model for the Passengers data that includes a first and seasonal span difference.
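A quick sketch shows the two differences working together. The series below is synthetic (a linear trend plus a fixed 12-month pattern, both made up for this example), and the helper name `difference` is hypothetical.

```python
def difference(y, span=1):
    """Return y_t - y_{t-span} for every t where both values exist."""
    return [y[t] - y[t - span] for t in range(span, len(y))]


# Hypothetical monthly series: linear trend plus a repeating 12-month pattern.
seasonal = [0, 3, 5, 9, 14, 20, 25, 24, 15, 8, 2, 1]
y = [2 * t + seasonal[t % 12] for t in range(48)]

first = difference(y, span=1)        # removes the linear trend, seasonality remains
both = difference(first, span=12)    # removes the repeating seasonal pattern
print(first[:12])
print(both)
```

For this idealized series, the first difference still varies month to month, while applying the 12-span difference on top of it leaves nothing but zeros. Real data would leave stationary noise rather than exact zeros.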
The highlighted row shows how the classic Airline model is represented in the software.
The specification contains a first difference and a seasonal (span 12) difference, denoted with a D. It also contains moving average terms, denoted Q, at lags 1 and 12. The subscript s indicates a seasonal lag. We’ll conclude our discussion of basic ARIMAX model interpretation with moving average terms in the next section. The final forecast from the Airline model, pictured below, illustrates how integrating, or undoing the first and span-12 differences, adds the trend and seasonal components to the predictions.
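In backshift notation, the Airline model is commonly written as follows. This assumes the multiplicative seasonal form and a sign convention in which MA parameters enter with a minus sign. The symbols \theta_1 (lag 1 MA) and \Theta_1 (lag 12 MA) are labels chosen here. Box and Jenkins fit this model to the logarithm of the passenger counts; whether a log transform is applied in the project shown here is not stated.

```latex
(1 - B)(1 - B^{12})\, y_t = (1 - \theta_1 B)(1 - \Theta_1 B^{12})\, \varepsilon_t
```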
Moving average terms
If the dependent variable y is a stationary series, we can model it with a mixture of autoregressive (AR) and moving average (MA) terms, as shown below. Epsilon at time t represents the error, and epsilon at t-1 is a realization of the (white) noise error process. y could represent the residuals from a transfer function model, and in this context the equation represents the error series model described in the previous post.
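As a concrete case, a model with one AR term and one MA term could be written as below. The choice of one term of each kind is an assumption based on the description above, and so is the sign convention, in which the MA parameter enters with a minus sign (some texts use a plus). The symbols \phi_1 and \theta_1 are labels chosen here.

```latex
y_t = \phi_1\, y_{t-1} + \varepsilon_t - \theta_1\, \varepsilon_{t-1}
```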
Having both AR and MA terms allows us to capture the signal in the stationary data in a parsimonious and flexible way. MA and AR terms perform similar roles in the model. From an interpretability point of view, that’s the necessary information to describe them.
Further, optional, details are provided next for interested readers. Using the backshift operator, we can re-write the equation above with all variables at the current time, t. Doing some rearranging shows that the ARMA model is a ratio of polynomials in the backshift operator.
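For a model with one AR and one MA term, the rearrangement would go as follows. It assumes the form y_t = \phi_1 y_{t-1} + \varepsilon_t - \theta_1 \varepsilon_{t-1}, with symbols and sign convention chosen here for illustration.

```latex
\begin{aligned}
y_t - \phi_1 B\, y_t &= \varepsilon_t - \theta_1 B\, \varepsilon_t \\
(1 - \phi_1 B)\, y_t &= (1 - \theta_1 B)\, \varepsilon_t \\
y_t &= \frac{1 - \theta_1 B}{1 - \phi_1 B}\, \varepsilon_t
\end{aligned}
```

The MA polynomial sits in the numerator and the AR polynomial in the denominator.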
This looks a lot like a transfer function with NUM and DEN orders. In fact, it is! The ARMA model is a transfer function with a white noise input. An AR1 term behaves identically to the DEN1 term described above. Numerator orders are the same as MA orders. MA terms are useful for capturing choppy and irregularly shaped signal or memory in the stationary data. AR terms are more parsimonious and are useful for capturing signal with the jump-with-decay pattern. I hope you got some value from this post. Stay tuned for the next one, where we’ll start discussing and adding some custom interpretability diagnostics to the software.