jinghan1029
Fluorite | Level 6

In general, a time series forecast depends on the initial values or the period from which the forecast starts. Suppose I have an AR(3) model estimated with the sample between 2010 Jan and 2020 Dec. Does PROC ARIMA use the first three months (2010 Jan, 2010 Feb and 2010 March) as the initial condition to generate the forecast and roll forward, or does it use the last three months (2020 Oct, 2020 Nov and 2020 Dec)?

5 REPLIES 5
ballardw
Super User

Start with https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/etsug/etsug_arima_overview.htm

 

From the overview:

The ARIMA procedure analyzes and forecasts equally spaced univariate time series data, transfer function data, and intervention data by using the autoregressive integrated moving-average (ARIMA) or autoregressive moving-average (ARMA) model. An ARIMA model predicts a value in a response time series as a linear combination of its own past values, past errors (also called shocks or innovations), and current and past values of other time series.

jinghan1029
Fluorite | Level 6

Thanks. I came here from the online help documentation, still confused. For example, this is the equation shown on the page "Forecast details":

 

   $$\hat{x}_{t+k} = \sum_{i=1}^{k-1} \hat{\pi}_i \hat{x}_{t+k-i} + \sum_{i=k}^{\infty} \hat{\pi}_i x_{t+k-i}$$

 

However, it is unclear in which period the program starts generating forecasts. I also notice there is an option "align" with feasible values BEGINNING/MIDDLE/END, but without a specification of how they are defined.

 

Because of the iterative nature of time series forecasts, the forecasts will differ if we use different starting points.
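
As a worked reading of the formula quoted earlier in this post (a sketch under stated assumptions, not a statement about PROC ARIMA internals): assume a pure AR(3) model for the mean-centered series, with t the last observed period and estimated AR coefficients written as \hat{\phi}_1, \hat{\phi}_2, \hat{\phi}_3. These symbols are introduced here for illustration. For a pure AR(3), the weights are \hat{\pi}_i = \hat{\phi}_i for i = 1, 2, 3 and \hat{\pi}_i = 0 for i > 3, so the formula reduces to:

```latex
\begin{aligned}
\hat{x}_{t+1} &= \hat{\phi}_1 x_{t} + \hat{\phi}_2 x_{t-1} + \hat{\phi}_3 x_{t-2} \\
\hat{x}_{t+2} &= \hat{\phi}_1 \hat{x}_{t+1} + \hat{\phi}_2 x_{t} + \hat{\phi}_3 x_{t-1} \\
\hat{x}_{t+3} &= \hat{\phi}_1 \hat{x}_{t+2} + \hat{\phi}_2 \hat{x}_{t+1} + \hat{\phi}_3 x_{t} \\
\hat{x}_{t+k} &= \hat{\phi}_1 \hat{x}_{t+k-1} + \hat{\phi}_2 \hat{x}_{t+k-2} + \hat{\phi}_3 \hat{x}_{t+k-3}, \qquad k \ge 4
\end{aligned}
```

Under this reading, and with the sample described in the question, the only observed values that enter the forecasts with nonzero weight would be the last three (2020 Oct, 2020 Nov and 2020 Dec). Values before the start of the data would only appear in terms whose weight is zero. For models with moving-average terms the weights do not cut off, so this simplification would not carry over.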

sbxkoenk
SAS Super FREQ

Hello,

 

> In general, a time series forecast depends on the initial values or the period to start the forecast. + your question.

You are absolutely right!
I used to know this (I think one of our SAS/ETS courses clearly states how PROC ARIMA deals with this "initialization for forecast generation"), but I no longer dare to make a definitive statement on it.

It depends on the estimation method:

  • METHOD=CLS (infinite memory forecasts, also called conditional forecasts)
  • METHOD=ULS or METHOD=ML (finite memory forecasts, also called unconditional forecasts)

What method do you use? By default, METHOD=CLS (at least in SAS 9.4 Maintenance Level 7).

 

I think that in your AR(3) example, the CLS forecast starts with 2010 Jan, 2010 Feb and 2010 March as the initial condition to generate the forecasts, because it assumes that the unknown values of the response series before the start of the data are equal to the mean of the series.

I also note this paragraph in the doc:
A complete description of the steps to produce the series forecasts and their standard errors by using either of these methods is quite involved, and only a brief explanation of the algorithm is given in the next two sections. Additional information about the finite and infinite memory forecasts can be found in Brockwell and Davis (1991). The prediction of stationary ARMA processes is explained in Chapter 5, and the prediction of nonstationary ARMA processes is given in Chapter 9 of Brockwell and Davis (1991).

I will follow up on this question and maybe consult some internal resources on it.

 

Kind regards,
Koen

sbxkoenk
SAS Super FREQ

On top of my previous reply (see above!!) ...

 

The ALIGN= option is used to align the ID variable to the beginning, middle, or end of the time ID interval specified by the INTERVAL= option.
Suppose your interval is HOUR or DTHOUR; then 07:09 becomes 07:00:00 (ALIGN=BEGINNING), 07:29:59 (ALIGN=MIDDLE), or something close to 08:00 (ALIGN=ENDING).

 

Look at the data set WANT in this example and play with ALIGN=BEGINNING | MIDDLE | ENDING:

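/* Simulate 46 observations from an ARMA(1,1) process */
PROC IML;
phi = {1 -0.5};
theta = {1 0.8};
y = armasim(phi, theta, 0, 1, 46, -1234321);
*print y;
/* Write the simulated series to data set TEMP (variable COL1) */
create temp from y;
append from y;
close temp;
QUIT;

/* Attach an hourly datetime ID to each observation */
data have;
 set temp(rename=(col1=y));
 datetimehour=INTNX('dthour','20AUG2021:14:00:00'dt,_N_);
 format datetimehour datetime18.;
run;

/* Fit an AR(3) model; WANT holds the forecasts, including LEAD=5 future hours */
proc arima data=have;
  identify var=y;
  estimate p=3;
  FORECAST ALIGN=BEGINNING /* MIDDLE | ENDING */ LEAD=5
           ID=datetimehour INTERVAL=DTHOUR  
           NOPRINT out=want;
quit;
/* end of program */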
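
One way to see which periods the forecasts cover is to inspect the OUT= data set. The following is a minimal sketch, assuming the WANT data set created by the program in this reply and the 46 simulated observations from the ARMASIM call. Rows where y is missing should be the LEAD=5 forecasts, and their datetimehour values should follow the last observed hour. Rows with a non-missing y should hold the one-step-ahead forecasts and residuals for the estimation sample.

```sas
/* Forecast rows beyond the sample: the actual value y is missing */
proc print data=want;
  where y is missing;
  var datetimehour y forecast std l95 u95;
run;

/* Last five in-sample rows (42 to 46), for comparison with the rows above */
proc print data=want(firstobs=42 obs=46);
  var datetimehour y forecast residual;
run;
```

If only the future rows are of interest, the NOOUTALL option of the FORECAST statement should restrict the OUT= data set to them.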

Good luck,
Koen

jinghan1029
Fluorite | Level 6

Hi, Koen:

 

Thank you so much. 

 

Even after reading the explanation on what "conditional forecast" and "unconditional forecast" are, I still cannot tell how the results will differ.
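
One empirical way to look at the difference is to fit the same AR(3) model twice and put the two sets of forecasts side by side. This is only a sketch: it assumes the HAVE data set and the datetimehour ID from the example earlier in this thread, and the data set names fc_cls, fc_ml and compare, as well as the variables f_cls, f_ml and diff, are made up for illustration. Note that any difference reflects both the different parameter estimates and the different forecasting method.

```sas
proc arima data=have;
  identify var=y noprint;

  /* Conditional least squares: infinite memory (conditional) forecasts */
  estimate p=3 method=cls noprint;
  forecast lead=5 id=datetimehour interval=dthour
           out=fc_cls nooutall noprint;

  /* Maximum likelihood: finite memory (unconditional) forecasts */
  estimate p=3 method=ml noprint;
  forecast lead=5 id=datetimehour interval=dthour
           out=fc_ml nooutall noprint;
quit;

/* Put the two sets of future forecasts side by side */
data compare;
  merge fc_cls(keep=datetimehour forecast rename=(forecast=f_cls))
        fc_ml (keep=datetimehour forecast rename=(forecast=f_ml));
  by datetimehour;
  diff = f_cls - f_ml;
run;

proc print data=compare;
run;
```

Because PROC ARIMA is interactive, each FORECAST statement uses the model from the most recent ESTIMATE statement, which is why both fits can share one PROC step.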

 

I think it is necessary for us to understand how SAS generates the forecast. From the practitioner's perspective, we usually start the forecast from the most recent observations (the last observations in the estimation sample). The forecast will be very different if SAS actually starts from the earliest observations used in estimation. If we don't understand the difference, we will misinterpret the results.
