Good evening all,
I am learning SAS as part of my Masters course. I am guessing I may be trying to do something very stupid -- Econometrics is new to me, as well as SAS. I may just need someone to tell me. I tried searching but could not find anything.
I fitted an ARMA(1,1) model (after running PROC ARIMA IDENTIFY with MINIC to work out the best option) to some data we were given for returns on the NY stock exchange. I realise a model of stock returns will never be perfect, and the data we have includes a bit of a crash in 2009, but I wanted to try to improve on the model anyway. I don't see any lags that would suggest adding a yearly or quarterly term. What I thought of doing was using an impulse intervention to handle the 2009 drop:
DATA TRAINING_ARIMA;
  SET TRAINING;
  Drop2009= (date >= '19FEB2009'd and date <= '05MAR2009'd);
RUN;
However, whenever I try and forecast [I have 196 obs in my training set; 22 remaining in my test set]:
PROC ARIMA DATA=TRAINING_ARIMA;
  IDENTIFY VAR=NYSE CROSSCORR=(Drop2009);
  ESTIMATE P=(1) Q=(1) INPUT=(Drop2009);
  FORECAST OUT=FORECAST_ARIMA LEAD=22;
RUN;
I get the warning message:
Warning: More values of input variable Drop2009 are needed.
The value for option LEAD= has been reduced to 0.
and thus no forecasts.
Is this because I do not have enough "1" values for the Drop2009 column in my dataset (3 out of 196 currently as above; data is weekly)? Or am I missing some option in my code I need?
Thanks for your time,
Ian.
Hi Ian,
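To forecast a model with input variables in PROC ARIMA, future values of those input variables need to be provided for the forecast horizon defined by the LEAD= option. Your training data set ends at the last training observation, so it contains no future values of Drop2009, and that is why LEAD= was reduced to 0. The warning is not about how many "1" values the variable has. The PROC ARIMA documentation provides more information on forecasting with input variables.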
Since your input variable is an intervention variable, future values of your Drop2009 variable need to be provided as part of your DATA= data set for the LEAD= horizon. Based on your description, you have a weekly data set with 218 observations that has been broken up into a Training data set with 196 observations and a Test data set with 22 observations. It appears the weekly dates are recorded on Thursday for each week.
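If you would like to keep your existing Training/Test split, one possible way to supply those future values is to append the test dates to the training data before creating the intervention variable. This is only a sketch: it assumes your held-out data set is called TEST (a hypothetical name) and carries the same DATE variable as TRAINING; FUTURE is likewise a hypothetical work data set.

```sas
/* Assumption: TEST is the held-out data set (hypothetical name) */
/* and has the same DATE variable as TRAINING.                   */
data future;
  set test(keep=date);   /* keep only the 22 future dates */
  nyse = .;              /* response is treated as unknown over the horizon */
run;

data training_arima;
  set training future;   /* training rows followed by the future rows */
  /* 1 inside the February/March 2009 window, 0 elsewhere (including all future dates) */
  drop2009 = ('19feb2009'd <= date <= '05mar2009'd);
run;
```

Because DROP2009 is computed after the two data sets are stacked, it is nonmissing (0) for every future date, which is what the FORECAST statement needs.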
Rather than breaking up the original data set into 2 data sets, you can use a modified version of your original data set, where your dependent variable, NYSE, is set to missing for the last 22 observations, and the Drop2009 variable is added. For example, assume your original data set is called ALL_DATA and the first observation in your "Test" data set starts in the first week of October 2012. You can do something like the following to fit your model and obtain forecasts:
data model_data;
  set all_data;
  if date > '30sep2012'd then nyse=.; /* specify date for beginning of forecast period */
  drop2009 = ('19feb2009'd <= date <= '05mar2009'd);
run;

proc arima data=model_data;
  identify var=nyse crosscorr=(drop2009) noprint;
  estimate p=(1) q=(1) input=(drop2009);
  forecast out=forecast_arima lead=22 id=date interval=week.5;
run;
quit;
Note that the ID=DATE and INTERVAL=WEEK.5 options were added to the FORECAST statement to include extrapolated values of the DATE variable in the OUT= data set. The INTERVAL= specification assumes weekly data aligned to the Thursday of each week. If this assumption is not correct, then you can change the INTERVAL= specification accordingly.
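Once the forecasts are in FORECAST_ARIMA, you may want to compare them with the 22 values you held back. The following is a rough sketch rather than a definitive recipe. It assumes ALL_DATA still contains the actual NYSE values, that both data sets are sorted by DATE, and that the forecast period starts after 30 September 2012 as in the example above. FORECAST_CHECK, ACTUAL, ERROR and ABS_ERROR are names made up for illustration; FORECAST, L95 and U95 are the variables PROC ARIMA writes to the OUT= data set.

```sas
/* Assumptions: ALL_DATA holds the actual NYSE values, and both */
/* data sets are sorted by DATE.                                */
data forecast_check;
  merge forecast_arima(keep=date forecast l95 u95)
        all_data(keep=date nyse rename=(nyse=actual));
  by date;
  if date > '30sep2012'd;        /* keep only the forecast period */
  error     = actual - forecast; /* forecast error for each week */
  abs_error = abs(error);
run;

/* Mean error (bias) and mean absolute error over the hold-out weeks */
proc means data=forecast_check n mean;
  var error abs_error;
run;
```

The mean of ABS_ERROR gives a simple mean absolute error, which you could compare between the model with and without the Drop2009 intervention.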
I hope this helps!
DW
Hi DW,
That helps immensely; thank you so much.
Regards,
Ian.