Hi,
My problem: I need to build many ARIMA models and choose the ones with low MAPEs on holdout samples, much like what SAS Forecast Studio does.
Y is my response series, and X is an input series I use to explain Y.
I am running the following PROC ARIMA statements in a loop, trying out various values of p, q and inputs:
proc arima data=Input_dataset;
   identify var=Y crosscorr=X;
   estimate p=1 q=1 input=X;
   forecast id=Date interval=months out=Results lead=12 back=12 nooutall;
quit;
proc sql;
   select mean(abs(residual)/Y)*100 as MAPE,
          mean(residual/Y)*100 as MPE
   from Results;
quit;
My understanding is that the BACK=12 option tells SAS not to use the final 12 data points for estimating the parameters, and that LEAD=12 forecasts the values for these final 12 data points. The NOOUTALL option then forces only the last 12 data points into the Results data set. The final PROC SQL step uses these 12 actual and forecasted values to estimate the holdout MAPE.
So my questions are:
Is the programming structure above creating a holdout sample?
If not, what should I be doing to get what I am trying to accomplish?
Hello -
See my response to your other question: https://communities.sas.com/message/240126#240126 - the BACK option of the FORECAST statement will not provide you with access to hold-out sampling similar to SAS Forecast Studio. It is more along the lines of out-of-sample techniques.
Your code will not work as written, because you will have to:
a) leave out the hold-out sample first
b) create many different models using the remaining data (initial fit data)
c) forecast all your models based on the initial fit data to forecast into the hold-out region
d) pick the winning model based on hold-out region fit
e) add the hold-out data back to the initial fit region data and re-estimate all parameters of the winning model based on all data
f) create a forecast using the final estimates
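Steps a) through f) could be sketched roughly as follows. This is only an illustration under assumptions: it uses the univariate SASHELP.AIR series rather than your Y and X, the data set names (air_all, air_fit, fc, score_..., fc_final) and the macro %holdout_mape are made up for the example, the candidate orders are arbitrary, and the errors are measured on the log scale. With an input series you would also need values of X (or a model for X) covering the hold-out region in step c).

```sas
/* Hypothetical example data: log of the airline passenger series */
data air_all;
   set sashelp.air;
   logair = log(air);
run;

/* a) leave out the last 12 months as the hold-out sample */
data air_fit;
   set air_all nobs=n;
   if _n_ <= n - 12;
run;

/* b)-c) fit one candidate on the fit region and forecast 12 steps
   into the hold-out region; d) score it against the actual values */
%macro holdout_mape(p=0, q=1);
   proc arima data=air_fit;
      identify var=logair(1 12) noprint;
      estimate p=&p q=&q noint method=ml;
      forecast id=date interval=month lead=12 out=fc nooutall;
   quit;

   proc sql;
      create table score_p&p._q&q as
      select &p as p, &q as q,
             mean(abs(a.logair - f.forecast) / abs(a.logair)) * 100 as mape,
             mean((a.logair - f.forecast) / a.logair) * 100 as mpe
      from air_all as a inner join fc as f
        on a.date = f.date;
   quit;
%mend holdout_mape;

%holdout_mape(p=0, q=1);
%holdout_mape(p=1, q=1);

/* e)-f) re-estimate the winning candidate on all data and forecast;
   p=0 q=1 is only a placeholder for whichever candidate wins */
proc arima data=air_all;
   identify var=logair(1 12) noprint;
   estimate p=0 q=1 noint method=ml;
   forecast id=date interval=month lead=12 out=fc_final;
quit;
```

Because air_fit never contains the last 12 months, neither the estimation nor the forecast in steps b) and c) can see the hold-out values. The winner is picked here by inspecting the score tables.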
Thanks,
Udo
Many Thanks for your helpful comments, Udo!
But I am still not sure I understand what the BACK= option does in PROC ARIMA. From the manual:
"specifies the number of observations before the end of the data where the multistep forecasts are to begin. "
My question is: does the procedure use all the data, including the observations covered by the BACK= option, to estimate the parameters? The manual only says that the multistep forecasts begin at an earlier period; it says nothing about whether those data are used. Also, my experiments with various values of BACK= gave the same parameter estimates, which suggests that the procedure uses all the data. In light of this, I am not sure I understand your comment about BACK= specifying the out-of-sample part. Could you please elaborate?
As an aside, the PROC UCM manual is quite clear in its description of the BACK= option:
BACK=integer
specifies the holdout sample for the evaluation of the forecasting performance of the model. For example, BACK=10 results in treating the last 10 observed values of the response series as unobserved. A post-sample-prediction-analysis table is produced for comparing the predicted values with the actual values in the holdout period. The default is BACK=0.
Is this the case with PROC ARIMA too?
Many Thanks in advance!
X
ARIMA does not have a BACK= option in the ESTIMATE statement. Therefore, the parameter estimation uses all the available data. The BACK= option in the FORECAST statement withholds the specified number of observations from the end of the data (that is, these observations are not used in forecasting calculations). It is easier to explain this with an example:
data d1;
set sashelp.air;
logair=log(air);
run;
proc arima data=d1;
i var=logair(1 12) noprint;
e q=(1)(12) noint method=ml;
f back=12 lead=24;
f lead=24;
quit;
proc ucm data=d1;
model logair;
irregular q=1 sq=1 s=12;
deplag lags=(1)(12) phi=1 1 noest;
estimate;
forecast back=12 lead=24;
run;
The ARIMA forecast output for the first FORECAST statement (with BACK=12) should be essentially the same as the UCM forecast output (with BACK=12 in its FORECAST statement). In ARIMA, the second FORECAST statement uses all 144 observations, whereas the first FORECAST statement uses only the first 132.
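One way to check the estimation side yourself, as a rough sketch: fit the same model once on all 144 observations and once on only the first 132, and compare the parameter estimates. The data set names d1_first132, est_all and est_132 are made up for this illustration; OUTEST= on the ESTIMATE statement writes the estimates to a data set.

```sas
data d1;
   set sashelp.air;
   logair = log(air);
run;

/* Estimates from all 144 observations (what ESTIMATE always uses,
   regardless of BACK= on a later FORECAST statement) */
proc arima data=d1;
   identify var=logair(1 12) noprint;
   estimate q=(1)(12) noint method=ml outest=est_all noprint;
quit;

/* Estimates from the first 132 observations only, which is what a
   true holdout fit would use */
data d1_first132;
   set d1(obs=132);
run;

proc arima data=d1_first132;
   identify var=logair(1 12) noprint;
   estimate q=(1)(12) noint method=ml outest=est_132 noprint;
quit;

proc print data=est_all; run;
proc print data=est_132; run;
```

Getting identical estimates for different BACK= values, as reported in the question, is consistent with the first of these two fits being the one the procedure performs.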
Hope this helps.