Holdouts in Proc ARIMA

12-01-2014 02:37 AM

Hi,

My problem: I need to build many ARIMA models and choose the ones with low MAPEs on Holdout samples, sort of like what Forecast Studio does.

I am running the following ARIMA statements in a loop, trying out various values of p, q and inputs:

Y is my response series; X is an input series I use to explain Y.

proc ARIMA data=Input_dataset;

Identify Var=Y crosscorr=X;

Estimate p=1 q=1 input =X;

Forecast id=DAte interval=months out=REsults Lead=12 back=12 nooutall;

quit;

proc sql;

select

mean(abs(residual)/Y)*100 as MAPE, mean((residual)/Y)*100 as MPE from results;

quit;

My understanding is that the BACK=12 option tells SAS not to use the final 12 data points for estimating the parameters, and LEAD=12 forecasts the values for these final 12 data points. The NOOUTALL option then writes only the last 12 data points to the Results data set. The final PROC SQL step uses these 12 actual and forecasted values to estimate the holdout MAPEs.

So my questions are:

Is the programming structure above creating a holdout sample?

If not, what should I be doing to get what I am trying to accomplish?

Accepted Solutions

Solution

08-09-2017 02:29 PM

Posted in reply to Xman

12-02-2014 02:55 PM

ARIMA does not have a BACK= option in the ESTIMATE statement. Therefore, the parameter estimation uses all the available data. The BACK= option in the FORECAST statement withholds the specified number of observations from the end of the data (that is, these observations are not used in forecasting calculations). It is easier to explain this with an example:

data d1;

set sashelp.air;

logair=log(air);

run;

proc arima data=d1;

i var=logair(1 12) noprint;

e q=(1)(12) noint method=ml;

f back=12 lead=24;

f lead=24;

quit;

proc ucm data=d1;

model logair;

irregular q=1 sq=1 s=12;

deplag lags=(1)(12) phi=1 1 noest;

estimate;

forecast back=12 lead=24;

run;

The ARIMA forecast output for the first FORECAST statement (with BACK=12) should be essentially the same as the UCM forecast output (with BACK=12 in the FORECAST statement). In ARIMA, the second FORECAST statement uses all 144 observations, whereas the first FORECAST statement uses only the first 132 observations.

Hope this helps.
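One way to see the difference between withholding data from the forecast and withholding it from the estimation is to fit the same model twice, once on all 144 observations and once on the first 132 only, and compare the saved estimates. This is only a sketch under assumptions: the data set names d1_fit, est_all and est_fit are made up for illustration, and the model is simply the one from the example above.

```sas
/* Same example data as above */
data d1;
   set sashelp.air;
   logair = log(air);
run;

/* Hypothetical helper data set: drop the last 12 observations */
data d1_fit;
   set d1 nobs=n;
   if _n_ <= n - 12;
run;

/* Estimation on all 144 observations; a FORECAST BACK=12 issued    */
/* after this ESTIMATE would still rely on these parameter values   */
proc arima data=d1;
   identify var=logair(1,12) noprint;
   estimate q=(1)(12) noint method=ml outest=est_all noprint;
quit;

/* Estimation on the first 132 observations only */
proc arima data=d1_fit;
   identify var=logair(1,12) noprint;
   estimate q=(1)(12) noint method=ml outest=est_fit noprint;
quit;

/* Print both sets of estimates for a side-by-side look */
title 'Estimates from all 144 observations';
proc print data=est_all;
run;

title 'Estimates from the first 132 observations';
proc print data=est_fit;
run;
title;
```

If the two printouts differ, the last 12 observations influenced the first fit, which is the reason BACK= on the FORECAST statement alone does not give a true holdout evaluation.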

All Replies

Posted in reply to Xman

12-01-2014 01:22 PM

Hello -

See my response to your other question: https://communities.sas.com/message/240126#240126 - the BACK option of the FORECAST statement will not provide you with access to hold-out sampling similar to SAS Forecast Studio. It is more along the lines of out-of-sample techniques.

Your code will not work as you will have to:

a) leave out the hold-out sample first

b) create many different models using the remaining data (initial fit data)

c) forecast all your models based on the initial fit data to forecast into the hold-out region

d) pick the winning model based on hold-out region fit

e) add the hold-out data back to the initial fit region data and re-estimate all parameters of the winning model based on all data

f) create a forecast using the final estimates

Thanks,

Udo
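Steps a) through f) could be sketched roughly as follows for a single candidate model. This is a minimal illustration, not the method Forecast Studio uses: it assumes the sashelp.air series from the accepted solution, a 12-month hold-out, and made-up data set names (air, fit, holdout, fcst, final). In practice steps b) to d) would sit inside a macro loop over the candidate models, and a model with an input series would also need input values for the hold-out periods.

```sas
/* Assumed example data: monthly airline series on the log scale */
data air;
   set sashelp.air;
   logair = log(air);
run;

/* a) leave out the hold-out sample: the last 12 observations */
data fit holdout;
   set air nobs=n;
   if _n_ <= n - 12 then output fit;
   else output holdout;
run;

/* b), c) estimate one candidate model on the fit region only   */
/*        and forecast 12 steps ahead into the hold-out region  */
proc arima data=fit;
   identify var=logair(1,12) noprint;
   estimate q=(1)(12) noint method=ml;
   forecast id=date interval=month lead=12 out=fcst nooutall;
quit;

/* d) hold-out MAPE and MPE (on the log scale) for this candidate; */
/*    repeat b) to d) per model and keep the best one              */
proc sql;
   select mean(abs(h.logair - f.forecast) / abs(h.logair)) * 100 as MAPE,
          mean((h.logair - f.forecast) / h.logair) * 100 as MPE
   from holdout as h
        inner join fcst as f
        on h.date = f.date;
quit;

/* e), f) re-estimate the winning model on all data and forecast */
proc arima data=air;
   identify var=logair(1,12) noprint;
   estimate q=(1)(12) noint method=ml;
   forecast id=date interval=month lead=12 out=final;
quit;
```

Because the hold-out rows are physically removed before the first PROC ARIMA call, neither the estimation nor the forecast can see them, which is the difference from relying on BACK= alone.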

Posted in reply to udo_sas

12-02-2014 03:14 AM

Many Thanks for your helpful comments, Udo!

But I am still not sure I understand what the BACK= option does in PROC ARIMA. From the manual:

"specifies the number of observations before the end of the data where the multistep forecasts are to begin."

My question is: is the PROC using the data, including the data specified by the back= option to estimate the parameters? The manual only says it begins multistep forecasts from a prior period, nothing about the data being used or not used. Also, my experiments with various values of Back= gave the same parameter values, suggesting that the proc uses up all the data... In light of this, I am not sure I understand your comment on Back= specifying the out of sample part. Could you please elaborate?

As an aside, the PROC UCM manual is quite clear in its description of the BACK= option:

**BACK= integer**

specifies the holdout sample for the evaluation of the forecasting performance of the model. For example, BACK=10 results in treating the last 10 observed values of the response series as unobserved. A post-sample-prediction-analysis table is produced for comparing the predicted values with the actual values in the holdout period. The default is BACK=0.

Is this the case with PROC ARIMA too?

Many Thanks in advance!

X
