SAS Forecasting Studio - What is the holdout sampl...

07-30-2008 10:09 AM

I want to know how to set up the forecasting options, especially the holdout sample options shown under model selection in "Set forecasting options".

After I check "Use holdout sample", what specific values should I select for 1) number of periods to use and 2) maximum percentage of series to use?

Secondly, I wonder whether I should use a UCM (Unobserved Components Model) to show the relationship between dependent and independent variables. Simply put, I cannot identify a multivariate relationship using ARIMA or exponential smoothing.

Thanks,

Accepted Solutions

Solution

08-09-2017 03:13 PM

Posted in reply to deleted_user

08-11-2008 04:37 AM

Hello -

Please excuse me for not being clear.

You wrote: "if we excluded the holdout sample in this step, we would be ignoring the most recent and influential observations."

The holdout sample is only used to select the best forecasting model from a list of candidate models. After the best model is selected, the full range of the actual time series is used for subsequent model fitting and forecasting.

I'm afraid there is no easy answer to your question about the "optimal holdout sample size", as it depends on many things, such as the granularity of the data, the amount of historical data at hand, and attributes of the data (seasonal vs. non-seasonal).

At the end of the day, you want to use a holdout sample size that helps improve forecasting accuracy. This requires understanding both the data and the forecasting process (which includes monitoring accuracy over time). As such, you might have to go through some iterations of this process and figure out how different holdout sample settings affect your model accuracy.
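To make the "iterate over holdout settings" idea concrete, here is a minimal Python sketch (not SAS code, and not how Forecast Server is implemented). It assumes two hypothetical candidate models, `naive` and `drift`, and mean absolute error as the fit statistic, and it simply reports which candidate wins for several holdout sizes.

```python
# Hypothetical candidate models, used only for illustration.
def naive(history, horizon):
    # Repeat the last observed value.
    return [history[-1]] * horizon

def drift(history, horizon):
    # Extend the straight line through the first and last observations.
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + slope * (h + 1) for h in range(horizon)]

CANDIDATES = {"naive": naive, "drift": drift}

def mae(actual, predicted):
    # Mean absolute error, an assumed stand-in for the selection criterion.
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def winner_for(series, holdout):
    # Fit on everything before the holdout, score only inside the holdout.
    fit_part, holdout_part = series[:-holdout], series[-holdout:]
    scores = {name: mae(holdout_part, model(fit_part, holdout))
              for name, model in CANDIDATES.items()}
    return min(scores, key=scores.get)

# A made-up series that trends upward and then flattens out.
series = [1, 2, 3, 4, 5, 6, 7, 8, 8, 8, 8, 8]
winners = {h: winner_for(series, h) for h in (2, 4, 6)}
print(winners)  # {2: 'naive', 4: 'naive', 6: 'drift'}
```

On this toy series the selected model changes once the holdout reaches back into the trending part of the history, which is the kind of sensitivity worth checking on your own data.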

In general I think it is a fair statement that SAS Forecast Server is based on best practices specified by Armstrong (ed.) in "Principles of Forecasting."

As such you might want to have a look at

Armstrong, J. S. (2001d), "Evaluating forecasting methods," in J. S. Armstrong (ed.), Principles of Forecasting. Norwell, MA: Kluwer Academic Publishers

for more information.

Hope that helps,

Udo

All Replies

Posted in reply to deleted_user

07-30-2008 11:55 AM

Hello,

If you select "1) number of periods to use", you are setting the HOLDOUT= option of the forecasting engine.

From documentation:

HOLDOUT=n

specifies the size of the holdout sample to be used for model selection. The holdout sample is a subset of actual time series ending at the last nonmissing observation. If the ACCUMULATE= option is specified, the holdout sample is based on the accumulated series. If the holdout sample is not specified, the full range of the actual time series is used for model selection. For each candidate model specified, the holdout sample is excluded from the initial model fit and forecasts are made within the holdout sample time range. Then, for each candidate model specified, the statistic of fit specified by the CRITERION= option is computed using only the observations in the holdout sample. Finally, the candidate model, which performs best in the holdout sample, based on this statistic, is selected to forecast the actual time series. The HOLDOUT= option is only used to select the best forecasting model from a list of candidate models. After the best model is selected, the full range of the actual time series is used for subsequent model fitting and forecasting. It is possible that one model will outperform another model in the holdout sample but perform less well when the entire range of the actual series is used. If MODEL=BESTALL and HOLDOUT= options are used together, the last one hundred observations are used to determine whether the series is intermittent. If the series is determined not to be intermittent, holdout sample analysis will be used to select the smoothing model.
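The procedure described in that documentation passage can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SAS implementation: the three candidate models (`naive`, `mean_model`, `drift`) are hypothetical stand-ins for the engine's model list, and RMSE stands in for whatever statistic CRITERION= names.

```python
from math import sqrt

# Hypothetical candidates: each takes a history and a horizon and returns
# a list of forecasts.
def naive(history, horizon):
    return [history[-1]] * horizon

def mean_model(history, horizon):
    avg = sum(history) / len(history)
    return [avg] * horizon

def drift(history, horizon):
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + slope * (h + 1) for h in range(horizon)]

CANDIDATES = {"naive": naive, "mean": mean_model, "drift": drift}

def rmse(actual, predicted):
    # Assumed fit statistic; the real one is chosen by CRITERION=.
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def select_and_forecast(series, holdout, lead):
    # 1. Exclude the last `holdout` observations (holdout >= 1) from the fit.
    fit_part, holdout_part = series[:-holdout], series[-holdout:]
    # 2. Score every candidate using only the holdout range.
    scores = {name: rmse(holdout_part, model(fit_part, holdout))
              for name, model in CANDIDATES.items()}
    # 3. Select the candidate with the best (lowest) statistic.
    best = min(scores, key=scores.get)
    # 4. Refit the winner on the full series, holdout included, and forecast.
    return best, scores, CANDIDATES[best](series, lead)

series = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
best, scores, forecast = select_and_forecast(series, holdout=3, lead=2)
print(best, forecast)  # drift [30.0, 32.0]
```

Note that step 4 uses the complete series: the holdout observations influence which model is chosen, but they are not left out of the final fit.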

By selecting "2) maximum percentage of series to use", you are setting the HOLDOUTPCT= option of the forecasting engine.

From documentation:

HOLDOUTPCT=number

specifies the size of the holdout sample as a percentage of the length of the time series. If HOLDOUT=5 and HOLDOUTPCT=10, the size of the holdout sample is min(5, 0.1T), where T is the length of the time series with beginning and ending missing values removed. The default is 100 (100%).
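As a worked illustration of how the two settings combine, here is a small Python sketch. The function name `holdout_size`, the use of `None` for missing values, and rounding the percentage cap down to whole periods are all assumptions made for this example; the documentation quoted above does not say how fractional values are handled.

```python
import math

def holdout_size(series, holdout, holdout_pct=100.0):
    """Return min(holdout, holdout_pct percent of T) for a series."""
    # Positions of non-missing observations (None marks a missing value).
    present = [i for i, v in enumerate(series) if v is not None]
    if not present:
        return 0
    # T: series length with beginning and ending missing values removed.
    t = present[-1] - present[0] + 1
    # Assumption: the percentage cap is rounded down to whole periods.
    cap = math.floor(holdout_pct * t / 100.0)
    return min(holdout, cap)

# 30 usable observations padded with missing values at both ends.
short = [None, None] + list(range(30)) + [None]
print(holdout_size(short, holdout=5, holdout_pct=10))             # 3
print(holdout_size(list(range(100)), holdout=5, holdout_pct=10))  # 5
```

In other words, the number of periods is what you ask for, and the percentage acts as a ceiling that only matters for short series.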

With regard to your second question: yes, UCM can be used to show the relationship between dependent and independent series. In fact, you will find that the output of UCM is easier to interpret than that of ARIMAX.

Regards,

Udo

Posted in reply to udo_sas

08-08-2008 02:18 PM

Thank you for your detailed explanation.

To be honest, I don't fully understand it.

The documentation you quoted is somewhat difficult for me to understand. I wish it could be explained more simply.

The "holdout sample" indicates the most recent data: if we excluded the holdout sample in this step, we would be ignoring the most recent and influential observations.

However, the holdout sample setting significantly affected the forecast values, so I don't know how to find the optimal holdout number. What do you think the optimal holdout sample is?

I just used the default values (5, 1), so please let me know if you think you know the right answer.

Message was edited by: hangsok

