SAS Forecasting Studio - What is the holdout sample and how to set up ...


🔒 This topic is **solved** and **locked**.


Posted 07-30-2008 10:09 AM (2187 views)

I want to know how to set up the forecasting options, especially the holdout sample settings shown under model selection in "Set Forecasting Options".

After I check "Use holdout sample", what specific values should I choose for 1) the number of periods to use and 2) the maximum percentage of the series to use?

Second, I wonder whether I should use UCM (Unobserved Components Model) to show the relationship between dependent and independent variables. Simply put, I cannot set up a multivariate model using ARIMA or exponential smoothing.

Thanks,


Accepted Solution


Hello -

Please excuse me for not being clear.

You wrote: "if we excluded the holdout sample in this step, we would be ignoring the most recent and influential observations."

The holdout sample is only used to select the best forecasting model from a list of candidate models. After the best model is selected, the full range of the actual time series is used for subsequent model fitting and forecasting.

I'm afraid there is no easy answer to your question about the "optimal holdout sample size", as it depends on many things, such as the granularity of the data, the amount of historical data at hand, and attributes of the data (seasonal vs. non-seasonal).

At the end of the day, you want to use a holdout sample size that helps improve forecasting accuracy. This requires understanding both the data and the forecasting process (which includes monitoring accuracy over time). As such, you might have to go through several iterations of this process to find out how different holdout sample settings affect your model accuracy.
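One way to run such iterations is a simple backtest: set aside the most recent observations as a test window that plays no part in model selection, select a model with each candidate holdout size, and compare how the selected models score on that window. The Python below is a minimal sketch under stated assumptions, not a SAS Forecast Studio feature: the three candidate models (`mean_model`, `naive_model`, `drift_model`) are simple benchmarks standing in for the engine's ARIMA, exponential smoothing, and UCM candidates, MAE stands in for the CRITERION= setting, and `compare_holdout_sizes` is a hypothetical name.

```python
# Hypothetical stand-ins for the engine's candidate models: each takes the
# data it may be fitted on and a horizon h, and returns h point forecasts.
def mean_model(train, h):
    return [sum(train) / len(train)] * h


def naive_model(train, h):
    return [train[-1]] * h


def drift_model(train, h):
    # Straight line through the first and last observations (needs len >= 2).
    slope = (train[-1] - train[0]) / (len(train) - 1)
    return [train[-1] + slope * (i + 1) for i in range(h)]


MODELS = {"mean": mean_model, "naive": naive_model, "drift": drift_model}


def mae(actual, forecast):
    # Assumed selection criterion: mean absolute error.
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def compare_holdout_sizes(series, sizes, test_len):
    """Backtest each holdout size: select on history, score on unseen test data."""
    history, test = series[:-test_len], series[-test_len:]
    results = {}
    for size in sizes:
        fit, hold = history[:-size], history[-size:]
        # Holdout-based selection within the history only.
        best = min(MODELS, key=lambda m: mae(hold, MODELS[m](fit, size)))
        # Refit the winner on all of the history, then score it on the
        # test window that played no part in selection.
        results[size] = (best, mae(test, MODELS[best](history, test_len)))
    return results


if __name__ == "__main__":
    series = [9, 11] * 6 + [12, 14, 16, 18, 20, 22, 24, 26]  # made-up data
    for size, (model, score) in compare_holdout_sizes(series, [2, 8], 3).items():
        print(f"holdout={size}: selected {model}, test MAE {score:.3f}")
```

On this made-up series (flat, then trending), a 2-period holdout selects the drift model while an 8-period holdout selects the mean model, so the holdout size alone changes the forecast. Repeating such a comparison as new data arrives is one concrete way to monitor accuracy over time.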

In general, I think it is fair to say that SAS Forecast Server is based on the best practices set out in Armstrong (ed.), *Principles of Forecasting*.

As such, you might want to have a look at the following for more information:

Armstrong, J. S. (2001), “Evaluating forecasting methods,” in J. S. Armstrong (ed.), *Principles of Forecasting*. Norwell, MA: Kluwer Academic Publishers.

Hope that helps,

Udo


3 Replies


Hello,

If you set "1) number of periods to use", you are setting the HOLDOUT= option of the forecasting engine.

From the documentation:

*HOLDOUT= n*

specifies the size of the holdout sample to be used for model selection. The holdout sample is a subset of the actual time series ending at the last nonmissing observation. If the ACCUMULATE= option is specified, the holdout sample is based on the accumulated series. If the holdout sample is not specified, the full range of the actual time series is used for model selection. For each candidate model specified, the holdout sample is excluded from the initial model fit and forecasts are made within the holdout sample time range. Then, for each candidate model specified, the statistic of fit specified by the CRITERION= option is computed using only the observations in the holdout sample. Finally, the candidate model that performs best in the holdout sample, based on this statistic, is selected to forecast the actual time series. The HOLDOUT= option is only used to select the best forecasting model from a list of candidate models. After the best model is selected, the full range of the actual time series is used for subsequent model fitting and forecasting. It is possible that one model will outperform another model in the holdout sample but perform less well when the entire range of the actual series is used. If the MODEL=BESTALL and HOLDOUT= options are used together, the last one hundred observations are used to determine whether the series is intermittent. If the series is determined not to be intermittent, holdout sample analysis will be used to select the smoothing model.
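The selection procedure quoted above can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the forecasting engine's implementation: the three candidate models are simple benchmark methods (mean, naive, drift) standing in for the engine's real candidates, RMSE stands in for whatever CRITERION= specifies, and all function names are hypothetical. The last step reflects the point that matters for this thread: the holdout only chooses the model, and the chosen model is then refit on the full series, so the most recent observations are not ignored.

```python
import math


# Hypothetical candidate models: each takes the data it may be fitted on and
# a horizon h, and returns h point forecasts.
def mean_model(train, h):
    level = sum(train) / len(train)
    return [level] * h


def naive_model(train, h):
    return [train[-1]] * h


def drift_model(train, h):
    # Straight line through the first and last observations (needs len >= 2).
    slope = (train[-1] - train[0]) / (len(train) - 1)
    return [train[-1] + slope * (i + 1) for i in range(h)]


CANDIDATES = {"mean": mean_model, "naive": naive_model, "drift": drift_model}


def rmse(actual, forecast):
    # Assumed fit statistic standing in for CRITERION=.
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def select_with_holdout(series, holdout, horizon):
    """Choose a model on the last `holdout` points, then forecast from the full series."""
    if holdout < 1 or holdout > len(series) - 2:
        raise ValueError("holdout must leave at least two points for fitting")
    # 1. Exclude the holdout sample from the initial fit.
    fit_part, hold_part = series[:-holdout], series[-holdout:]
    # 2. Forecast into the holdout range and score each candidate there only.
    scores = {name: rmse(hold_part, model(fit_part, holdout))
              for name, model in CANDIDATES.items()}
    best = min(scores, key=scores.get)
    # 3. The holdout only picks the model: refit the winner on the full
    #    series (including the most recent points) to produce the forecast.
    return best, scores, CANDIDATES[best](series, horizon)


if __name__ == "__main__":
    best, scores, forecast = select_with_holdout(
        [10, 12, 14, 16, 18, 20, 22, 24, 26, 28], holdout=3, horizon=2)
    print(best, forecast)
```

On the straight-line series 10, 12, ..., 28 with a 3-period holdout, the drift model reproduces the holdout exactly and is selected; refitting it on all ten points then forecasts 30 and 32.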

By setting "2) maximum percentage of series to use", you are setting the HOLDOUTPCT= option of the forecasting engine.

From the documentation:

*HOLDOUTPCT= number*

specifies the size of the holdout sample as a percentage of the length of the time series. If HOLDOUT=5 and HOLDOUTPCT=10, the size of the holdout sample is $\min(5, 0.1T)$, where $T$ is the length of the time series with beginning and ending missing values removed. The default is 100 (100%).
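The interaction of the two settings reduces to a one-line rule. The Python below is a minimal sketch of the documented formula, $\min(\text{HOLDOUT}, \text{HOLDOUTPCT}\% \text{ of } T)$; representing missing values as `None` and rounding the percentage down to whole periods are assumptions of this sketch, and `trim_missing` and `holdout_size` are hypothetical names.

```python
import math


def trim_missing(series):
    """Drop leading and trailing missing values (None); interior gaps stay."""
    start, end = 0, len(series)
    while start < end and series[start] is None:
        start += 1
    while end > start and series[end - 1] is None:
        end -= 1
    return series[start:end]


def holdout_size(series, holdout, holdout_pct=100.0):
    """Effective holdout: min(HOLDOUT, HOLDOUTPCT percent of T).

    T is the series length after trimming missing values at both ends.
    Rounding the percentage down to whole periods is an assumption here.
    """
    t = len(trim_missing(series))
    return min(holdout, math.floor(holdout_pct * t / 100.0))


if __name__ == "__main__":
    short = [None, None] + list(range(36)) + [None]  # T = 36 after trimming
    print(holdout_size(short, holdout=5, holdout_pct=10))
    print(holdout_size(list(range(100)), holdout=5, holdout_pct=10))
```

Under the round-down assumption, HOLDOUT=5 with HOLDOUTPCT=10 gives a 36-period series a 3-period holdout and a 100-period series the full 5. With the default HOLDOUTPCT=100, the percentage only binds when the series is shorter than HOLDOUT= itself.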

With regard to your second question: yes, UCM can be used to show the relationship between dependent and independent series. In fact, you will find that the UCM output is easier to interpret than the ARIMAX output.
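To make the UCM answer concrete, here is a minimal Python sketch of one of the simplest unobserved components models with an independent variable: a random-walk level plus a constant regression coefficient, estimated with a Kalman filter. In SAS itself a model of this form would be specified in PROC UCM; the code below is only an illustration under stated assumptions and not how PROC UCM or SAS Forecast Studio estimate it. In particular, the noise variances `q` and `r` are fixed by hand rather than estimated by maximum likelihood, and `local_level_regression` is a hypothetical name.

```python
def local_level_regression(y, x, q=0.01, r=1.0, p0=1e6):
    """Kalman filter for y[t] = level[t] + beta * x[t] + eps[t].

    level[t] = level[t-1] + eta[t] (a random walk); beta is constant.
    q = var(eta) and r = var(eps) are fixed by assumption, not estimated.
    Returns the final filtered estimates [level, beta].
    """
    a = [0.0, 0.0]                    # state mean: [level, beta]
    P = [[p0, 0.0], [0.0, p0]]        # vague prior on both states
    for yt, xt in zip(y, x):
        # Predict: identity transition; only the level receives new noise.
        P[0][0] += q
        z = (1.0, xt)                 # observation row: y = level + beta * x
        pz = [P[0][0] * z[0] + P[0][1] * z[1],
              P[1][0] * z[0] + P[1][1] * z[1]]
        f = z[0] * pz[0] + z[1] * pz[1] + r   # innovation variance
        v = yt - (z[0] * a[0] + z[1] * a[1])  # one-step prediction error
        k = [pz[0] / f, pz[1] / f]            # Kalman gain
        a = [a[0] + k[0] * v, a[1] + k[1] * v]
        # Covariance update P - k (P z)' (P is symmetric).
        P = [[P[i][j] - k[i] * pz[j] for j in range(2)] for i in range(2)]
    return a


if __name__ == "__main__":
    x = [t % 5 for t in range(60)]          # made-up regressor
    y = [10.0 + 2.0 * xt for xt in x]       # made-up response: level 10, beta 2
    level, beta = local_level_regression(y, x)
    print(f"level ~ {level:.3f}, beta ~ {beta:.3f}")
```

On the made-up, noise-free data in the usage block (y = 10 + 2x), the filtered estimates come out very close to a level of 10 and a coefficient of 2. The coefficient reads directly as "a one-unit change in x moves y by beta", reported separately from the level component.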

Regards,

Udo


Thank you for your detailed explanation.

To be honest, I still don't understand it clearly.

The documentation you quoted is somewhat difficult for me to understand. I wish it could be explained more simply.

The "holdout sample" refers to the most recent data: if we excluded the holdout sample in this step, we would be ignoring the most recent and influential observations.

However, the holdout sample setting significantly affected the forecast values, so I don't know how to find the optimal holdout number. What do you think the optimal holdout sample size is?

I just used the default values (5, 1), so please let me know if you know the right answer.

