I want to know how to set up the forecasting options, especially the holdout sample settings shown under model selection in "Set Forecasting Options".

After I check "Use holdout sample", what values should I select for 1) number of periods to use and 2) maximum percentage of series to use?

Second, should I use a UCM (Unobserved Components Model) to show the relationship between dependent and independent variables? Simply put, I cannot identify a multivariate model using ARIMA or exponential smoothing.

Thanks,

3 replies
udo_sas
SAS Employee
Hello,
If you select "number of periods", you are setting the HOLDOUT= option of the forecasting engine.
From the documentation:
HOLDOUT= n
specifies the size of the holdout sample to be used for model selection. The holdout sample is a subset of the actual time series ending at the last nonmissing observation. If the ACCUMULATE= option is specified, the holdout sample is based on the accumulated series. If the holdout sample is not specified, the full range of the actual time series is used for model selection. For each candidate model specified, the holdout sample is excluded from the initial model fit and forecasts are made within the holdout sample time range. Then, for each candidate model specified, the statistic of fit specified by the CRITERION= option is computed using only the observations in the holdout sample. Finally, the candidate model that performs best in the holdout sample, based on this statistic, is selected to forecast the actual time series.

The HOLDOUT= option is only used to select the best forecasting model from a list of candidate models. After the best model is selected, the full range of the actual time series is used for subsequent model fitting and forecasting. It is possible that one model will outperform another model in the holdout sample but perform less well when the entire range of the actual series is used.

If the MODEL=BESTALL and HOLDOUT= options are used together, the last one hundred observations are used to determine whether the series is intermittent. If the series is determined not to be intermittent, holdout sample analysis is used to select the smoothing model.
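The selection procedure in this passage (fit without the holdout, forecast into it, score, pick the winner, refit on everything) can be sketched in a few lines of Python. This is only an illustration, not how the SAS forecasting engine is implemented: the three candidate models (naive, mean, drift) are hypothetical toy stand-ins, and MAPE is assumed as the fit statistic in place of whatever CRITERION= specifies.

```python
from statistics import mean

# Hypothetical toy candidate models (not the SAS model list). Each takes a
# history and a horizon h and returns h point forecasts.
def naive(history, h):
    return [history[-1]] * h

def average(history, h):
    return [mean(history)] * h

def drift(history, h):
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + slope * (i + 1) for i in range(h)]

CANDIDATES = {"naive": naive, "mean": average, "drift": drift}

def mape(actual, forecast):
    # Mean absolute percentage error; assumes no zero actual values.
    return 100 * mean(abs((a - f) / a) for a, f in zip(actual, forecast))

def select_and_forecast(series, holdout, lead):
    """Pick a model on the holdout sample, then refit it on the full series."""
    fit_part, holdout_part = series[:-holdout], series[-holdout:]
    # 1) Fit each candidate WITHOUT the holdout and forecast into it.
    scores = {name: mape(holdout_part, model(fit_part, holdout))
              for name, model in CANDIDATES.items()}
    # 2) The candidate with the best statistic on the holdout wins.
    best = min(scores, key=scores.get)
    # 3) The winner is refit on the FULL series (holdout included)
    #    before producing the forecasts that are actually reported.
    return best, scores, CANDIDATES[best](series, lead)

sales = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118]
best, scores, forecast = select_and_forecast(sales, holdout=3, lead=2)
print(best, {k: round(v, 1) for k, v in scores.items()}, forecast)
```

With this toy series the mean model wins on the last three observations (MAPE about 15.7% versus 20.1% for naive and 25.4% for drift), and its forecast is then recomputed from all twelve values, so the holdout observations are not discarded.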


By selecting "2) maximum percentage of series to use", you are setting the HOLDOUTPCT= option of the forecasting engine.
From the documentation:
HOLDOUTPCT= number
specifies the size of the holdout sample as a percentage of the length of the time series. If HOLDOUT=5 and HOLDOUTPCT=10, the size of the holdout sample is min(5, 0.1T), where T is the length of the time series with beginning and ending missing values removed. The default is 100 (100%).
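To make the interaction between the two settings concrete, here is a small Python sketch of the rule min(HOLDOUT, HOLDOUTPCT% of T). The function name `holdout_size`, the use of `None` for missing values, and rounding the percentage down are all illustrative assumptions; the quoted documentation does not say how fractional sizes are handled.

```python
import math

def holdout_size(series, holdout, holdout_pct=100.0):
    """Effective holdout length: min(HOLDOUT, HOLDOUTPCT% of T).

    T is the series length after leading and trailing missing values
    (represented here as None) are removed. Rounding down is an assumption.
    """
    values = list(series)
    # Strip beginning and ending missing values before measuring T.
    while values and values[0] is None:
        values.pop(0)
    while values and values[-1] is None:
        values.pop()
    t = len(values)
    return min(holdout, math.floor(holdout_pct * t / 100))

series = [None, None] + [float(v) for v in range(1, 41)] + [None]  # T = 40
print(holdout_size(series, holdout=5, holdout_pct=10))  # 10% of 40 = 4, so 4
print(holdout_size(series, holdout=5))                  # default 100%, so 5
```

Because the default HOLDOUTPCT is 100, the percentage cap only changes anything when it is set below 100 or when the series is shorter than the requested number of holdout periods.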


With regard to your second question: yes, UCM can be used to show the relationship between dependent and independent series. In fact, you will find that UCM output is easier to interpret than ARIMAX output.

Regards,
Udo
deleted_user
Thank you for the detailed explanation.
To be honest, I don't understand it clearly.
The documentation you quoted is somewhat difficult for me to understand. I wish it could be explained more simply.
The "holdout sample" refers to the most recent data: if we excluded the holdout sample in this step, we would be ignoring the most recent and influential observations.

But this holdout sample significantly affected the forecast values, so I don't know how to find the optimal holdout number. What do you think the optimal holdout sample size is?
I just used the default values of 5 and 1, so please let me know if you know the right answer.

udo_sas
SAS Employee
Hello -
Please excuse me for not being clear.
You wrote: "if we excluded the holdout sample in this step, we would be ignoring the most recent and influential observations."
The holdout sample is only used to select the best forecasting model from a list of candidate models. After the best model is selected, the full range of the actual time series is used for subsequent model fitting and forecasting.

I'm afraid there is no easy answer to your question about "optimal holdout sample size", because it depends on many factors, such as the granularity of the data, the amount of historical data at hand, and attributes of the data (seasonal vs. non-seasonal).

At the end of the day, you want to use a holdout sample size that helps improve forecasting accuracy. This requires understanding both the data and the forecasting process (which includes monitoring accuracy over time). As such, you might have to go through several iterations of this process to see how different holdout settings affect your model accuracy.
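One way to carry out these iterations is to hold back the most recent observations as stand-in "future" data, run the selection with several holdout sizes, and compare how accurate each choice turned out to be. The Python sketch below does that with hypothetical toy candidates (naive, mean, drift) and MAPE as the statistic; the holdout sizes 2, 3, 4 and the 3-observation evaluation window are arbitrary illustrative choices, and none of this is a Forecast Server feature.

```python
from statistics import mean

# Hypothetical toy candidates (not the SAS model list): history, horizon -> forecasts.
CANDIDATES = {
    "naive": lambda y, h: [y[-1]] * h,
    "mean": lambda y, h: [mean(y)] * h,
    "drift": lambda y, h: [y[-1] + (y[-1] - y[0]) / (len(y) - 1) * (i + 1)
                           for i in range(h)],
}

def mape(actual, forecast):
    # Mean absolute percentage error; assumes no zero actual values.
    return 100 * mean(abs((a - f) / a) for a, f in zip(actual, forecast))

def select(history, holdout):
    # Choose the candidate with the lowest MAPE on the holdout sample.
    fit, hold = history[:-holdout], history[-holdout:]
    return min(CANDIDATES, key=lambda m: mape(hold, CANDIDATES[m](fit, holdout)))

def compare_holdouts(series, holdouts, test):
    """Hold back the last `test` points as stand-in future actuals, then report
    which model each holdout size selects and how accurate it was afterwards."""
    history, future = series[:-test], series[-test:]
    results = {}
    for h in holdouts:
        best = select(history, h)
        forecast = CANDIDATES[best](history, test)  # refit on all of history
        results[h] = (best, mape(future, forecast))
    return results

monthly = [10, 12, 14, 13, 15, 17, 16, 18, 20, 19, 21, 23, 22, 24, 26]
for h, (model, err) in compare_holdouts(monthly, holdouts=[2, 3, 4], test=3).items():
    print(f"holdout={h}: selected {model}, MAPE on later data {err:.1f}%")
```

Repeating this as new actuals arrive is a simple form of the accuracy monitoring mentioned above: if one holdout setting keeps picking models that do well on the later data, that is evidence in its favor for this particular series.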

In general, I think it is fair to say that SAS Forecast Server is based on the best practices described in "Principles of Forecasting," edited by Armstrong.
You might therefore want to have a look at
Armstrong, J. S. (2001), "Evaluating forecasting methods," in J. S. Armstrong (ed.), Principles of Forecasting. Norwell, MA: Kluwer Academic Publishers
for more information.

Hope that helps,
Udo
