bmquee
Calcite | Level 5

I've read this thread: https://communities.sas.com/t5/Forecasting-and-Econometrics/Out-of-sample-range-vs-holdout-sample/m-...

and am wondering, what happens if you select both:

Model Select -> Use holdout sample for model selection

Forecast -> Calculate statistics of fit over an out-of-sample range

for the same range?

 

Would it ever make sense to do so?

4 REPLIES
udo_sas
SAS Employee

Hello -

Yes, it can make sense to select both out-of-sample and holdout samples, but you cannot define them for the same range (at least this statement is true for SAS Forecast Server).

I'm attaching some slides which I presented at SAS Global Forum 2009 - they may be useful for understanding the 2 concepts and why you cannot define them for the same range.

Thanks,

Udo

bmquee
Calcite | Level 5

Thanks for your response, Udo. To clarify, I am using SAS Forecast Server exclusively. I do understand the difference between the 2 concepts - using the holdout sample affects the model selection process, while the out-of-sample process does not. When using only the holdout sample, the statistics reported are for the holdout sample and the entire sample. When using only the out-of-sample process, the statistics reported are for the in-sample and out-of-sample ranges.

But when you use both the holdout sample and out-of-sample for the same range, the statistics reported are for the holdout sample and the out-of-sample range. I'm not sure what this means - I would think the holdout and out-of-sample statistics would be the same if they are defined for the same range, as they are in this case. But the values of the statistics are different, which leads me to believe that you cannot specify both the holdout sample and out-of-sample for the same range. However, I still do not understand why they cannot be defined over the same range in SAS Forecast Server.

I really appreciate any clarification you can provide!

Thanks,

Bonnie

udo_sas
SAS Employee

Hello Bonnie -

First of all thanks for using SAS Forecast Server.

 

The purpose of using out-of-sample data is to assess the predictive power of your champion model for a specific series _before_ the actual values arrive. The purpose of using holdout data is to pick the champion model on data which was not used for initial estimation of parameters.

 

Here is an example - when using both out-of-sample and holdouts:

  • Let's say you have monthly data for 5 years (Jan 2010 - Dec 2014) and you want to forecast 12 periods (lead time = 12).
  • Specify the out-of-sample range - this is always considered first. If you specify an out-of-sample range of 12, then Jan 2014 - Dec 2014 will be "set aside". The most recent data is always used for out-of-sample.
  • Specify the holdout sample: if you specify 12, then Jan 2013 - Dec 2013 will be "set aside".
  • The initial fit region will be Jan 2010 - Dec 2012 - the parameters of all models in your model repository will be estimated on the initial fit region. Since lead time is 12, an initial forecast will be created into the holdout region. The best performing model will be picked using your preferred accuracy statistic, based on its performance in the holdout region.
  • After the champion is selected, the holdout region gets merged back into the initial fit region, so the fit range is now Jan 2010 - Dec 2013.
  • The parameters of the champion model will be re-estimated based on Jan 2010 - Dec 2013 data, and a forecast will be generated for Jan 2014 - Dec 2014, which is your out-of-sample region.
  • Now you can assess how well your champion would perform "in real life", as you have both actual and forecast values.
  • If you are happy with the performance, you will have to turn off the out-of-sample region for production purposes, otherwise you will not predict the future (unless your lead time is longer than your out-of-sample region).
  • By turning off out-of-sample, your holdout region now becomes Jan 2014 - Dec 2014 of course (since you don't have out-of-sample data anymore), and all forecast values will be for future periods. In order to assess performance you will have to wait until time moves on and actual values arrive.
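
The date arithmetic in the steps above can be sketched in a few lines of Python. This is only an illustration of how the regions are carved out of the series, not how SAS Forecast Server implements it; the helper names `month_range`, `split_regions` and `describe` are made up for this example.

```python
from datetime import date


def month_range(start_year, start_month, n):
    """Return n consecutive month starts, beginning at the given month."""
    months = []
    year, month = start_year, start_month
    for _ in range(n):
        months.append(date(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months


def split_regions(months, holdout, out_of_sample):
    """Partition a time index into the regions described in the steps above.

    The out-of-sample range is carved off the end first; the holdout
    sample is then taken from the end of whatever remains.
    (Hypothetical helper, not a SAS API.)
    """
    if holdout < 0 or out_of_sample < 0:
        raise ValueError("region lengths must be non-negative")
    if holdout + out_of_sample >= len(months):
        raise ValueError("no data left for the initial fit region")
    fit_end = len(months) - out_of_sample
    refit = months[:fit_end]          # initial fit + holdout, merged back
    cut = len(refit) - holdout
    return {
        "initial_fit": refit[:cut],   # all candidate models are estimated here
        "holdout": refit[cut:],       # champion is selected here
        "refit": refit,               # champion is re-estimated here
        "out_of_sample": months[fit_end:],  # champion is evaluated here
    }


def describe(regions):
    """Format each region as 'YYYY-MM .. YYYY-MM' (or '(none)' if empty)."""
    def fmt(d):
        return f"{d.year}-{d.month:02d}"
    return {
        name: f"{fmt(span[0])} .. {fmt(span[-1])}" if span else "(none)"
        for name, span in regions.items()
    }


months = month_range(2010, 1, 60)  # Jan 2010 - Dec 2014

# Evaluation setup: holdout of 12 and out-of-sample of 12.
evaluation = split_regions(months, holdout=12, out_of_sample=12)
# Production setup: out-of-sample turned off, holdout kept at 12.
production = split_regions(months, holdout=12, out_of_sample=0)

for label, regions in (("evaluation", evaluation), ("production", production)):
    for name, span in describe(regions).items():
        print(f"{label:10s} {name:13s} {span}")
```

With `out_of_sample=0` the holdout region moves to the last 12 months of the series (Jan 2014 - Dec 2014), which matches the last step above. Because the out-of-sample range is removed before the holdout is taken, the two regions never overlap in this sketch.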

 

I'm hoping I'm addressing your question. Otherwise you may have to provide an example of what you are asking for.

Thanks,

Udo 

bmquee
Calcite | Level 5

Thank you Udo! That answers my question exactly and was a very lucid example. Thanks again!
