Jade_SAS
Pyrite | Level 9

Hi All,

 

    I have a quick question:

    Is it necessary to separate the original data set into a training set, a validation set, and a test set when doing forecasting in Forecast Studio? What's the general practice here? Thank you!

 

Thanks,

Jade

 

 

1 ACCEPTED SOLUTION
alexchien
Pyrite | Level 9

Hi Jade,

Forecast Studio supports using validation and test data sets to evaluate model performance. You can set HOLDOUT (number of periods) or HOLDOUT_PCT (percent of total periods) to use a validation data set to diagnose and select models. You can use BACK (number of periods) to use a test data set to evaluate true model performance, since the test data is not used in any way during the modeling process.

In the data mining world, the training data is partitioned randomly (or in a fashion that is not time related) to form the training, validation, and test data. Models are built on the training data and guarded by the validation data, and that's the end of the modeling process. In forecasting, however, the data has to be partitioned by time sequence, since you are not building a model to forecast some random period in the past; the model is built to forecast future periods in sequence. The test data has to be the most recent observations, then the validation data, and then the training data. You might lose the recency effect (the most recent data are typically important for forecasting) if you are holding out data for validation (or as test data).

In Forecast Studio, the validation data set (via the HOLDOUT option) is used during diagnosis to create the model candidate list. Then the training + validation data are used to select the best model from the candidate list and generate forecasts. This is the most common practice. However, if the data has a long enough history, you do have the luxury of comparing models against the test data (via the BACK option). But I would use BACK to report the expected model performance, and then set BACK to 0 and generate forecasts with the selected model in order to utilize the latest data.

sorry for the long reply... have a nice weekend

alex 
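
For readers working outside the Forecast Studio GUI, here is a minimal sketch of the same workflow in batch code using PROC ESM (the data set WORK.SALES, its DATE and SALES variables, and the ADDWINTERS model are all hypothetical placeholders): BACK= holds back the most recent periods as a test set for an honest accuracy check, and a second run with BACK=0 refits on all of the data before generating the final forecasts. The HOLDOUT / HOLDOUT_PCT settings described above are Forecast Studio settings used during model diagnosis.

/* Step 1: hold back the last 6 months as a test set (BACK=6) */
/* and compare the forecasts against them.                    */
proc esm data=work.sales back=6 lead=6
         print=statistics outfor=work.eval;
   id date interval=month;
   forecast sales / model=addwinters;
run;

/* Step 2: after choosing a model, set BACK=0 so the most      */
/* recent data are used in fitting, and forecast 12 months out. */
proc esm data=work.sales back=0 lead=12 outfor=work.forecasts;
   id date interval=month;
   forecast sales / model=addwinters;
run;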


8 REPLIES
Reeza
Super User

Yes, it's necessary, and yes, it's standard procedure when doing predictive modeling to split the data three ways.

Ksharp
Super User

@Reeza It is a forecasting model, not a predictive model (which generally exists in data mining).

Jade_SAS
Pyrite | Level 9

Yes, I am asking whether it's general practice to have training, validation, and test sets for forecasting models. Thank you!

 

Reeza
Super User

Do you have a model already built that you're forecasting, or are you building a model?

If you're building a model, the current standard is the three-way split. This is more of an industry standard than a SAS rule.

 

Here's a video from Coursera that describes why this is done. If you prefer reading, there's a text transcript below the video.

 

https://www.coursera.org/learn/machine-learning/lecture/QGKbr/model-selection-and-train-validation-t...
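
As a concrete illustration of the split being discussed, here is a minimal DATA step sketch of a three-way partition for time series data (the data set WORK.SALES, its DATE variable, and the cutoff dates are all hypothetical). Note that, as alexchien explains above, the partition for forecasting is by time rather than at random: the test set is the most recent data.

data train validate test;
   set work.sales;                                    /* assumed sorted by DATE  */
   if date < '01JAN2018'd then output train;          /* oldest observations     */
   else if date < '01JAN2019'd then output validate;  /* next most recent        */
   else output test;                                  /* most recent = test set  */
run;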


ccaulkins9
Pyrite | Level 9
Forecasting models are a part of Data Mining.
e-SAS regards,

