bluesky
Fluorite | Level 6

A couple of questions (any help appreciated) -- 

 

1. Is there a way to change the threshold value that Studio uses to reject models?

 

Ideally, I'd want to get forecasts from all models considered, i.e., to have zero rejected models.

 

I understand that it's possible to get the runner-up models (not rejected, but not the champion either) from outstatselect, and then to generate forecasts by feeding that selection list + model list (after creating the logical concatenation for the model repository) to HPFEngine.
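Roughly, I mean something like this (a sketch only; the input data set, the BY variables, and the selection-list name are placeholders I'd pull from OUTSTATSELECT, and the catnames are the ones the Studio batch code already issues):

/* Sketch only -- work.myseries, the BY variables, and RUNNERUP_LIST are
   placeholders; the two CATNAME statements mirror the generated batch code. */

/* build the logical (concatenated) model repository */
catname _project.ProjectModCombRep ( _project.ProjectModRep sashelp.hpfdflt (ACCESS=READONLY) );
catname _HPF0.LevModRep ( _HPF0.AutoLevModRep _project.ProjectModCombRep );

/* forecast from that repository using the runner-up selection list */
proc hpfengine data=work.myseries
               repository=_HPF0.LevModRep
               globalselection=RUNNERUP_LIST   /* hypothetical list name */
               outfor=work.runnerup_fcst
               lead=12;
   by regionname productline;                  /* your level's BY variables */
   id date interval=month;
   forecast sales;
run;

GLOBALSELECTION= applies one list to every series; per-series assignments would still have to be wired up (I believe via the INEST= data set), which is part of what makes this clunky.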

 

But to identify the rejected models, it doesn't look like there's any way other than going into autolevmodrep to get all the models in the selection list and then matching that against the list in outstatselect to figure out which ones were rejected?
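In other words, something along these lines (a sketch; the level libref path and the _MODEL_ column name are guesses on my part):

/* Sketch: list every spec stored in AUTOLEVMODREP, then subtract the models
   that appear in OUTSTATSELECT. The libref path and the _MODEL_ column name
   are assumptions -- check your own project folder and table first. */

libname lev "C:\<project folder>\hierarchy\<level>";   /* level folder */

proc catalog catalog=lev.autolevmodrep;
   contents out=work.all_specs;    /* one row per catalog entry (NAME, TYPE, ...) */
run;
quit;

proc sql;
   create table work.rejected as
   select distinct upcase(name) as model
     from work.all_specs           /* note: also contains the selection-list specs */
   except
   select distinct upcase(_model_) /* assumed column name in OUTSTATSELECT */
     from lev.outstatselect;
quit;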

 

So, I was wondering if it's instead possible to simply change the threshold value (for whatever criterion one is using, i.e., MASE, RMSE, etc.) that Studio uses to reject models? That way, more (or all) of the models would be selected.

 

 

2. Is there a direct way to capture the forecasts for all the runner-up models in the Studio batch code?

 

Currently, it looks like the only way to do it is to get the runner-up models (from outstatselect) for each series, create the logical modelrep and then run HPFEngine to get the forecasts for each of the runner-up models. 
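For concreteness, the per-series loop I have at the moment looks roughly like this (a simplified sketch; the data sets, columns, and list names are placeholders, and work.runnerups is assumed to hold one row per series with its runner-up selection-list name pulled from outstatselect):

%macro runnerup_fcst;
   /* load series ids and their runner-up selection lists into macro variables */
   data _null_;
      set work.runnerups end=last;
      call symputx(cats('serid',   _n_), series_id);
      call symputx(cats('sellist', _n_), select_name);
      if last then call symputx('nser', _n_);
   run;

   /* one PROC HPFENGINE call per series */
   %do i = 1 %to &nser;
      proc hpfengine data=work.myseries(where=(series_id = "&&serid&i"))
                     repository=_HPF0.LevModRep     /* the logical concatenation */
                     globalselection=&&sellist&i
                     outfor=work.fcst_&i
                     lead=12;
         id date interval=month;
         forecast sales;
      run;
   %end;
%mend runnerup_fcst;
%runnerup_fcst;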

 

Depending on the number of series one has to push through this process, it can take many hours to complete. So, I was wondering if there's a way to modify the HPFEngine code (as part of the Studio batch code) to get the forecast datasets for all the runner-up models?

 

9 REPLIES
mitrov
SAS Employee

Hi

 

I don't think there is an alternative to the solutions you already outlined. 

 

May I ask what the purpose is? As you are probably aware, persisting all forecasts can result in a humongous amount of data, which would not scale for large projects. Additionally, many models in the selection list are simply not run because they do not make sense, or they result in failed forecasts or missing data.

 

As you are also probably aware, you can combine different models, which should result in more robust forecasts if that helps your purposes. 

bluesky
Fluorite | Level 6

Thanks -- I am basically just trying to get a handle on the specific set of criteria that Studio uses to reject models. 

 

From your response, though, I take it that a model could be rejected (i.e., not passed on to HPFEngine) for any number of reasons, and not only because it fails to cross the specific criterion's threshold value.

mitrov
SAS Employee

Yes, that is correct. Tests are run and models are generated to capture data features. Based on that, some models are estimated and others are not. For example, intermittent demand models (IDM) are not fit to data that is not intermittent.
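If it helps, the step that builds those candidate lists is PROC HPFDIAGNOSE, and some of these data-feature decisions surface as options there. A rough sketch (placeholder names, illustrative option values, and please verify the option names against the documentation for your release):

/* Sketch of the diagnose step; work.myseries and work.mymodrep are placeholders,
   and the option values are illustrative, not Studio's defaults. */
proc hpfdiagnose data=work.myseries
                 modelrepository=work.mymodrep
                 outest=work.diagest
                 criterion=mape      /* model selection criterion */
                 holdout=6           /* observations held out for selection */
                 intermittent=2;     /* demand-interval threshold for trying IDM models */
   id date interval=month;
   forecast sales;
   esm;                              /* consider exponential smoothing models */
   idm;                              /* consider intermittent demand models */
   arimax;                           /* consider ARIMA models */
run;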

 

There is a training class on Forecast Server that goes into more detail on that, should you be interested.

mitrov
SAS Employee

I should have added that you also need to be aware that the models are first selected using the holdout sample data only. Then, the winning model is re-estimated using the full sample data. The forecasts you see use the full sample data for model parameter estimation.

To verify the model selection, you would need the forecasts generated using the holdout sample data only. Those are not persisted.
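If you want a rough approximation yourself, you could re-run the engine with BACK= set to the holdout length and look at the out-of-sample portion of the output; it will not reproduce the selection exactly, but it gives comparable numbers. A sketch with placeholder names:

/* Sketch: hold back the last 6 observations so the forecasts over that region
   are out-of-sample, roughly mimicking the holdout comparison. All names are
   placeholders. */
proc hpfengine data=work.myseries
               repository=_HPF0.LevModRep
               globalselection=CANDIDATE_LIST   /* hypothetical selection list */
               back=6                           /* holdout length */
               lead=6
               outfor=work.holdout_fcst;
   id date interval=month;
   forecast sales;
run;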

bluesky
Fluorite | Level 6

Thanks so much for all your help.

 

One other quick question -- in the GUI, when you click on any of the runner-up models, Studio provides the forecasts for that model.

 

Are those forecasts (for the runner-up models) calculated on the fly, i.e., does HPFEngine run in the background and generate them after the user clicks on a model?

 

Or are they already generated (as part of the project forecasting process) and stored somewhere, with the GUI merely presenting the results (i.e., the forecasts are not generated right then)?

 

I've tried to look at the batch code and the generated datasets but haven't been able to figure out an answer to this question.

 

My current understanding is that they seem to be generated on the fly (i.e., only when someone clicks on the runner-up model) but I could be wrong. 

mitrov
SAS Employee

Your understanding is correct. They are generated on the fly. 

frthesea
Calcite | Level 5

Hi, I have been wondering about the same question and am glad to have found some discussion here.

 

For me, the motivation is to see whether the forecast figures generated by the "top ranking models" in Forecast Studio make sense, i.e., they shouldn't be too far off by being either too optimistic or too pessimistic. When the MAPE values are very similar, it makes sense to compare models from a non-statistical perspective. Hence it would be useful to have, for instance, the forecasts from the top 5 models generated automatically, instead of manually clicking and validating each one and waiting for the server to run.

 

As far as I understand, Forecast Studio has SAS code behind it, but I couldn't seem to find the part where I set another model, such as my customized model, as the final forecasting model. And then there's the related question of whether I can get a list of the models for each series ranked by a fit criterion such as in-sample MAPE.

 

alexchien
Pyrite | Level 9

If you would like to see the model selection stats for all the models considered, you can go to the project folder (e.g., C:\SAS\Config\Lev1\AppData\SASForecastServer14.1\Projects\Pricedata), navigate down to \hierarchy\<level of interest>, and take a look at the OUTSTATSELECT table. The table contains all the stats you can imagine for all the candidate models considered for each series at that particular hierarchy level. The column _REGION_ can have the values FIT and FORECAST: the FIT rows hold the in-sample stats, and the FORECAST rows hold the out-of-sample stats, provided the BACK option is set to something other than 0. I bet you can come up with lots of reports out of this single table.
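For example, something along these lines pulls the five best-fitting candidates per series by in-sample MAPE (the level folder in the libref and the _NAME_/MAPE column names are assumptions; check your own table with PROC CONTENTS first):

/* Sketch: rank candidate models per series by in-sample MAPE and keep the top 5.
   The level folder in the libref and the _NAME_/MAPE column names are assumptions. */

libname lev "C:\SAS\Config\Lev1\AppData\SASForecastServer14.1\Projects\Pricedata\hierarchy\<level>";

proc sort data=lev.outstatselect(where=(_region_ = 'FIT')) out=work.fitstats;
   by _name_ mape;                 /* smaller MAPE = better in-sample fit */
run;

data work.top5;
   set work.fitstats;
   by _name_;
   if first._name_ then rank = 0;  /* restart the counter for each series */
   rank + 1;
   if rank <= 5;                   /* keep the five best-fitting candidates */
run;

In a hierarchy level with BY variables, you would sort and group by those variables instead of _NAME_.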

 

It is a bit tricky to programmatically choose a model from the candidate lists (this is what Forecast Studio is for :>). The system-generated model candidate lists (i.e., the model repository) are stored as a SAS catalog called AUTOLEVMODREP in the same folder as the OUTSTATSELECT table. This catalog should contain the specification for each individual candidate model and a selection specification for the candidate models of each series. Any manual selections or custom models defined via Forecast Studio are stored in the project-level model repository PROJECTMODREP in the project folder, and the corresponding selection specifications are removed from the AUTOLEVMODREP model repository.

 

Hope this helps a little.

Alex

alexchien
Pyrite | Level 9

One more note. If you take a look at the SAS code generated by Forecast Studio (in the Forecast Studio project, go to Project and select SAS Code...), you will see the following at the top of the code. Since the customized models are stored in the project-level model repository, the project-level repo has to be concatenated with the level-specific repo to be used in the subsequent PROC calls. This feature is similar to concatenating SAS libnames.

 

*----------------------------------------------------------------------
* concatenate default hpf model repositry with the project.
*---------------------------------------------------------------------;
catname _project.ProjectModCombRep ( _project.ProjectModRep sashelp.hpfdflt (ACCESS=READONLY) );
catname _top.LevModRep( _top.AutoLevModRep _project.ProjectModCombRep );
catname _HPF0.LevModRep ( _HPF0.AutoLevModRep _project.ProjectModCombRep);
catname _HPF1.LevModRep ( _HPF1.AutoLevModRep _project.ProjectModCombRep);
catname _HPF2.LevModRep ( _HPF2.AutoLevModRep _project.ProjectModCombRep);

