
A new way to combine local and global models in your forecasting process

Started ‎01-29-2024 by
Modified ‎01-29-2024 by

Until recently, your forecasting process in SAS Visual Forecasting (VF) would typically look like this.

 

First, you would build forecasting pipelines in the SAS VF UI using time series, machine learning (ML), and deep learning (DL) techniques. Some of those pipelines would use global models, where all your time series are used to train one ‘global’ model (usually ML). Others would use local models, where modeling is performed series by series (usually statistical time series techniques or RNNs). Still others would combine local and global methods. To keep things simple for users, the necessary data transformations are performed automatically inside the different nodes in SAS VF.

 

After you successfully run your modeling nodes, a Model Comparison node selects the modeling strategy that performed best in the pipeline based on the criterion of your choice. If you ran multiple pipelines in parallel, the best-performing modeling strategy across all pipelines is selected.

 

But what if you wanted to run multiple models in parallel, including local and global models, and make sure that you always choose the best-performing model for each individual series? That is exactly the problem that the newly introduced ‘Ensemble’ node addresses.

 

The Ensemble node evaluates the forecasts from the parent (predecessor) modeling nodes that are connected to it and, for each time series, selects the forecast that performs best on the fit statistic chosen by the user. If you want to refine your forecasts further, you can attach an ‘Interactive Modeling’ node after the Ensemble node and apply custom models in your pipeline.
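
The selection logic described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Ensemble node's actual implementation: the function name, the model labels, and the score values are all hypothetical, and it assumes a fit statistic where lower is better.

```python
def select_champions(scores_by_series):
    """Map each series to the model with the lowest fit-statistic value."""
    return {
        series_id: min(model_scores, key=model_scores.get)
        for series_id, model_scores in scores_by_series.items()
    }


# Hypothetical fit-statistic values (lower is better), one entry per series.
scores_by_series = {
    "series_1": {"auto_forecasting": 4.2, "panel_nn": 5.1, "gradient_boosting": 4.9},
    "series_2": {"auto_forecasting": 7.8, "panel_nn": 6.3, "gradient_boosting": 6.9},
}

champions = select_champions(scores_by_series)
print(champions)
```

The point of the sketch is that the comparison happens once per series, so different series can end up with different champion models.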

 

What does this look like in practice?

 

In the example below, we see a forecasting pipeline that uses the Ensemble node as a post-processing node, with the following modeling strategies connected to it:

 

  1. An Auto-forecasting node, which includes various local models such as ARIMA, Exponential Smoothing and Intermittent Demand models
  2. A Panel-Series Neural Network – global model
  3. A Gradient-Boosting model – global model
SpirosP_1-1706516473039.png

 

We choose Root Mean Squared Error (RMSE) as the fit statistic used to select the champion model for each time series, and then we run the node.
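
For readers less familiar with the metric, here is a minimal Python sketch of how RMSE is computed for one series and one model. The numbers are made up for illustration, and the sketch says nothing about which evaluation window SAS VF uses when it computes the statistic.

```python
from math import sqrt


def rmse(actuals, forecasts):
    """Root mean squared error between two equal-length sequences."""
    if len(actuals) != len(forecasts) or len(actuals) == 0:
        raise ValueError("actuals and forecasts must be non-empty and equal in length")
    squared_errors = [(a - f) ** 2 for a, f in zip(actuals, forecasts)]
    return sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical actuals and forecasts for a single series.
actuals = [10.0, 12.0, 11.0, 13.0]
forecasts = [11.0, 12.0, 9.0, 13.0]

value = rmse(actuals, forecasts)
print(round(value, 4))
```

Because the errors are squared before averaging, RMSE penalizes a few large misses more heavily than many small ones, which is worth keeping in mind when you pick the selection criterion.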

 

After the node runs, we can examine the results, as shown in the picture below. The Auto-forecasting models were selected for ~80% of the series. For the rest of the series, the global models were selected.

 

SpirosP_2-1706516511704.png
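
If you export the per-series selections, a summary like the one in the picture can be reproduced with a simple count. The sketch below assumes a hypothetical mapping from series to selected model; the series names, model labels, and resulting shares are invented and are not the results from this example.

```python
from collections import Counter


def selection_shares(champions):
    """Return {model_name: share of series for which that model was selected}."""
    total = len(champions)
    if total == 0:
        return {}
    counts = Counter(champions.values())
    return {model: count / total for model, count in counts.items()}


# Hypothetical per-series selections.
champions = {
    "series_1": "auto_forecasting",
    "series_2": "auto_forecasting",
    "series_3": "panel_nn",
    "series_4": "auto_forecasting",
    "series_5": "gradient_boosting",
}

shares = selection_shares(champions)
for model, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{model}: {share:.0%}")
```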

 

We then proceed to examine the output tables. In the ‘OUTWEIGHT’ table, the model selected for each time series has the value ‘1’. I’ve placed a check mark next to some of them in the picture below to illustrate the logic. While the weighting values can only be 1 or 0 for now, more sophisticated weighting methods are in development and are expected to become available in future releases.

 

SpirosP_5-1706516591309.png
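
One way to think about the 0/1 weights is as a weighted sum in which exactly one model contributes to each series. The Python sketch below illustrates that reading under stated assumptions: the data layout, function names, and values are hypothetical and do not reflect the actual structure of the OUTWEIGHT table.

```python
def one_hot_weights(champions, model_names):
    """Give weight 1 to the selected model of each series and 0 to the others."""
    return {
        series_id: {model: 1 if model == champion else 0 for model in model_names}
        for series_id, champion in champions.items()
    }


def combine(weights, forecasts_by_model):
    """Weighted sum of the model forecasts for a single series."""
    horizon = len(next(iter(forecasts_by_model.values())))
    return [
        sum(weights[model] * forecasts_by_model[model][t] for model in forecasts_by_model)
        for t in range(horizon)
    ]


# Hypothetical model names, selections, and forecasts.
model_names = ["auto_forecasting", "panel_nn", "gradient_boosting"]
champions = {"series_1": "auto_forecasting", "series_2": "panel_nn"}
weights = one_hot_weights(champions, model_names)

series_1_forecasts = {
    "auto_forecasting": [100.0, 105.0],
    "panel_nn": [98.0, 110.0],
    "gradient_boosting": [102.0, 101.0],
}
ensemble_forecast = combine(weights["series_1"], series_1_forecasts)
print(weights["series_1"], ensemble_forecast)
```

Written this way, moving to fractional weights later would only change how the weights are produced, not how the forecasts are combined.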

 

Finally, we open the results from the Model Comparison node. As expected, the Ensemble node provides the most accurate results based on the criterion of choice. The improvement is substantial: the weighted statistics, WMAE and WRMSE, decrease by ~8% across all of the time series.

 

SpirosP_6-1706516630172.png

 

Some closing tips:

  • Running local and global models in parallel for all your time series can be computationally intensive. You may want to consider segmenting your data, using the External Segmentation or Demand Classification templates, and applying different modeling strategies to the corresponding segments.
  • If two or more modeling techniques score equally on the selected fit statistic for a series, the technique that the Ensemble node selects among them can be arbitrary.
  • Remember that you can use an Interactive Modeling node after the Ensemble node to further optimize your results at the individual time series level.
  • Finally, try using the Distributed Open Source Code node as a competing modeling strategy in your forecasting pipelines, and then use the Ensemble node to make sure that you always get the best results from both SAS and Open Source algorithms.
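
To see why ties matter, the sketch below shows a check you could run yourself on exported fit statistics: it flags which models are tied for a series and applies an explicit preference order to break the tie. This is not a feature of the Ensemble node; the function names, preference order, and scores are hypothetical.

```python
def tied_models(model_scores, tolerance=1e-9):
    """Return the models whose score equals the best score within a tolerance."""
    best = min(model_scores.values())
    return sorted(m for m, s in model_scores.items() if abs(s - best) <= tolerance)


def select_with_tiebreak(model_scores, preference):
    """Pick the lowest score; on a tie, prefer the model listed first in `preference`."""
    return min(model_scores, key=lambda m: (model_scores[m], preference.index(m)))


# Hypothetical scores for one series, with two models tied for the best value.
model_scores = {"gradient_boosting": 3.0, "auto_forecasting": 3.0, "panel_nn": 3.5}
preference = ["auto_forecasting", "gradient_boosting", "panel_nn"]

print(tied_models(model_scores))
print(select_with_tiebreak(model_scores, preference))
```

A check like this makes it easy to spot the series where the selected technique should not be over-interpreted.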
