Until recently, your forecasting process in SAS Visual Forecasting (VF) would typically look like this.
First, you would build forecasting pipelines in the SAS VF UI using time series, machine learning (ML), and deep learning (DL) techniques. Some of those pipelines would use global models, where all your time series are used to train a single ‘global’ model (usually ML). Others would use local models, where modeling is performed series by series (usually statistical time series techniques or RNNs), and others still would combine local and global methods. To keep things simple for users, the necessary data transformations are performed automatically inside the individual nodes in SAS VF.
After you successfully run your modeling nodes, a Model Comparison node selects the modeling strategy that performed best in the pipeline based on the criterion of your choice. If you had multiple pipelines running in parallel, then the best performing modeling strategy from all pipelines is selected.
But what if you wanted to run multiple models in parallel, including local and global models, and make sure that you always choose the best performing model for each individual series? That’s exactly the problem that the newly introduced ‘Ensemble’ node addresses.
The Ensemble node evaluates the forecasts from the parent (predecessor) modeling nodes that are connected to it and selects, for each time series, the forecast that performs best on the fit statistic chosen by the user. If you want to modify your forecasts even further, you can attach an ‘Interactive Modeling’ node after the Ensemble node and apply custom models in your pipeline.
What does this look like in practice?
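To make the selection logic concrete, here is a minimal Python sketch of per-series champion selection. It is not SAS code and not the node's actual implementation: the function names, the data layout, and the sample numbers are all assumptions made for illustration, and RMSE stands in for whichever fit statistic you pick.

```python
import math


def rmse(actuals, forecasts):
    """Root mean squared error between two equal-length sequences."""
    if len(actuals) != len(forecasts):
        raise ValueError("actuals and forecasts must have the same length")
    squared_errors = [(a - f) ** 2 for a, f in zip(actuals, forecasts)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def select_champions(actuals, forecasts_by_model):
    """Pick, for every series, the model with the lowest RMSE.

    actuals:            {series_id: [values]}
    forecasts_by_model: {model_name: {series_id: [values]}}
    (Hypothetical layout, chosen only for this example.)
    """
    champions = {}
    for series_id, observed in actuals.items():
        scores = {
            model: rmse(observed, predictions[series_id])
            for model, predictions in forecasts_by_model.items()
        }
        champions[series_id] = min(scores, key=scores.get)
    return champions


# Made-up data: two series, one local-style and one global-style model.
actuals = {
    "store_A": [10.0, 12.0, 14.0],
    "store_B": [5.0, 5.0, 6.0],
}
forecasts_by_model = {
    "auto_forecasting": {"store_A": [10.0, 12.0, 15.0], "store_B": [7.0, 7.0, 7.0]},
    "global_ml": {"store_A": [8.0, 10.0, 12.0], "store_B": [5.0, 5.0, 5.0]},
}

champions = select_champions(actuals, forecasts_by_model)
print(champions)  # {'store_A': 'auto_forecasting', 'store_B': 'global_ml'}
```

In this toy data neither model wins everywhere, which is the situation where choosing per series pays off compared with picking one strategy for the whole pipeline.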
In the example below we see a forecasting pipeline where we use the Ensemble node as a post-processing node and connect the following modeling strategies to it:
We choose Root Mean Squared Error (RMSE) as the metric for selecting the champion model for each time series, and then we run the node.
Once the node has run, we can examine the results as shown in the picture below. We see that the Auto-forecasting models were selected for about 80% of the series. For the rest of the series the global models were selected.
We then proceed to examine the output tables. The ‘OUTWEIGHT’ table contains the value ‘1’ for the model selected for each of our time series. I’ve placed a check mark next to some of them to showcase the logic in the picture below. While the weights can only be 1 or 0 for now, more sophisticated weighting methods are in development and are expected to become available in future releases.
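The 0/1 weighting can be sketched in Python as follows. The column names (`series`, `model`, `weight`), the function names, and the sample values are assumptions for illustration only; the real OUTWEIGHT table layout may differ. Writing the ensemble forecast as a weighted sum also shows why the same table could later carry fractional weights.

```python
def build_weight_table(champions, model_names):
    """One row per (series, model) pair: weight 1 for the champion, 0 otherwise."""
    return [
        {"series": series_id, "model": model, "weight": 1 if model == champions[series_id] else 0}
        for series_id in champions
        for model in model_names
    ]


def ensemble_forecast(series_id, forecasts_by_model, weight_table):
    """Weighted sum of the model forecasts for one series.

    With 0/1 weights this simply returns the champion's forecast.
    """
    weights = {
        row["model"]: row["weight"]
        for row in weight_table
        if row["series"] == series_id
    }
    horizon = len(next(iter(forecasts_by_model.values()))[series_id])
    return [
        sum(weights[model] * forecasts_by_model[model][series_id][t] for model in forecasts_by_model)
        for t in range(horizon)
    ]


# Made-up champions and forecasts, purely illustrative.
champions = {"store_A": "auto_forecasting", "store_B": "global_ml"}
forecasts_by_model = {
    "auto_forecasting": {"store_A": [10.0, 12.0, 15.0], "store_B": [7.0, 7.0, 7.0]},
    "global_ml": {"store_A": [8.0, 10.0, 12.0], "store_B": [5.0, 5.0, 5.0]},
}

weight_table = build_weight_table(champions, list(forecasts_by_model))
for row in weight_table:
    print(row)

print(ensemble_forecast("store_B", forecasts_by_model, weight_table))  # [5.0, 5.0, 5.0]
```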
Finally, we open the results from the Model Comparison node. As expected, the Ensemble node provides the most accurate results based on the criterion of choice. The improvement is significant as we see an ~8% decrease in the weighted statistics, WMAE and WRMSE, across all of the time series.
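Why is this result expected? If each series gets the model with the lowest error, then any aggregate of the per-series errors with non-negative weights cannot be higher than the aggregate of any single model, as long as it is measured on the same data and the same statistic used for selection. The Python sketch below illustrates this with made-up numbers. The article does not say how SAS VF computes its weighted statistics, so the weighted average and the series weights used here are assumptions.

```python
def weighted_mean(values, weights):
    """Weighted average of values using non-negative weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical per-series RMSE values for three series and two models.
rmse_by_model = {
    "auto_forecasting": [0.6, 1.7, 2.0],
    "global_ml": [2.0, 0.6, 2.5],
}
# Hypothetical series weights (for example, relative sales volume).
series_weights = [3.0, 1.0, 2.0]

# Per-series selection keeps the smallest error for every series.
ensemble_rmse = [min(errors) for errors in zip(*rmse_by_model.values())]

ensemble_score = weighted_mean(ensemble_rmse, series_weights)
single_scores = {
    model: weighted_mean(errors, series_weights)
    for model, errors in rmse_by_model.items()
}

print(ensemble_rmse)
print(ensemble_score)
print(single_scores)
```

The guarantee only covers the data used to pick the champions. On new data the per-series choice can still lose to a single model.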
Some closing tips: