Improving Your Generated Forecasts in SAS Visual Forecasting, Part 1: Event Variables

SAS Visual Forecasting (VF) is an automated, large-scale forecasting solution. It can automatically generate time series models, select a champion model for each series and then generate forecasts at scale. Users can generate good forecasts for hundreds of thousands of time series using established best practices by simply providing the software some information about the data and then running their project. However, there is a lot of functionality in SAS Visual Forecasting that is not turned on by default. This additional functionality is useful for modifying and refining the automated forecasting system, and it will be the focus of this series of blogs.

The purpose of this series of blogs is to introduce non-default SAS VF functionality in the context of Model Studio projects. After the initial project is set up and run, analysts begin looking for ways to improve forecast precision. Our focus will be on introducing and describing VF functionality that enables analysts to build their knowledge of the data into the algorithms that generate models, select models and produce forecasts, improving overall forecast precision. Event variables are the first addition to the default functionality we’ll discuss.

This series of blogs assumes readers have some basic knowledge of how Model Studio projects and pipelines are created and run. Readers new to VF projects can find foundational background at https://video.sas.com/category/videos/sas-forecasting. SAS Education also offers a class that covers all the fundamentals, and you can sign up for the course at https://learn.sas.com/course/view.php?id=562.

What do event variables do and why are they useful?

Event variables are inputs, or features, in time series models, and they are used to capture variation associated with events. Events manifest themselves as shocks or ‘bangs’ in the data. Consider the following simple example, where a one-interval shock impacts the time series at time T*. There’s a large miss, or residual, on the date that the event occurs. This large miss can also bias the prediction or best-fit line going forward.


Typically, event variables play the role of intercept shifters in the model, and they work a lot like a light switch. Event variables are commonly coded as a column of zeros and ones in the data. The D variable shown flags the date of the event with a one. Since the event only persists for one interval, all other dates have an associated value of zero.

Adding the event variable D to the model allows the intercept to shift up and down as a function of the date or index. At T*, the intercept is equal to the sum of mu and delta (to be consistent with the picture, the delta parameter would be a negative number), and at all other dates the intercept is mu.
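For readers who prefer notation, the single-series model described above can be written as follows. This is a minimal sketch that assumes, for illustration only, a series with no trend, seasonality or other inputs; mu, delta, D and T* are the symbols used in the text, and epsilon is an added error term.

```latex
y_t = \mu + \delta D_t + \varepsilon_t,
\qquad
D_t =
\begin{cases}
1, & t = T^{*} \\
0, & t \neq T^{*}
\end{cases}
```

At t = T* the intercept is \mu + \delta, and at every other date it is \mu.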

Accommodating the event-related variation with an event variable results in a much smaller residual and a less biased forecast going forward. The effects of longer-lived events can be accommodated by changing the definition of the event variable, that is, by modifying the column of zeros and ones.
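To make the residual and bias argument concrete, here’s a minimal Python sketch on a hypothetical, noise-free series that is flat at 10 with a one-interval drop to 4. It fits the intercept-shift model by ordinary least squares; with a single 0/1 regressor, OLS reduces to group means. The data and function name are invented for illustration and are not part of SAS VF.

```python
from statistics import mean


def fit_intercept_shift(y, d):
    """OLS fit of y_t = mu + delta * d_t for a 0/1 regressor d.

    With one binary regressor, OLS reduces to group means: mu is the mean of y
    where d == 0, and mu + delta is the mean of y where d == 1.
    """
    base = [v for v, flag in zip(y, d) if flag == 0]
    shifted = [v for v, flag in zip(y, d) if flag == 1]
    mu = mean(base)
    delta = mean(shifted) - mu if shifted else 0.0
    fitted = [mu + delta * flag for flag in d]
    residuals = [v - f for v, f in zip(y, fitted)]
    return mu, delta, residuals


# Hypothetical series: flat at 10 with a one-interval shock at index 4 (T*).
y = [10, 10, 10, 10, 4, 10, 10, 10]
t_star = 4

# Without an event variable (all-zero regressor), the shock is absorbed into mu.
mu0, _, res0 = fit_intercept_shift(y, [0] * len(y))

# Pulse event variable D: 1 at T*, 0 on every other date.
pulse = [1 if t == t_star else 0 for t in range(len(y))]
mu1, delta1, res1 = fit_intercept_shift(y, pulse)

# Without D: mu = 9.25 and the residual at T* is -5.25; forecasts (mu) are biased down.
# With D: mu = 10, delta = -6, and every residual is 0; forecasts return to 10.
print(mu0, res0[t_star])
print(mu1, delta1, res1[t_star])
```

Because future dates have D = 0, the forecast going forward is simply mu, which is why removing the shock from mu also removes the bias.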

Event Variables in a Large-Scale Forecasting Project

Events like regulatory, policy and other structural changes in an industry can impact the majority of series in a forecasting project, and the same event can have different effects on different subsets of series. In a manufacturing context, subsets of series react differently to an event due to the mix of inputs required, supply chain dependencies, and so on.

Event variables were introduced above in the context of a single series: the data was visually assessed, and an appropriate event variable was created. In a large-scale forecasting project, there isn’t enough time or resources to visually assess each series and manually create an appropriate event variable for each one. The goal now is to add event variables to a project to improve the performance of the generated models while still letting the automated algorithms do most of the work.

The COVID pandemic provides a recent example to motivate the usefulness of event variables in a large-scale forecasting project. The pandemic started in March 2020. In 2023, a large manufacturer considered the impact of the pandemic on the sales of goods it produces. For goods that are simple to produce and that are not distributed internationally, the pandemic effect lasted about a year. For more complex goods that require multiple stages of production and international distribution, the pandemic effect tended to last longer. To capture the COVID effect on affected series, several COVID event variables with different lengths of persistence were created. The event variables were introduced into the forecasting project as candidate input variables, and during model generation the algorithms picked the best-fitting representation of the COVID event for each individual series. For series that weren’t substantially impacted by COVID, the event variables were ignored. The rest of this blog outlines the syntax and steps that were followed to implement this strategy.

Creating a library of event variables

There are a few ways to create a library of event variables that can be used in a VF project. For this example, we’ll start with HPFEVENTS, a SAS 9 Forecast Server procedure. In the syntax shown below, the EVENTDEF statement defines new event variables.

• The first EVENTDEF statement names the event variable COVID_9 and sets it equal to the date of initial occurrence, March 1, 2020.

• In SAS, the slash (/) introduces statement options. TYPE=LS defines the event variable as a level shift, or step. So far, we have a permanent step event variable that switches from zero to one in March 2020.

• The AFTER=(DURATION=9) option truncates the step 9 intervals after the initial interval.

• Applied to monthly data, COVID_9 is an event variable that switches from 0 to 1 in March 2020. Its value stays at 1 through December 2020, and it switches back to 0 in January 2021. The remaining EVENTDEF statements define COVID_12, COVID_16, COVID_20 and COVID_24 the same way, with longer durations.

In SAS 9, event variables live in SAS data sets. The EVENTDATA statement writes the event definitions to the EVENTDAT table in the LOCAL library.

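/* Define five candidate COVID events. Each is a level shift (TYPE=LS)    */
/* that starts in March 2020 and ends DURATION intervals after the start. */
proc hpfevents;
   eventdef covid_9 = '01MAR2020'd / type=ls after=(duration=9);
   eventdef covid_12 = '01MAR2020'd / type=ls after=(duration=12);
   eventdef covid_16 = '01MAR2020'd / type=ls after=(duration=16);
   eventdef covid_20 = '01MAR2020'd / type=ls after=(duration=20);
   eventdef covid_24 = '01MAR2020'd / type=ls after=(duration=24);
   /* Write the event definitions (rules, not 0/1 columns) to LOCAL.EVENTDAT */
   eventdata out=local.eventdat;
run;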
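To make the TYPE=LS plus AFTER=(DURATION=n) behavior described above concrete, here’s a small Python sketch that turns the five COVID rules into 0/1 columns for monthly data starting in January 2020. It mirrors the article’s description (1 from the start month through n months after it, 0 otherwise); the helper names are hypothetical, and this is not how HPFEVENTS is implemented internally.

```python
def month_index(year, month):
    """Map a (year, month) pair to a running month count."""
    return year * 12 + (month - 1)


def step_with_duration(start, duration, first_month, n_periods):
    """Hypothetical 0/1 column for a level shift truncated after `duration` intervals.

    The column is 1 from the start month through `duration` months after it,
    and 0 everywhere else, following the description of COVID_9 in the text.
    """
    s = month_index(*start)
    f = month_index(*first_month)
    return [1 if s <= f + k <= s + duration else 0 for k in range(n_periods)]


def label(first_month, k):
    """Return 'YYYY-MM' for the k-th month of a series starting at first_month."""
    m = month_index(*first_month) + k
    return f"{m // 12}-{m % 12 + 1:02d}"


first = (2020, 1)  # series starts January 2020
n = 48             # four years of monthly data
events = {
    f"covid_{d}": step_with_duration((2020, 3), d, first, n)
    for d in (9, 12, 16, 20, 24)
}

# Months flagged by COVID_9: March 2020 through December 2020 (10 months).
ones = [label(first, k) for k, v in enumerate(events["covid_9"]) if v == 1]
print(ones[0], ones[-1], len(ones))
```

Note that a duration of 9 flags 10 months in total: the initial interval plus the 9 that follow it.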

This syntax and description provide a brief introduction to the HPFEVENTS functionality. Further details can be found in the SAS Forecast Server Procedures User's Guide. Documentation can be accessed at https://support.sas.com/en/documentation.html

Because we want to use the EVENTDAT table as a library of event variables in a SAS VF project, it needs to be loaded into memory and then promoted. In the syntax shown below:

• The CAS statement creates a connection to a CAS session.

• The CASLIB _ALL_ ASSIGN statement assigns a SAS libref to each available caslib, so caslibs such as PUBLIC can be referenced in SAS code.

• The DATA step makes an in-memory copy of the SAS 9 EVENTDAT table and loads it into the PUBLIC caslib.

• The CASUTIL procedure is used to promote the EVENTDAT table. Promotion of an in-memory or CAS table makes it globally accessible to other in-memory tools on the platform.

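/* Start a CAS session and assign librefs for all available caslibs */
cas;
caslib _all_ assign;

/* Copy the SAS 9 event definitions into memory in the PUBLIC caslib */
data public.eventdat;
   set local.eventdat;
run;

/* Promote the in-memory table to global scope so other tools can use it */
proc casutil incaslib='public';
   promote casdata='eventdat';
run;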

A portion of the EVENTDAT table is shown.

It’s important to note that the COVID-related event variables are not columns of zeros and ones. The event variables we created in HPFEVENTS are rules for creating columns of zeros and ones, and those rules can be applied to time series of any length or interval. This makes the library of event variables portable across VF projects.
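The rule-versus-column distinction is easy to see in code. The Python sketch below stores each event as a small rule object (a hypothetical stand-in for one row of an event library, not the actual EVENTDAT layout) and materializes the same library for two monthly series with different spans. It uses the same step-with-duration reading as the text; other intervals are not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepRule:
    """Hypothetical event-library entry: a rule, not a column of zeros and ones."""
    name: str
    start_year: int
    start_month: int
    duration: int  # intervals after the start month that the step stays at 1

    def materialize(self, first_year, first_month, n_periods):
        """Return the 0/1 column for a monthly series of any start and length."""
        s = self.start_year * 12 + self.start_month - 1
        f = first_year * 12 + first_month - 1
        return [1 if s <= f + k <= s + self.duration else 0 for k in range(n_periods)]


library = [StepRule(f"covid_{d}", 2020, 3, d) for d in (9, 12, 16, 20, 24)]

# The same library applied to two series with different spans.
long_series = {r.name: r.materialize(2018, 1, 72) for r in library}   # Jan 2018 to Dec 2023
short_series = {r.name: r.materialize(2021, 1, 24) for r in library}  # Jan 2021 to Dec 2022

print(sum(long_series["covid_9"]))   # 10 flagged months
print(sum(short_series["covid_9"]))  # 0: the event ended before this series starts
print(short_series["covid_12"][:4])  # Jan to Mar 2021 still flagged, April is not
```

Only the rules are shared; each series gets a column of exactly its own length.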

Using a library of event variables in a SAS VF project

Now, we’ll load the in-memory EVENTDAT table into an existing VF project in Model Studio. The VF project xxx_JUN24 was created and run under default settings. The fit measures below provide a baseline. They are aggregated measures associated with the automatically generated and selected forecast models. Results correspond to series that represent sales of manufactured goods at one level of the hierarchy in a manufacturing dataset.

To bring our library of candidate event variables into the project, we’ll navigate to the Data tab of the project, select the New Data Source menu, and then select Events.

Because the EVENTDAT table was loaded into memory and promoted in previous steps, it’s listed as an in-memory table under the Available tab.

Once the event variables are loaded into the project, we’ll change their usage status to Try to use. This tells the model generation algorithms to treat the event variables as candidate input variables: if one or more event variables improve the fit of a model, they are selected as inputs; if not, they are ignored.
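The "Try to use" idea can be illustrated with a deliberately simplified Python sketch: for each series, compare a mean-only model against one intercept-shift model per candidate event, score each with a Gaussian AIC, and keep the best candidate only if it beats the model with no event. VF’s actual model generation uses richer model families and its own selection logic; the model, criterion, data and function names here are hypothetical assumptions chosen only to show the mechanism.

```python
import math
from statistics import mean


def sse_intercept_shift(y, d):
    """SSE of the OLS fit y_t = mu + delta * d_t (group means for a 0/1 regressor)."""
    base = [v for v, flag in zip(y, d) if flag == 0]
    shifted = [v for v, flag in zip(y, d) if flag == 1]
    m0 = mean(base)
    m1 = mean(shifted) if shifted else m0
    return sum((v - (m1 if flag else m0)) ** 2 for v, flag in zip(y, d))


def aic(sse, n, k):
    """Gaussian AIC up to a constant; the floor avoids log(0) for a perfect fit."""
    return n * math.log(max(sse, 1e-12) / n) + 2 * k


def choose_event(y, candidates):
    """Return the best candidate event name, or None if none beats the mean-only model."""
    n = len(y)
    best_name, best_aic = None, aic(sse_intercept_shift(y, [0] * n), n, 1)
    for name, d in candidates.items():
        if len(set(d)) < 2:  # a constant column carries no event information
            continue
        score = aic(sse_intercept_shift(y, d), n, 2)
        if score < best_aic:
            best_name, best_aic = name, score
    return best_name


n = 24  # hypothetical monthly series, Jan 2020 to Dec 2021
candidates = {
    "covid_9": [1 if 2 <= k <= 11 else 0 for k in range(n)],   # Mar 2020 to Dec 2020
    "covid_12": [1 if 2 <= k <= 14 else 0 for k in range(n)],  # Mar 2020 to Mar 2021
}
noise = [100 if k % 2 == 0 else 101 for k in range(n)]
series_a = [v - 5 if 2 <= k <= 14 else v for k, v in enumerate(noise)]  # 13-month dip
series_b = noise                                                        # unaffected

print(choose_event(series_a, candidates))  # covid_12
print(choose_event(series_b, candidates))  # None: the candidates are ignored
```

The penalty term (2 per estimated parameter) is what lets an unaffected series reject every candidate, mirroring the behavior described for series that weren’t substantially impacted by COVID.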

After re-running the project, the aggregated fit measures associated with the champion models have improved.

About 10% of the generated forecast models at this level of the data hierarchy contain at least one of the candidate event variables.

While the overall fit improved, there’s potential for further improvement by refining our definitions of the COVID effects. The event variables created in HPFEVENTS characterized the COVID effect as an abrupt shift down in the intercept, followed by an abrupt shift back up to the pre-COVID status quo at the end of the defined duration. For most of the series, a more reasonable characterization is an abrupt step down followed by a gradual transition back to the pre-COVID status quo. Multiple intercept shifts can be accommodated by defining additional event variables with different start dates, and a RAMP event type is also available to capture more gradual transition patterns. Hopefully, this blog has provided a straightforward example of how you can use event variables to build your knowledge of the business and its data into VF projects and improve the precision of your forecasts.
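For readers who want to picture the "abrupt step down, gradual recovery" shape, here’s a minimal Python sketch of such a regressor combined with a negative delta. It’s a hypothetical construction for illustration only; the names step_then_ramp, hold and recover are invented here, and it is not the definition of the HPFEVENTS RAMP type, whose options are documented in the SAS Forecast Server Procedures User’s Guide.

```python
def step_then_ramp(n_periods, start, hold, recover):
    """Hypothetical regressor: 0 before `start`, 1 for `hold` intervals,
    then a linear decline that reaches 0 on the last of `recover` intervals.

    Illustrative shape only; not the HPFEVENTS RAMP definition.
    """
    col = []
    for t in range(n_periods):
        if t < start:
            col.append(0.0)
        elif t < start + hold:
            col.append(1.0)
        elif t < start + hold + recover:
            col.append(1.0 - (t - start - hold + 1) / recover)
        else:
            col.append(0.0)
    return col


shape = step_then_ramp(n_periods=12, start=2, hold=3, recover=4)

# With a negative delta, the intercept drops abruptly and then recovers gradually.
mu, delta = 100.0, -8.0
fitted = [mu + delta * x for x in shape]
print(shape)
print(fitted)
```

A model that includes this column can still estimate a single delta, but the implied effect fades over the recovery window instead of switching off all at once.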
