
Machine Learning and Explainable AI in Forecasting - Part I

Started ‎02-23-2022
Modified ‎02-23-2022

Machine Learning in Forecasting

 

Table of Contents

  1. Introduction
  2. The Rise of Machine Learning Techniques in Forecasting
  3. Machine Learning Techniques vs Traditional Forecasting Methods
  4. Focusing on improving the data pays off big time
  5. What’s next
  6. References

 

Introduction 

 

Machine learning techniques have been proving their added value in forecasting for a while now. But what about explainability? In this article series we’ll take you through a forecasting journey using machine learning techniques, focusing on a powerful solution we developed in SAS Viya [1] to interpret the results of machine learning algorithms for each individual prediction, as well as to show how the explanatory variables impact our forecasts overall when using a machine learning method.

 

Spoiler alert! Explainable AI in forecasting is not as straightforward as it seems, because of the dependency between the variables that are used: the state-of-the-art explainability methods available today, in their purest form, provide robust results only under the assumption of variable independence. That is exactly the issue we managed to tackle in this case! But before we go into further detail and discuss our approach, we first need to take a step back and cover some key topics:

 

  1. The rise of ML techniques in forecasting
  2. Machine Learning Techniques vs Traditional Forecasting Methods
  3. Focusing on improving the data pays off big time

So let’s start…

 

The Rise of Machine Learning Techniques in Forecasting

 

More and more companies are turning to machine learning techniques to solve forecasting problems. Machine learning and hybrid techniques grew in popularity after being used to win many reputable forecasting competitions, including Kaggle competitions such as the M5, the Corporación Favorita Grocery Sales Forecasting and the Recruit Restaurant Visitor Forecasting.

 

The algorithm most preferred by the teams that achieved the top accuracy scores was LightGBM, as it doesn’t require much data preparation, it can handle various feature types, and it is generally faster than other gradient boosting methods [2]. LightGBM is available in SAS Viya and supports parallel, distributed, and GPU learning. If you want to give it a try, have a look at this resource [3].

 

Other tree-based methods worth trying are Random Forest and Gradient Boosting, which come with autotuning: the capability of finding the optimal hyperparameters in an automated way using genetic and other algorithms. These ML methods, as well as LightGBM, are all available in SAS Visual Data Mining and Machine Learning [4] on SAS Viya.

 

Apart from the techniques in SAS VDMML that we mentioned above, SAS Viya includes a product specifically developed for forecasting purposes called SAS Visual Forecasting (VF) [5]. VF also comes with three machine learning and hybrid techniques which are based on neural networks and are tailored to forecasting problems. They are definitely worth discussing, as they simplify data preparation (all required features and transformations are created automatically inside the nodes) and provide highly accurate results because they are designed with forecasting in mind. However, we won’t focus on these techniques in this article; if you want a closer look, this paper [6] from SAS Global Forum includes all the information you’ll need.

That said, VF should always be used in combination with VDMML, and the reasons will become clearer in the upcoming sections.

 

Machine Learning Techniques vs Traditional Forecasting Methods

 

Before discussing the methodology in more detail, we need to understand when we should consider applying ML methods to a forecasting problem instead of well-established statistical techniques such as ARIMA and exponential smoothing (ESM). My advice is always the same: start simple and then build on that.

 

So let’s clarify the following regarding machine learning techniques in forecasting:  

 

  • They should not be used for all forecasting projects and purposes, as simpler statistical techniques may be more appropriate (and more accurate!) in many cases, as has been shown in various forecasting competitions.

 

  • They introduce a higher level of complexity to the project: a data scientist must decide whether to apply the algorithm recursively to forecast many points in the future, or to build a separate model for each period they want to forecast. Both approaches have their own benefits; in the M5 competition, recursive methods tended to be more accurate while direct (per-period) methods tended to be more robust. As a result, a high level of expertise is required from data science teams to make the right decisions and design a reliable system.
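To make the distinction between the two strategies concrete, here is a minimal Python sketch. It is illustrative only: the `toy_model` one-step predictor (our own stand-in, not a SAS or competition model) simply averages the last three lags, where a real project would use a trained ML model such as LightGBM.

```python
def toy_model(last_three):
    """Stand-in for a trained one-step-ahead model: mean of the last 3 lags."""
    return sum(last_three) / 3.0

def forecast_recursive(history, horizon):
    """Recursive strategy: one model applied repeatedly, with each
    prediction fed back in as a lag for the next step."""
    series = list(history)
    predictions = []
    for _ in range(horizon):
        step_ahead = toy_model(series[-3:])
        predictions.append(step_ahead)
        series.append(step_ahead)  # the predicted value becomes a future lag
    return predictions

def forecast_direct(history, horizon):
    """Direct strategy: a separate model per forecast step, each scored on
    actual history only (here all steps share toy_model for simplicity)."""
    return [toy_model(history[-3:]) for _ in range(horizon)]

history = [10, 12, 11, 13, 12, 14]
print(forecast_recursive(history, 3))  # later steps consume earlier predictions
print(forecast_direct(history, 3))     # every step uses actuals only
```

The trade-off the bullet describes shows up directly in the code: the recursive loop feeds its own predictions back in (so errors can compound over the horizon), while the direct approach needs one model per step but never scores on predicted inputs.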

 

  • ML methods generally perform better when we are dealing with interrelated series, as the algorithms can learn from dense series and apply this knowledge to series where data is sparse. This means we should first decide how to group the data in a meaningful way before applying ML techniques.

 

  • They generally perform better when we have many meaningful explanatory variables available.

 

  • When ML methods don’t outperform simple statistical techniques and we are still unsatisfied with our forecasting results, we should also consider hybrid methods.

 

  • Effective data transformations that turn transactional data into ML-ready analytical base tables (ABTs) are key to achieving good results.

 

The last point is crucial for success, so we want to provide some hints and tips on that topic.

 

Focusing on improving the data pays off big time

 

The overall idea of what we need to take into account when developing a modeling-ready table (ABT) for forecasting purposes can be summarised in the picture below:

 

[Figure: overview of the considerations when building a forecasting ABT — categorical variables from data attributes, seasonality and trend variables, and denoising/short-term pattern features]

 

Now let’s take the above picture and break it down a little bit:

 

As a first step, the analyst should create or select categorical variables to include in the predictive models from the data attributes. These could be variables from a potential hierarchy (region / distribution centre / store), taking care not to include variables with too many levels, as this may lead to overfitting problems. We could also try to include variables for competing time series, such as in the case of substitute products.
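The "too many levels" warning is easy to screen for programmatically. A small illustrative Python helper (the function name and the 50-level threshold are our own, not a SAS feature) might look like:

```python
def high_cardinality_vars(table, max_levels=50):
    """Return the categorical columns whose number of distinct levels
    exceeds max_levels -- a rough screen for overfitting risk."""
    return [name for name, values in table.items()
            if len(set(values)) > max_levels]

# Toy example: 'store' has 100 levels, 'region' only 2
table = {
    "store": [f"store_{i}" for i in range(100)],
    "region": ["North", "South"] * 50,
}
print(high_cardinality_vars(table))  # ['store']
```

Columns flagged this way are candidates for dropping, grouping into coarser levels, or replacing with a higher level of the hierarchy.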

 

Then we move on to developing variables which encapsulate information about seasonality and trend. This is easy when it comes to creating variables for day, week, month and so on, but if we want to go one step further, we could use forecasting techniques from SAS Visual Forecasting to produce forecasts at the higher levels of a hierarchy and then include these forecasts as explanatory variables at the level of the hierarchy where we want to create our final predictions using ML techniques.
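For the calendar side of this step, a minimal Python sketch (the function name is ours) of deriving seasonality features from a date could look like:

```python
from datetime import date

def calendar_features(d):
    """Derive simple seasonality features from a date."""
    return {
        "month": d.month,                  # yearly seasonality
        "week": d.isocalendar()[1],        # ISO week of year
        "day_of_week": d.weekday(),        # 0 = Monday
        "quarter": (d.month - 1) // 3 + 1,
    }

print(calendar_features(date(2022, 2, 23)))
```

The higher-level forecasts mentioned above would then simply be joined on as additional columns alongside these calendar features.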

 

Finally, when it comes to denoising the data and searching for short-term patterns, data scientists have many options. One option is to smooth, remove or flag unexplained outliers, while ‘proc expand’ in SAS can be a powerful ally for creating ‘lag’, 'lead' and ‘moving average’ variables in a very simple way. For the documentation of ‘proc expand’, check the resource here [7].
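As a rough, language-agnostic sketch of those two ideas — a simple z-score rule for flagging outliers and a trailing moving average — consider the following Python snippet (the k threshold and window size are illustrative choices, not recommendations):

```python
from statistics import mean, stdev

def flag_outliers(series, k=3.0):
    """Flag points more than k standard deviations from the series mean."""
    m, s = mean(series), stdev(series)
    return [abs(x - m) > k * s for x in series]

def moving_average(series, window=3):
    """Trailing moving average; shorter windows are used at the start of
    the series, similar in spirit to proc expand's transformout=(movave 3)."""
    return [mean(series[max(0, i - window + 1): i + 1])
            for i in range(len(series))]

sales = [1, 1, 1, 1, 100]
print(flag_outliers(sales, k=1.5))   # only the spike is flagged
print(moving_average([1, 2, 3, 4]))  # [1, 1.5, 2, 3]
```

In practice, whether to smooth, remove or merely flag an outlier depends on whether it has a business explanation (a promotion, a stock-out) that other variables can capture.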

 

To bring what we described above to life with a simple example, check the code below, which uses 'pricedata' (found in the ‘sashelp’ library). 'Pricedata' includes the sales of different products by region, line and product name, and we’ll also keep two explanatory variables: price and discount.

 

/* Keep the variables we want for our analysis */
/* Sale is going to be the variable we would like to forecast*/
/* Price and Discount are explanatory/independent variables */

data pricedata_transform;
	set sashelp.pricedata;
	keep date sale price1 discount regionName productLine productName;
	rename price1=price regionname=region;
run;

/* Sort the data so proc expand can process each BY group in order */

proc sort data=pricedata_transform;
	by region productline productname date;
run;

/* Create lags and moving averages of variables with proc expand */

proc expand data=pricedata_transform out=out method=none;
   id date;
   by region productline productname;

   convert sale = sale_lag3   / transformout=(lag 3);
   convert sale = sale_lag2   / transformout=(lag 2);
   convert sale = sale_lag1   / transformout=(lag 1);
   convert sale;

   convert price = price_lag3   / transformout=(lag 3);
   convert price = price_lag2   / transformout=(lag 2);
   convert price = price_lag1   / transformout=(lag 1);
   convert price;

   convert discount = discount_lag3   / transformout=(lag 3);
   convert discount = discount_lag2   / transformout=(lag 2);
   convert discount = discount_lag1   / transformout=(lag 1);
   convert discount;

   convert price = price_movave_3m / transformout=(movave 3);
   convert sale = sale_movave_3m / transformout=(movave 3);

run;

/* Create a 'month' variable to capture seasonality and
	a uniqueID to identify each row (one per combination
	of region, productline, productname and date)        */

data pricedata_ml;
	set out;
	month=month(date);
	uniqueID = _n_;
run;

/* Load data in CAS for the following parts */

proc casutil;
	load data=pricedata_ml outcaslib="public"
	casout="pricedata_ml" promote;
run;

 

If you run the code above, each row of the final data set is a unique combination of region, productLine, productName and date. We now have a basic ABT on which we can apply our ML techniques.

 


What’s next

 

The process we described above can guide you in formulating your problems in a structured manner, but there are many decisions to be made along the way, and those will be project-specific.

 

If you want to dig deeper into the topics we described above, make sure to check the latest forecasting courses from the SAS Education department, especially the one called ‘Models for Time Series and Sequential Data’ [8], which discusses all these topics in much more detail and also includes coding examples in advanced areas such as recursive modelling with ML techniques and hybrid methods.

 

Now that we have an idea of how to prepare our data for applying machine learning methods to forecasting problems, in the next blog we’ll see how to explain the results of these models using state-of-the-art techniques that are modified in a specific way to tackle variable dependency. These techniques offer great transparency and are crucial for building trust with business stakeholders, so that they can adopt our models and use them to make informed decisions and generate value for the business.

 

See you in Part II !

 

References

[1] https://www.sas.com/en_us/software/viya.html 

[2] https://www.sciencedirect.com/science/article/pii/S0169207021001874

[3] https://documentation.sas.com/doc/en/pgmsascdc/v_022/casactml/casactml_lightgradboost_details01.htm?...

[4] https://www.sas.com/el_gr/software/visual-data-mining-machine-learning.html 

[5] https://www.sas.com/en_us/software/visual-forecasting.html 

[6] https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2020/4493-2020.pdf

[7] https://documentation.sas.com/doc/en/etsug/15.2/etsug_expand_examples04.htm

[8] https://support.sas.com/edu/schedules.html?crs=MTSS&ctry=GB
