
Machine Learning and Explainable AI in Forecasting - Part II

Started ‎02-23-2022 by
Modified ‎02-23-2022 by

Explainability Methods in Forecasting

 

Introduction

 

In this second part of our blog series, we focus on SHAP[1] as a local explanation method and show how to apply it to the analytical base table (ABT) we created from the time series data. As we mentioned in part I, the challenge we face is that this method works well only when the input variables are independent of each other, so we have to adapt our methodology to overcome this issue.
 
But before we start, let's give you a short introduction to Shapley Values.

 

What are Shapley Values?

 
The concept of Shapley Values comes from cooperative game theory in economics[2]. It was developed to solve the problem of fairly distributing an award among multiple team members.
Award.gif

 

How can we fairly attribute each member's contribution? The solution by Lloyd Shapley satisfies the following properties:

  • EFFICIENCY: All individual awards add up to the total earnings.
  • DUMMY: If including an individual brings no additional earnings in any situation, then this individual receives a zero award.
  • SYMMETRY: If including either of two individuals adds the same amount of additional earnings, then they receive the same award.
  • ADDITIVITY: If including individual A increases the earnings by the same amount as two other individuals B and C together, then A should receive the sum of B's and C's awards.

The Shapley Value is the ONLY solution that satisfies all four properties! It is based on a weighted marginal contribution of a member over all possible coalitions.

But wait: what is a coalition, and what is a marginal contribution?
Solution1.PNG

 

 
What would the weighted marginal contribution over all possible coalitions look like? Here is an example: the Shapley Value for member A:
Solution2.PNG

 

 

Written as a formula:

 

formula.png
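For readers who cannot see the image, the formula can be transcribed as follows (in the notation of the Interpretable Machine Learning book referenced below):

```latex
\phi_j(\mathit{val}) \;=\;
  \sum_{S \,\subseteq\, \{x_1,\ldots,x_p\} \setminus \{x_j\}}
  \frac{|S|!\,\bigl(p - |S| - 1\bigr)!}{p!}
  \Bigl( \mathit{val}\bigl(S \cup \{x_j\}\bigr) - \mathit{val}(S) \Bigr)
```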

 

See: https://christophm.github.io/interpretable-ml-book/shapley.html 

 

In the formula above, p is the total number of members and S is a coalition of members excluding the member of interest (|S| is the number of members in that coalition).

The weight is inversely proportional to the size of a coalition “group” where each “group” includes all coalitions with the same number of members.

So, in our example above with 4 members we have 4 groups:
  • Group 1: adding 0 other members, 1 coalition
  • Group 2: adding 1 other member, 3 coalitions
  • Group 3: adding 2 other members, 3 coalitions
  • Group 4: adding 3 other members, 1 coalition

Each group ends up having the same total weight of 1/4, and all weights add up to 1.
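To make this concrete, here is a small Python sketch that computes exact Shapley values for a made-up three-member game. It uses the equivalent formulation of the weighted sum above: each member's Shapley value is their marginal contribution averaged over all orderings in which the coalition can be built. The earnings table is purely illustrative.

```python
from itertools import permutations

def shapley_values(members, value):
    """Exact Shapley values: average each member's marginal
    contribution over all orderings of the members."""
    contrib = {m: 0.0 for m in members}
    orderings = list(permutations(members))
    for order in orderings:
        coalition = set()
        for m in order:
            before = value(frozenset(coalition))
            coalition.add(m)
            contrib[m] += value(frozenset(coalition)) - before
    return {m: c / len(orderings) for m, c in contrib.items()}

# Toy game: total earnings of every possible coalition of A, B, and C.
earnings = {
    frozenset(): 0,
    frozenset("A"): 10, frozenset("B"): 20, frozenset("C"): 30,
    frozenset("AB"): 40, frozenset("AC"): 50, frozenset("BC"): 60,
    frozenset("ABC"): 90,
}
phi = shapley_values("ABC", lambda s: earnings[s])
print(phi)  # {'A': 20.0, 'B': 30.0, 'C': 40.0} -- adds up to 90 (EFFICIENCY)
```

Note that the values sum to the full coalition's earnings, as the EFFICIENCY property demands.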

 
This approach can be transferred to explaining the prediction for a single (local) observation: each feature value of the observation is a member in a game where the prediction is the award.

The calculation of Shapley Values is computationally expensive, as it requires evaluating the model on all possible coalitions/combinations of features. Faster approximation methods are available, such as SHAP[1].
 

The SHAP method is implemented in the SAS action linearExplainer, which is part of the Explain Model action set.

A good explanation of the SHAP method can be found in the book Interpretable Machine Learning.

In the next section we will explain how to adapt the linearExplainer action for time series data.

Using the linearExplainer Action for Time Series Data

 

With the standard KERNELSHAP preset, the SAS action follows these steps:

  1. Pick a single observation (the query)
  2. Generate random observations by sampling from each variable's distribution separately
  3. Apply the model score code generated in a previous step to the new observations
  4. Weight the observations based on their coalitions
  5. Run a weighted linear regression on the model's predictions
  6. Interpret the linear regression model's coefficients
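The steps above can be sketched in a few lines of numpy. This is a toy, self-contained illustration, not the action's actual implementation: the model is deliberately linear (so the recovered coefficients are exactly checkable), and the background data, query, and least-squares solver are all stand-ins.

```python
import math
from itertools import product

import numpy as np

# Stand-in for a scored model; linear on purpose so the result is exact.
def model(X):
    return 2 * X[:, 0] + 3 * X[:, 1] - X[:, 2]

rng = np.random.default_rng(1234)
background = rng.normal(size=(200, 3))   # step 2: sampling source
query = np.array([1.0, -2.0, 0.5])       # step 1: observation to explain

p = query.size
# All non-trivial coalitions z in {0,1}^p (p is small, so enumerate them all).
Z = np.array([z for z in product([0, 1], repeat=p) if 0 < sum(z) < p])

# Step 4: Shapley kernel weight for a coalition with k of p features present.
def shapley_kernel(k):
    return (p - 1) / (math.comb(p, k) * k * (p - k))

weight = np.array([shapley_kernel(int(z.sum())) for z in Z])

# Step 3: score "mixed" observations -- present features take the query's
# values, absent features come from the background sample -- and average.
y = np.array([model(np.where(z == 1, query, background)).mean() for z in Z])

# Steps 5 and 6: weighted linear regression of (y - base) on the coalition
# indicators; the fitted coefficients are the (approximate) SHAP values.
base = model(background).mean()
sw = np.sqrt(weight)[:, None]
phi, *_ = np.linalg.lstsq(Z * sw, (y - base) * sw.ravel(), rcond=None)
print(phi)  # = coefficient * (query - background mean) for each feature
```

Because the toy model is linear, phi comes out exactly as coefficient times (query value minus background mean) per feature; for a nonlinear model the regression yields an approximation.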

Because of the way we built our analytical base table (ABT) - transforming the data from transactional form to one row per subject - our features are not independent of each other. For details, please refer to part I of our blog series. If we applied step 2 - generating random observations - the dependencies among the features would be lost.
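The lag features in our ABT illustrate this. Here is a small pandas sketch (the data is made up; the column names follow the action call below) showing that independent per-column sampling destroys exactly the lag structure:

```python
import numpy as np
import pandas as pd

# A toy ABT built from a single time series, with lag features.
sale = pd.Series(np.arange(1.0, 21.0), name="sale")
abt = pd.DataFrame({
    "sale": sale,
    "sale_lag1": sale.shift(1),
    "sale_lag2": sale.shift(2),
    "sale_lag3": sale.shift(3),
}).dropna().reset_index(drop=True)

# In the original ABT the lag columns are perfectly dependent:
# today's sale_lag1 is tomorrow's sale_lag2.
print((abt["sale_lag1"][:-1].values == abt["sale_lag2"][1:].values).all())  # True

# Sampling each column separately - what the default data generation in
# step 2 would do - breaks this dependency structure.
rng = np.random.default_rng(0)
sampled = pd.DataFrame({c: rng.permutation(abt[c].values) for c in abt.columns})
print((sampled["sale_lag1"][:-1].values == sampled["sale_lag2"][1:].values).all())
```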

To preserve the dependency structure, it is possible to suppress the random sampling process 😊. Here is an example code:
proc cas;
   explainModel.linearExplainer result=shapr /
      table           = {name='PRICEDATA_ID', caslib='PUBLIC'}
      query           = {name='QUERY', caslib='CASUSER'}
      modelTable      = {name='GB_PRICEDATA_MODEL_ID', caslib='MODELS'}
      modelTableType  = 'ASTORE'
      predictedTarget = 'P_sale'
      seed            = 1234
      preset          = 'KERNELSHAP'
      dataGeneration  = {method='None'}
      inputs          = {{name='sale_lag3'},
                         {name='sale_lag2'},
                         {name='sale_lag1'},
                         {name='discount'},
                         {name='price'}};
run;
By adding the line "dataGeneration = {method='None'}", we suppress the random sampling, and the model's score code is applied to the original observations instead.

This preserves the feature dependencies and lets you explain the predictions of machine learning models like Gradient Boosting or other tree-based algorithms, like LightGBM in our forecasting case.
However, please note that 
 
  • the accuracy depends on how well the original data cover the possible coalitions,
  • the Shapley values of highly correlated features may bleed into each other,
  • this method can be seen as an approximation of the Shapley coalition/cohort values in [3].

Note: If you are interested in a global explanation of your machine learning model for time series data, simply change the preset parameter to 'GLOBALREG' to create a surrogate model that explains your model globally.
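The idea behind such a global surrogate can be sketched in a few lines of numpy. This is only an illustration of the surrogate concept with made-up data and a made-up black-box model, not what the 'GLOBALREG' preset actually runs: an interpretable (here: linear) model is fitted to the black-box model's own predictions rather than to the target.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative stand-ins: three ABT features and a nonlinear "black box".
X = rng.normal(size=(500, 3))
def black_box(X):
    return np.tanh(X[:, 0]) + 0.5 * X[:, 1] - 0.3 * X[:, 2] ** 3

pred = black_box(X)

# Global surrogate: ordinary least squares fitted to the model's predictions.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, pred, rcond=None)

# The surrogate's R^2 on the predictions indicates how much of the
# black-box model's behavior the global (linear) explanation captures.
resid = pred - A @ coef
r2 = 1 - resid.var() / pred.var()
print(coef, round(r2, 2))
```

The R^2 check matters: a global surrogate is only as trustworthy an explanation as its fit to the underlying model's predictions.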

 
In the third and last part of our blog series, we will show how to explain forecasting models globally and locally in an application.
 
See you in part III!

References

 
[1] Lundberg, Scott M., and Su-In Lee. "A Unified Approach to Interpreting Model Predictions." Advances in Neural Information Processing Systems 30 (2017)

[2] Shapley, Lloyd S. "A Value for n-Person Games." Contributions to the Theory of Games 2.28 (1953): 307-317

[3] Mase, Masayoshi, Art B. Owen, and Benjamin Seiler. "Explaining Black Box Decisions by Shapley Cohort Refinement." arXiv preprint arXiv:1911.00467 (2019)
 

 

 
 
 