GuyTreepwood
Obsidian | Level 7

Hello,

I recently fit a random forest model and created Shapley values for the scored records (using the HyperSHAP method). The output includes a Shapley value for the intercept. Is there an intuitive way to interpret what the intercept value means in this context? Explaining how the other variables contribute to the prediction is straightforward, but I am stumped when it comes to the intercept (especially when it is negative, as in my example below).

This is the code I ran to create the Shapley values table:

proc cas;
    loadactionset "explainModel";
    explainModel.shapleyExplainer /
        table = "Input_table"                   /* reference data */
        query = "score_sample_1"                /* scored record(s) to explain */
        modelTable = {caslib = "casuser",
                      name = "Stored_Model_v1"} /* stored model */
        modelTableType = "ASTORE"
        predictedTarget = "P_target1"           /* prediction being explained */
        inputs = {&inputlst}
        outputTables = {names = {"ShapleyValues" = "HyperSHAP_Stats"}}
        depth = 1
    ;
run;

quit;

Here is the Shapley values output from the code above:

Variable     ShapleyValue
Intercept    -0.08110334
Var_1        -0.011890304
Var_2        -0.006514712
Var_3        -0.002237833
Var_4        -0.002140751
Var_5        -0.002024026
Var_6        -0.001596565
Var_7        -0.000849501
Var_8        -0.00028321
Var_9         0.00062736
Var_10        0.006623832
Var_11        0.014028848
Var_12        0.016726198
Var_13        0.026388227
Var_14        0.028782221
Var_15        0.036830496
Var_16        0.045865352
Var_17        0.066095825

1 ACCEPTED SOLUTION

xinhunt
SAS Employee

The intercept in a set of Shapley values can be seen as the model's "default prediction": what it predicts when it doesn't know any feature/variable values. I'd like to think of it this way: each variable's Shapley value is that variable's contribution to this particular prediction. However, even when we are given no information about any variable, a model can still predict something about the outcome. For example, if we were asked to predict a person's resting heart rate without being given any information about the person, we could still come up with an educated guess (somewhere between 60 and 80), because that's what people's resting heart rate tends to be. This "model's guess without particular information about the observation" is the Shapley intercept value. If we learn more about the person (gender, age, health conditions), we can adjust our guess based on that information, and those adjustments are the Shapley values for those features/variables.
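
To make the "default prediction plus adjustments" idea concrete, here is a minimal Python sketch (Python rather than SAS, so that it runs on its own). It computes exact Shapley values by brute force for a small made-up model. It is not the HyperSHAP algorithm, and the model, the feature names, and the reference rows are all hypothetical. The intercept is taken to be the average prediction over the reference rows, and each feature's value is its averaged marginal adjustment away from that baseline.

```python
from itertools import combinations
from math import factorial

# Hypothetical toy model standing in for the random forest (illustration only).
def model(row):
    age, smoker, bmi = row
    return 0.1 + 0.004 * age + 0.2 * smoker + 0.01 * bmi * smoker

# Hypothetical reference data (the role played by the `table` parameter).
reference = [(30, 0, 22.0), (45, 1, 27.0), (60, 0, 31.0), (52, 1, 24.0)]

# Hypothetical record to explain (the role played by the `query` parameter).
query = (40, 1, 29.0)

def value(known):
    """Mean prediction when the features in `known` take the query's values
    and every other feature is filled in from each reference row."""
    total = 0.0
    for ref in reference:
        mixed = tuple(query[i] if i in known else ref[i]
                      for i in range(len(query)))
        total += model(mixed)
    return total / len(reference)

def exact_shapley():
    """Exact Shapley values by enumerating every subset of the other features."""
    n = len(query)
    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for k in range(n):  # subset sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                known = set(subset)
                contribution += weight * (value(known | {i}) - value(known))
        phi.append(contribution)
    return phi

intercept = value(set())    # nothing known: the "default prediction"
phi = exact_shapley()       # per-feature adjustments
prediction = model(query)   # the prediction being explained

print("intercept (mean reference prediction):", round(intercept, 6))
print("feature contributions:", [round(p, 6) for p in phi])
print("intercept + contributions:", round(intercept + sum(phi), 6))
print("model prediction:", round(prediction, 6))
```

In this toy setting the intercept equals the mean prediction over the reference rows, and the intercept plus the feature contributions reproduces the model's prediction for the record. With an estimation method such as HyperSHAP, both relationships may hold only approximately.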

Typically the intercept should be equal or close to the mean of the predictions on the reference dataset (the data in the table parameter). There are cases where this does not hold, because HyperSHAP (like KernelSHAP, or any method that estimates Shapley values) produces an approximation of the true Shapley values, and the estimates can contain error.
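
As a rough check against the numbers posted in the question, the Python sketch below adds the intercept to the per-variable values. It assumes the usual additive convention, in which the intercept plus the contributions approximates the prediction being explained; this thread does not confirm that the HyperSHAP output follows that convention exactly, so treat the result as illustrative.

```python
# Shapley values copied from the output table posted in the question.
shapley_output = {
    "Intercept": -0.08110334,
    "Var_1": -0.011890304,
    "Var_2": -0.006514712,
    "Var_3": -0.002237833,
    "Var_4": -0.002140751,
    "Var_5": -0.002024026,
    "Var_6": -0.001596565,
    "Var_7": -0.000849501,
    "Var_8": -0.00028321,
    "Var_9": 0.00062736,
    "Var_10": 0.006623832,
    "Var_11": 0.014028848,
    "Var_12": 0.016726198,
    "Var_13": 0.026388227,
    "Var_14": 0.028782221,
    "Var_15": 0.036830496,
    "Var_16": 0.045865352,
    "Var_17": 0.066095825,
}

intercept = shapley_output["Intercept"]

# Total adjustment contributed by the input variables.
feature_total = sum(v for name, v in shapley_output.items()
                    if name != "Intercept")

# Assumption: intercept + contributions approximates the explained prediction.
implied_prediction = intercept + feature_total

print(f"intercept:          {intercept:.6f}")
print(f"sum of variables:   {feature_total:.6f}")
print(f"implied prediction: {implied_prediction:.6f}")
```

Under that assumption the values add up to roughly 0.133. If P_target1 is a probability, a mean of predictions cannot be negative, so an intercept of about -0.081 looks like one of the cases where the estimated intercept departs from the mean reference prediction.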

3 REPLIES
gcjfernandez
SAS Employee

Please review this SAS publication, where Shapley value estimation and interpretation are presented with examples: https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2020/4502-2020.pdf

GuyTreepwood
Obsidian | Level 7

Thanks for the link. While the paper contains useful information, it doesn't mention the SHAP intercept values or how to interpret them.
