The Shapley intercept can be seen as the model's "default prediction": what the model predicts when it doesn't know any feature/variable values. I like to think of it this way: each variable's Shapley value is that variable's contribution to this particular prediction. However, even when we give no information about any variable, a model can still predict something about the outcome. For example, if we were asked to predict a person's resting heart rate without being given any information about the person, we could still come up with an educated guess (somewhere between 60 and 80 beats per minute) because that's where people's resting heart rates tend to be. This "model's guess without particular information about the observation" is the Shapley intercept value. If we know more about the person (gender, age, health conditions), we can adjust our guess based on that information, and those adjustments are the Shapley values for those features/variables.
Typically the intercept should be equal to or close to the mean of the model's predictions on the reference dataset (the data in the table parameter). There are cases where this does not hold, because HyperSHAP and KernelSHAP, like any method that estimates Shapley values, only approximate the true Shapley values, and the estimates can contain errors.
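To make the relationship concrete, here is a minimal sketch in plain Python that computes exact Shapley values by brute force for a tiny made-up model. It is not HyperSHAP or KernelSHAP, and the toy heart-rate model, the `reference` rows, and the function names are all hypothetical, chosen only for illustration. Because the values are computed exactly here, the intercept matches the mean reference prediction, and the intercept plus the Shapley values adds up to the prediction; an estimation method may deviate slightly from both.

```python
from itertools import combinations
from math import factorial

# Hypothetical toy model: resting heart rate from (age, is_athlete).
def model(row):
    age, is_athlete = row
    return 70.0 + 0.1 * (age - 40) - 10.0 * is_athlete

# Hypothetical reference dataset (plays the role of the background table).
reference = [(30, 0), (40, 1), (50, 0), (60, 0)]

def value(x, known):
    """Average prediction when the features in `known` are taken from x
    and all other features are taken from each reference row in turn."""
    total = 0.0
    for z in reference:
        mixed = tuple(x[i] if i in known else z[i] for i in range(len(x)))
        total += model(mixed)
    return total / len(reference)

def shapley_values(x):
    """Exact Shapley values by enumerating every subset of the other features."""
    n = len(x)
    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for k in range(n):  # k = size of the subset drawn from `others`
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                known = set(subset)
                phi += weight * (value(x, known | {i}) - value(x, known))
        phis.append(phi)
    return phis

x = (25, 1)                                   # the observation to explain
intercept = value(x, set())                   # no feature values known
mean_reference_prediction = sum(model(z) for z in reference) / len(reference)
phis = shapley_values(x)

print("intercept:", intercept)
print("mean reference prediction:", mean_reference_prediction)
print("Shapley values:", phis)
print("intercept + sum of Shapley values:", intercept + sum(phis))
print("model prediction:", model(x))
```

With no features known, every reference row is passed to the model unchanged, which is why the intercept is just the average prediction over the reference data. The brute-force enumeration grows exponentially with the number of features, which is why practical tools estimate the values instead.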