As computing power has increased, more and more complex machine learning models have become feasible. The advantage of these models is that they can be highly accurate predictors, but a downside is that it can be difficult to explain how the results were achieved. In some cases, we may value accuracy over interpretability. In my opinion, self-driving cars and cybersecurity are domains where accuracy is highly valued. In other industries, like banking and finance, interpretability is valued, and regulations may even require a certain level of it.
What if we could combine the best of both worlds? Use highly complicated and highly accurate models, but find ways to help interpret them? LIME, ICE, variable importance plots, and partial dependency plots all aim to help us interpret complex models.
Let’s define some terms. First, recall that many different terms are used to mean model inputs (independent variables, features, predictors) and outputs (the dependent variable, target, or response).
The terms “black box” and “white box” are used to refer to less or more transparent models. There is generally a trade-off between interpretability and accuracy.
Results from “white box” (transparent) models can be easier to explain and interpret. The math relating the independent variables to the dependent variable is relatively simple, and it is often easy to see which independent variables are the most important in determining the dependent variable. However, we may relinquish some model accuracy.
Results from “black box” (opaque) models commonly involve complex transformations. It can be hard to visualize and understand what is going on inside these models, and it is usually difficult to communicate why an individual record was scored the way it was. However, the model results may be highly accurate.
A few examples of each type are shown below:
To break this down, I’ll use an analogy: the caipirinha (“white box”) versus the key lime pie (“black box”).
If we have a caipirinha, we can pretty much guess the ingredients: cachaça, lime, sugar, and ice.
By looking and tasting, we can even venture a good guess about how much of each ingredient is included, and with one sip we can tell which ingredient seems to be dominant. Likewise, with linear regression or decision trees, we can fairly easily interpret our results. We can see which input variable is most important in determining our outcome.
This is analogous to our caipirinha. If we add too much cachaça, for example, we can’t find our car keys. Which is definitely for the best. Call Uber or Lyft. Hypothetically, I mean. But we know it was the cachaça and not the lime that created that outcome.
In contrast, let’s consider a key lime pie.
We happen to know that the ingredients for this key lime pie include key lime juice, eggs, butter, condensed milk, sour cream, and whipping cream.
But would we know this just by looking, or even taking a bite? Would we have even guessed that there are eggs in the pie? Can we tell how much butter versus condensed milk versus sour cream versus whipping cream? What proportion of our slice is eggs? It’s very hard to tell, because the ingredients have been beaten, mixed, and baked.
If our pie is the perfect flavor, but too soupy, what do we have to change? Less condensed milk? Less sour cream? Less whipping cream? Beat more? Beat less? It’s tricky.
But what if we could have our key lime pie and eat it, too?
There are some ways to improve the interpretability of black box models, including variable importance rankings, partial dependence plots, ICE plots, and LIME.
Commonly, for interpretation, we are trying to explain the connection between inputs and outputs. For example, if someone is denied a loan, we may want to know what input factors were most important in influencing that denial.
Three methods that are model agnostic, visual, and useful for comparing models are Variable Importance Rankings, LIME (Local Interpretable Model-Agnostic Explanations), and ICE (Individual Conditional Expectation) plots. Partial Dependence Plots can also help us interpret complex models.
LIME, ICE Plots, and Partial Dependency Plots for model interpretability were added to all supervised modeling nodes in Model Studio for VDMML 8.2.
Variable importance graphs rank the model inputs by how much each one contributes to the model’s predictions, giving you a quick view of which inputs matter most.
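Model Studio produces variable importance rankings for you, but to make the idea concrete outside of SAS, here is a minimal Python sketch using scikit-learn’s permutation importance as one (of several) ways to rank inputs; the dataset and model below are illustrative stand-ins, not anything from this post:

```python
# A minimal sketch: permutation importance as one way to rank inputs.
# The data and model here are illustrative only.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit a "black box" model.
model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each input in turn and measure how much the test score drops.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)

# Rank inputs by mean importance, most important first.
ranking = sorted(zip(X.columns, result.importances_mean),
                 key=lambda pair: pair[1], reverse=True)
for name, importance in ranking[:10]:
    print(f"{name}: {importance:.4f}")
```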
Partial dependence plots show how values of model inputs affect the model’s predictions. A partial dependence plot in its simplest form shows how a single input (one independent variable) is related to the outcome (the dependent variable). This is illustrated above in the graph of manufacturer's suggested retail price (MSRP) by horsepower (from Ray Wright’s 2018 SGF paper Interpreting Black-Box Machine Learning Models Using Partial Dependence and Individual Conditional Expectation).
Both Partial Dependence and ICE plots are post hoc methods. They show how the model behaves in response to changing inputs.
CAUTION! The simplest partial dependency plot may not be meaningful if there are significant interaction effects among independent variables. Multi-way partial dependence plots can help you check for interactions.
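Model Studio generates these plots automatically, but as a rough illustration of the idea, here is a minimal Python sketch of a one-way and a two-way partial dependence plot using scikit-learn; the dataset, model, and input names are placeholder choices, not the MSRP example above:

```python
# A minimal sketch: one-way and two-way partial dependence plots with scikit-learn.
# The data and model are illustrative stand-ins, not the MSRP example.
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = fetch_california_housing(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# One-way partial dependence: average prediction as "MedInc" is varied.
# The two-way plot (the tuple) helps check for interactions between inputs.
PartialDependenceDisplay.from_estimator(
    model, X,
    features=["MedInc", ("MedInc", "AveOccup")],
    kind="average",
)
plt.show()
```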
ICE plots let you see visually how the inputs (independent variables) are related to the outcome (dependent variable). ICE curves can be understood as a “simulation that shows what would happen to the model’s prediction if you varied” one independent variable of a single observation. ICE plots are related to Partial Dependency plots, but they also let you find individual differences, subgroups of interest, and input interactions. Source.
In the first graph above, we see a PD plot that is essentially flat, indicating no relationship between the input X1 and the model predictions. When we look at the ICE plot on the right, however, we see two separate observations in the same data set, and we realize that the input X1 is strongly related to the target, but there are individual differences among observations.
To compute an ICE curve yourself, see Ray Wright’s 2018 SGF paper (also the source of the PD and ICE plot above).
ICE plots were originally developed to display one curve for each observation from the training data set, but you can instead use sampling or clustering to reduce the number of curves to see patterns more easily. The ICE plot below by Andrew Christian illustrates using ICE plots to identify subgroups of individuals.
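For a rough illustration outside of SAS, scikit-learn can overlay a sampled set of ICE curves on the average partial dependence curve; as before, the dataset, model, and input name are placeholder choices:

```python
# A minimal sketch: ICE curves (one per sampled observation) overlaid on the
# average partial dependence curve. Data and model are illustrative only.
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = fetch_california_housing(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# kind="both" draws the individual (ICE) curves plus their average (PD).
# subsample limits the number of ICE curves, echoing the sampling idea above.
PartialDependenceDisplay.from_estimator(
    model, X,
    features=["MedInc"],
    kind="both",
    subsample=100,
    random_state=0,
)
plt.show()
```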
LIME helps you interpret an individual prediction (a single instance or data point).
LIME explains the predictions of any classifier by fitting a linear regression to your original model inputs using prediction probability as the target.
Remember that the linear regression model here has PREDICTION PROBABILITY as its dependent variable, not your target, so it is not a representation of the variables that are important to your outcome. The LIME graphs represent the coefficients for the parameter estimates of the localized linear regression model. As a variable increases, it may have either a positive or negative effect on the PREDICTION for that cluster.
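To make that mechanism concrete, here is a minimal sketch using the open-source lime Python package (not the SAS VDMML implementation); the dataset, model, and parameter values are illustrative assumptions:

```python
# A minimal sketch of LIME with the open-source `lime` package (pip install lime).
# This is not the SAS VDMML implementation; data and model are illustrative only.
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
X, y = data.data, data.target
model = RandomForestClassifier(random_state=0).fit(X, y)

explainer = LimeTabularExplainer(
    X,
    feature_names=list(data.feature_names),
    class_names=list(data.target_names),
    mode="classification",
)

# Explain one observation: LIME fits a weighted local linear model whose
# target is the black-box model's PREDICTION PROBABILITY near this point.
explanation = explainer.explain_instance(X[0], model.predict_proba, num_features=5)
print(explanation.as_list())  # (feature condition, local coefficient) pairs
```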
In SAS VDMML 8.3 (18w30), you could end up with a variable that is predictive within the LIME model but not represented in the actual model, because the variables used in the linear regression model are selected using LASSO. In future releases, the linear model inputs will be limited to those inputs that are in the original model.
NOTE: Recall that LASSO (least absolute shrinkage and selection operator) is a shrinkage method used for variable selection and regularization.
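If you want to see that selection behavior in code, here is a minimal scikit-learn sketch in which LASSO shrinks the coefficients of uninformative inputs to exactly zero; the synthetic data and penalty value are arbitrary choices for illustration:

```python
# A minimal sketch: LASSO shrinks some coefficients exactly to zero,
# which is how it performs variable selection. Illustrative data only.
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)
print("Coefficients:", lasso.coef_.round(2))
print("Selected inputs:", [i for i, c in enumerate(lasso.coef_) if c != 0])
```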
As of VDMML 8.3 (18w25 release), LIME is still calculated based on clusters (using k-prototypes clustering) rather than on individual observations. It chooses cluster centroids that serve as proxies for actual observations. Individual computation is on the roadmap.
In Model Studio (“Build Models”), you can select a model node, and you will see Model Interpretability in the right pane under the Node options, as shown below.
You can then expand Model Interpretability as shown below.
Example results are shown below.
Ilknur Kaynar-Kabul & Mustafa Kabul, AI Webinar: Implementing AI Systems With Interpretability, Transparency and Trust (start at the 34-minute mark)
Understanding Black-Box Models with Partial Dependence and Individual Conditional Expectation Plots
Since I wrote this article a couple of months ago, there have been a number of comments and tips regarding LIME and ICE on the Visual DMML listserv.
Ilknur Kaynar Kabul leads the SAS R&D team that creates tools for interpreting complex machine learning models. Much of the information in this post was extracted from resources she created.