ICE Plots (Individual Conditional Expectation) in ...

11-23-2016 05:51 PM

Hi! Goldstein, Kapelner, Bleich, and Pitkin developed a nice tool for visualizing models estimated by any supervised learning algorithm, called ICE plots (see "Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation" https://www.researchgate.net/publication/257028373_Peeking_Inside_the_Black_Box_Visualizing_Statisti...). Naturally there is an R package implementation (ICEbox). Does SAS have an implementation, or any plans for producing one? Thanks!

Accepted Solutions

Solution

01-12-2018
09:59 AM


Posted in reply to topkatz

06-27-2017 03:50 PM - edited 06-27-2017 03:52 PM

Sorry for the late response, but I just saw this and happened to be reading the ICE article.

This link shows one way to produce partial dependence (PD) plots in SAS. The example uses regression, but the technique is model-agnostic.

https://qizeresearch.wordpress.com/2013/12/12/partial-dependence-plot/

For ICE plots you simply skip the aggregation of the yHats, i.e., you plot each observation vector over the range of X values of the variable of interest. I like to overlay the PD function on the individual curves, as it gives you an idea of individual differences around the overall tendency, and it can help to discover interactions and interesting subgroups. There are examples of this in the ICE paper.
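To make the "skip the aggregation" step concrete, here is a minimal sketch in plain Python rather than SAS. The names `ice_curves`, `partial_dependence`, and `toy_predict` are hypothetical, and the toy model is an assumption chosen only to show an interaction; any scoring function that takes a row and returns a prediction could be substituted.

```python
from statistics import mean


def ice_curves(predict, rows, feature, grid):
    """Return one prediction curve per row.

    For each row, `feature` is set to every value in `grid` while all
    other inputs keep their observed values.
    """
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = dict(row)       # copy so the original row is untouched
            modified[feature] = value
            curve.append(predict(modified))
        curves.append(curve)
    return curves


def partial_dependence(curves):
    """Average the ICE curves point by point to get the PD curve."""
    return [mean(column) for column in zip(*curves)]


# Hypothetical model with an x1*x2 interaction (an assumption for illustration).
def toy_predict(row):
    return row["x1"] * row["x2"]


rows = [{"x1": 0.0, "x2": -1.0}, {"x1": 0.0, "x2": 1.0}]
grid = [0.0, 1.0, 2.0]

curves = ice_curves(toy_predict, rows, "x1", grid)
pd_curve = partial_dependence(curves)
```

In this toy case the two individual curves slope in opposite directions while their average is flat, which is the kind of interaction an overlay of the PD curve on the ICE curves can reveal.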

Just note that scalability may be an issue with big data (lots of rows or high-cardinality inputs), and there may be visualization challenges (clutter), because ICE gives you a separate curve for each observation in your dataset. Thus, you might consider sampling the "other" rows or binning the values of the variable of interest (especially for high-cardinality interval model inputs). Other tricks can be applied as well to select the interesting curves rather than plotting them all.
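The two mitigations mentioned above, sampling rows and binning the variable of interest, might look something like the following sketch. It is plain Python, not SAS; `sample_rows` and `quantile_grid` are hypothetical helper names, and picking evenly spaced order statistics is just one assumed way to bin.

```python
import random


def sample_rows(rows, max_rows, seed=0):
    """Keep at most `max_rows` rows, sampled without replacement."""
    if len(rows) <= max_rows:
        return list(rows)
    return random.Random(seed).sample(rows, max_rows)


def quantile_grid(values, n_points):
    """Reduce a high-cardinality input to at most `n_points` grid values.

    Picks evenly spaced order statistics from the distinct observed values.
    Assumes n_points >= 2.
    """
    ordered = sorted(set(values))
    if len(ordered) <= n_points:
        return ordered
    step = (len(ordered) - 1) / (n_points - 1)
    return [ordered[round(i * step)] for i in range(n_points)]


# Example: 1000 rows reduced to 50 curves, 101 distinct values reduced to 5.
all_rows = list(range(1000))
kept = sample_rows(all_rows, 50)
grid = quantile_grid(range(101), 5)
```

The number of model scorings drops from (rows x distinct values) to (sampled rows x grid points), which also reduces the number of curves drawn.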

Hope this helps.

Ray

All Replies


Posted in reply to topkatz

11-24-2016 12:26 AM

It is similar to the EFFECTPLOT statement in SAS. See:

http://blogs.sas.com/content/iml/2016/03/21/statistical-analysis-stephen-curry-shooting.html

http://blogs.sas.com/content/iml/2016/06/22/sas-effectplot-statement.html
