12-01-2015 01:26 PM
12-01-2015 05:55 PM
Many thanks for that; however, I could not find my answer in that link. I can extract a partial dependence plot easily in R, but in SAS ...
It is so frustrating that I'm using SAS EM to develop the models for my PhD thesis and now I have to go back to R.
12-01-2015 06:27 PM
I looked into the partial dependence plot (2D and 3D versions) for gradient boosting and random forest about a year ago. I was not particularly impressed. It seemed useful when you have 2 or 3 variables, but I wasn't sure where that leads you when you have 4+ variables.
Since all partial dependence takes into account is "marginal effect of a variable on the class probability (classification) or response (regression)", I would much rather look at the variable importance coming out of the gradient boosting node.
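For reference, the marginal effect quoted above can be computed by hand for any model that can score a modified copy of the data. The following is only a minimal sketch of that idea in plain Python, not how any particular SAS node or R package implements it. The names `partial_dependence` and `toy_predict`, the column names, and the toy numbers are all hypothetical stand-ins chosen for illustration.

```python
from statistics import mean


def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is forced to each grid value.

    predict: callable taking a list of row dicts and returning a list of numbers
    rows:    the data to score, as a list of dicts (one per observation)
    feature: name of the input whose marginal effect we want
    grid:    values of `feature` to evaluate
    """
    curve = []
    for value in grid:
        # Overwrite the feature in every row; all other inputs stay as observed.
        modified = [{**row, feature: value} for row in rows]
        curve.append((value, mean(predict(modified))))
    return curve


# Hypothetical stand-in for a fitted model (assumption, not a real study):
# the prediction rises with population density and falls with road width.
def toy_predict(rows):
    return [10.0 + 2.0 * r["pop_density"] - 0.5 * r["road_width"] for r in rows]


data = [
    {"pop_density": 1.0, "road_width": 4.0},
    {"pop_density": 3.0, "road_width": 6.0},
    {"pop_density": 5.0, "road_width": 8.0},
]

curve = partial_dependence(toy_predict, data, "pop_density", [1.0, 3.0, 5.0])
for value, avg in curve:
    print(value, avg)
```

A curve that rises across the grid means the average prediction increases with that input, and a falling curve means it decreases. That direction is what variable importance alone does not show.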
If you have more insights about these plots, I will be happy to bring this up in our next development meeting. I am especially interested in whether these plots are something you would use on a real data set with 4 or more variables.
12-03-2015 11:38 PM
Thanks for your reply. However, partial dependence plots can also be used when you have more than 3 variables. Partial dependence helps identify interactions between the variables in a model and gives a better interpretation. For example, in my study (a traffic crash study), variable importance shows that population density is a significant factor; however, how can I find the influence of this variable on the model? I mean, it is not clear whether increasing population density increases or decreases traffic crashes. I know it is possible to find this in the SAS model code, but it is difficult and time-consuming. (for instance
12-04-2015 01:18 PM
Thanks for the details. I will check out that paper and figure out whether you can use a workaround to calculate them when you use Start/End Group nodes.
Some input from one of my most tree-versed coworkers:
Stay tuned, and let's see if I or someone from the community can come up with something.