PadmaroopaK
Fluorite | Level 6

I have a simple gradient boosting model (maximum branch = 2, maximum depth = 1 {AdaBoost}) in SAS Enterprise Miner (v14.1) with a binary target and mostly interval inputs (~500 variables). I plan to keep a variable only if its variable importance is > 0.05 on both the training and validation datasets. However, I am trying to understand the mathematics behind how the "variable importance" is calculated. I read the documentation (decision tree variable importance), but it's very vague. Could anyone shed light on how it is calculated, with a simple example? It would be very helpful.

6 REPLIES
pink_poodle
Barite | Level 11

Feature importance for a single decision tree: the amount by which each attribute's split point improves the performance measure, weighted by the number of observations the node is responsible for. The performance measure may be the purity (Gini index) used to select the split points, or another, more specific error function.
Overall feature importance: the feature importances averaged across all of the decision trees within the model.
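
To make that concrete, here is a minimal Python sketch of the two definitions above for a binary target. The data, the feature names ("age", "income") and the per-tree numbers are made up for illustration, and this is the generic textbook calculation, not necessarily what any particular product reports.

```python
def gini(labels):
    """Gini impurity of a node holding binary 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def weighted_gini_decrease(left, right, n_total):
    """Impurity decrease of one split, weighted by the node's share of the data."""
    parent = left + right
    n = len(parent)
    children = (len(left) * gini(left) + len(right) * gini(right)) / n
    return (n / n_total) * (gini(parent) - children)


def average_importance(per_tree):
    """Average each feature's importance over all trees (a feature not used in a tree counts as 0)."""
    features = {name for tree in per_tree for name in tree}
    return {
        name: sum(tree.get(name, 0.0) for tree in per_tree) / len(per_tree)
        for name in features
    }


# Hypothetical root split on "age": 3 rows go left, 5 rows go right.
left, right = [0, 0, 0], [1, 0, 1, 1, 1]
age_gain = weighted_gini_decrease(left, right, n_total=8)

# Hypothetical per-tree importances for a two-tree model.
per_tree = [{"age": age_gain}, {"age": 0.1, "income": 0.2}]
overall = average_importance(per_tree)
print(age_gain, overall)
```

The root node holds all 8 rows, so its weight is 1; a split deeper in a tree would be weighted by the smaller fraction of rows that reach it.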

gcjfernandez
SAS Employee
The Gradient Boosting node in SAS EM provides two approaches to evaluating the importance of a variable: split-based and observation-based.
The split-based approach uses the reduction in the sum of squares from splitting a node, summing over all nodes.
The observation-based approach uses the increase in a fit statistic due to making the values of a variable uninformative.
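
As a rough illustration of the split-based idea, the sketch below computes the reduction in the sum of squares for one depth-1 split and then sums the reductions per variable. The data, the variable names and the extra split values are hypothetical. In boosting, each tree is fit to the current working response (residuals) rather than to the raw target, so `y` stands for whatever the tree is trained on. The final scaling by the largest value is only an assumption about how a relative importance could be formed; the reported numbers may involve further steps, so check the reference documentation.

```python
from collections import defaultdict


def sse(values):
    """Sum of squared deviations from the node mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def split_reduction(x, y, threshold):
    """Parent SSE minus the combined SSE of the two child nodes."""
    left = [yi for xi, yi in zip(x, y) if xi <= threshold]
    right = [yi for xi, yi in zip(x, y) if xi > threshold]
    return sse(y) - (sse(left) + sse(right))


def total_reduction(splits):
    """Sum the reductions over every node (and tree) that split on each variable."""
    totals = defaultdict(float)
    for variable, reduction in splits:
        totals[variable] += reduction
    return dict(totals)


# Toy data: one interval input and a binary target.
age = [22, 25, 31, 40, 47, 52, 58, 63]
y = [0, 0, 0, 1, 0, 1, 1, 1]

# Parent SSE is 2.0; the children have SSE 0.0 and 0.8, so the reduction is about 1.2.
first = split_reduction(age, y, threshold=31)

# Hypothetical list of (variable, reduction) pairs, one per split in the model.
splits = [("age", first), ("income", 0.5), ("age", 0.3)]
totals = total_reduction(splits)

# Assumption: express each total relative to the largest one.
top = max(totals.values())
relative = {name: value / top for name, value in totals.items()}
print(totals, relative)
```

For a 0/1 target, the SSE in a node with n rows and event rate p equals n·p·(1−p), so this reduction is proportional to a count-weighted Gini decrease.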
Measures of variable importance generally underestimate the importance of correlated variables.

Two correlated variables could make a similar contribution to a model. The total contribution is usually divided between them, and neither variable acquires the rank it deserves.

Eliminating either variable generally increases the contribution attributed to the other.
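
The toy sketch below shows this effect with a tiny least-squares boosting loop over depth-1 stumps. Everything in it is an assumption made for illustration: `x2` is an exact copy of `x1`, ties between equally good splits are broken by rotating the candidate order each round (a stand-in for how near-ties between correlated inputs can fall either way on real data), and importance is the summed SSE reduction. It is not the Enterprise Miner algorithm.

```python
def sse(values):
    """Sum of squared deviations from the mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_stump(columns, residuals, order):
    """Best single split as (name, threshold, reduction); the first candidate wins ties."""
    best = None
    for name in order:
        x = columns[name]
        for t in sorted(set(x))[:-1]:
            left = [r for xi, r in zip(x, residuals) if xi <= t]
            right = [r for xi, r in zip(x, residuals) if xi > t]
            reduction = sse(residuals) - sse(left) - sse(right)
            if best is None or reduction > best[2] + 1e-12:
                best = (name, t, reduction)
    return best


def boost_importance(columns, y, rounds=4, rate=0.5):
    """Fit stumps to residuals and sum each variable's SSE reductions."""
    names = list(columns)
    pred = [sum(y) / len(y)] * len(y)
    importance = {name: 0.0 for name in names}
    for i in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        k = i % len(names)
        order = names[k:] + names[:k]  # hypothetical tie-break: rotate candidates
        name, t, reduction = best_stump(columns, residuals, order)
        importance[name] += reduction
        x = columns[name]
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        pred = [p + rate * (left_mean if xi <= t else right_mean)
                for p, xi in zip(pred, x)]
    return importance


x = [1, 2, 3, 4, 5, 6]
y = [0, 0, 0, 1, 1, 1]

both = boost_importance({"x1": x, "x2": list(x)}, y)  # credit is shared
alone = boost_importance({"x1": x}, y)                # x1 gets all of it
print(both, alone)
```

With both inputs present the total contribution is split between `x1` and `x2`; with `x2` removed, `x1` is credited with the whole amount.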
PadmaroopaK
Fluorite | Level 6
Thank you for your response.

I am looking at the split-based approach in my model. I find the "reduction in the sum of squares from splitting a node" explanation a little abstract. Is there a SAS white paper, or any way to see the actual calculation for at least one variable? I am interested in seeing the back-end calculation that produces those numbers.

Thanks!
gcjfernandez
SAS Employee

I am attaching a screenshot from the SAS Enterprise Miner 14.3 Reference documentation, where you can find the official description of the computation.

PadmaroopaK
Fluorite | Level 6
Is there a way for me to see this back end computation in e-miner?
