PadmaroopaK

I have a simple gradient boosting model (maximum branch = 2, maximum depth = 1, i.e. AdaBoost-style stumps) in SAS Enterprise Miner (E-Miner, v14.1) with a binary target and mostly interval inputs (~500 variables). I will be choosing variables whose variable importance is > 0.05 on both the training and validation datasets. However, I am trying to understand the mathematics behind how the "variable importance" is calculated. I read the documentation (decision tree variable importance), but it's very vague. Could anyone shed light on how it is calculated with a simple example? It would be very helpful.

pink_poodle

Feature importance for a single decision tree: the amount by which each split point on an attribute improves the performance measure, weighted by the number of observations the node is responsible for. The performance measure may be the impurity measure (e.g. the Gini index) used to select the split points, or another, more specific error function.
Overall feature importance: the feature importances averaged across all of the decision trees in the model.
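A minimal Python sketch of the two definitions above, using Gini impurity as the performance measure. This is a generic mean-decrease-in-impurity calculation, not necessarily what SAS EM does; the split format and all function names are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_improvement(parent, left, right, n_total):
    """Impurity decrease of one split, weighted by the share of all
    training observations that reach the parent node."""
    n = len(parent)
    child = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return (n / n_total) * (gini(parent) - child)

def tree_importance(splits, n_total):
    """Sum the weighted improvements per variable within one tree.
    `splits` is a list of (variable, parent, left, right) label lists."""
    imp = {}
    for var, parent, left, right in splits:
        imp[var] = imp.get(var, 0.0) + split_improvement(parent, left, right, n_total)
    return imp

def forest_importance(trees, n_total):
    """Average the per-tree importances across all trees in the model."""
    total = {}
    for splits in trees:
        for var, value in tree_importance(splits, n_total).items():
            total[var] = total.get(var, 0.0) + value
    return {var: value / len(trees) for var, value in total.items()}

trees = [
    [("x1", [0, 0, 1, 1], [0, 0], [1, 1])],  # tree 1: clean split on x1
    [("x2", [0, 0, 1, 1], [0, 0, 1], [1])],  # tree 2: partial split on x2
]
imp = forest_importance(trees, n_total=4)
print(imp)
```

The x1 split removes all of the parent's impurity (0.5), while the x2 split removes only 0.5 - 1/3 = 1/6; averaging over the two trees gives x1 = 0.25 and x2 ≈ 0.083.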

gcjfernandez
SAS Employee
The Gradient Boosting node in SAS EM provides two approaches to evaluating the importance of a variable: split-based and observation-based.
The split-based approach uses the reduction in the sum of squares from splitting a node, summed over all nodes that split on the variable.
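A minimal sketch of the split-based measure, assuming that in gradient boosting each tree is fitted to the current pseudo-residuals, so the sum of squares is taken over those residuals. The residual values, variables, and function names below are made up for illustration.

```python
def sse(values):
    """Sum of squared deviations of `values` from their mean (node SSE)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def split_sse_reduction(x, r, threshold):
    """SSE reduction from splitting a node on x < threshold.
    `r` holds the values the tree is fitted to (assumed: pseudo-residuals)."""
    left = [ri for xi, ri in zip(x, r) if xi < threshold]
    right = [ri for xi, ri in zip(x, r) if xi >= threshold]
    return sse(r) - sse(left) - sse(right)

def split_based_importance(splits):
    """Sum the SSE reductions per variable over every splitting rule.
    `splits` is a list of (variable, x_values, residuals, threshold)."""
    totals = {}
    for var, x, r, thr in splits:
        totals[var] = totals.get(var, 0.0) + split_sse_reduction(x, r, thr)
    return totals

r = [-1.0, -1.0, 1.0, 1.0]  # hypothetical residuals at the node
x1 = [1, 2, 3, 4]           # separates the residuals cleanly
x2 = [10, 30, 20, 40]       # carries no information about them

# For simplicity both stumps reuse the same residuals.
totals = split_based_importance([("x1", x1, r, 2.5), ("x2", x2, r, 25)])
print(totals)
```

The split on x1 removes all 4.0 units of residual SSE, while the split on x2 leaves both children with SSE 2.0 and removes nothing, so only x1 accumulates importance.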
The observation-based approach uses the increase in a fit statistic that results from making the values of a variable uninformative.
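A minimal sketch of the observation-based measure on a hypothetical, already-fitted model. The thread does not say how the node makes a variable uninformative; this sketch assumes replacing it with its mean (a random permutation is another common choice) and uses average squared error (ASE) as the fit statistic.

```python
def ase(y, p):
    """Average squared error between 0/1 targets and predicted probabilities."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / len(y)

def observation_importance(predict, rows, y, var):
    """Increase in ASE after replacing `var` with its mean in every row
    (the mean-replacement rule is an assumption, not SAS's documented rule)."""
    base = ase(y, [predict(r) for r in rows])
    mean = sum(r[var] for r in rows) / len(rows)
    blanked = [{**r, var: mean} for r in rows]
    return ase(y, [predict(r) for r in blanked]) - base

def predict(r):
    # Hypothetical fitted model: a single stump that splits on x1 only.
    return 0.9 if r["x1"] >= 2.5 else 0.1

rows = [{"x1": 1, "x2": 5}, {"x1": 2, "x2": 8}, {"x1": 3, "x2": 6}, {"x1": 4, "x2": 7}]
y = [0, 0, 1, 1]
importance = {v: observation_importance(predict, rows, y, v) for v in ("x1", "x2")}
print(importance)
```

Blanking x1 pushes every prediction to 0.9, raising ASE from 0.01 to 0.41 (importance 0.40); blanking x2 leaves the predictions unchanged, so its importance is 0.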
Measures of variable importance generally underestimate the importance of correlated variables.

Two correlated variables could make a similar contribution to a model. The total contribution is usually divided between them, and neither variable acquires the rank it deserves.

Eliminating either variable generally increases the contribution attributed to the other.
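A minimal sketch of the correlated-variables effect under the observation-based measure, using two hand-built models rather than anything fitted by SAS: `model_one` uses only x1, while `model_both` splits its trust equally between x1 and an exact copy x2. As before, "uninformative" is assumed to mean replacing the variable with its mean; all names are hypothetical.

```python
def ase(y, p):
    """Average squared error between 0/1 targets and predictions."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / len(y)

def observation_importance(predict, rows, y, var):
    """Increase in ASE after replacing `var` with its mean (assumed rule)."""
    base = ase(y, [predict(r) for r in rows])
    mean = sum(r[var] for r in rows) / len(rows)
    blanked = [{**r, var: mean} for r in rows]
    return ase(y, [predict(r) for r in blanked]) - base

def stump(v):
    # Hypothetical depth-1 tree: high probability when v >= 2.5.
    return 0.9 if v >= 2.5 else 0.1

def model_one(r):
    # Model after x2 has been eliminated.
    return stump(r["x1"])

def model_both(r):
    # Model that divides the contribution between x1 and its copy x2.
    return 0.5 * stump(r["x1"]) + 0.5 * stump(r["x2"])

rows = [{"x1": v, "x2": v} for v in (1, 2, 3, 4)]  # x2 is an exact copy of x1
y = [0, 0, 1, 1]

imp_one = observation_importance(model_one, rows, y, "x1")
imp_both = {v: observation_importance(model_both, rows, y, v) for v in ("x1", "x2")}
print(imp_one, imp_both)
```

Both models make identical predictions, yet each copy scores only 0.12 in `model_both` while x1 alone scores 0.40 in `model_one`: blanking one copy is partly covered by the other, so dropping x2 raises the importance attributed to x1.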
PadmaroopaK
Thank you for your response.

I am looking at the split-based approach in my model. I find the explanation "reduction in the sum of squares from splitting a node" a little abstract. Is there a SAS white paper, or any other way to see the actual calculation for at least one variable? I am interested in seeing the back-end calculation that produces those numbers.

Thanks!
gcjfernandez
SAS Employee

I am attaching a screenshot from the SAS Enterprise Miner 14.3 Reference documentation, where you can find the official description of the computation.
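A minimal sketch of the back-end arithmetic for a stump ensemble, assuming the node follows Friedman's (2001) definition for boosted trees; that correspondence is an assumption to check against the reference documentation. For each variable: add up the SSE reductions of every splitting rule that uses it, average over the trees, take the square root, then divide by the largest value so the top variable scores 1. The split-log format and the name `relative_importance` are hypothetical.

```python
import math

def relative_importance(split_log, n_trees):
    """split_log: one (variable, sse_reduction) pair per splitting rule,
    across all trees. Returns importances scaled so the largest is 1."""
    totals = {}
    for var, reduction in split_log:
        totals[var] = totals.get(var, 0.0) + reduction
    # Assumed Friedman-style raw importance: sqrt of the per-tree average.
    raw = {var: math.sqrt(t / n_trees) for var, t in totals.items()}
    top = max(raw.values())
    return {var: v / top for var, v in raw.items()}

# Three hypothetical stumps (maximum depth = 1, so one rule per tree).
split_log = [("x1", 4.0), ("x2", 1.0), ("x1", 5.0)]
rel = relative_importance(split_log, n_trees=3)
print(rel)
```

Because every raw value is divided by the same maximum, the 1/n_trees factor cancels; it is kept only to mirror the formula. Here x1 totals 9.0 and x2 totals 1.0, so x2's relative importance is sqrt(1/9) = 1/3. Under this assumption, a 0.05 cutoff on relative importance keeps a variable only if its summed SSE reduction exceeds 0.05² = 0.0025 of the top variable's.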

PadmaroopaK
Is there a way for me to see this back-end computation in e-miner?
