octavianidevi
Calcite | Level 5

Among the results of the Cluster node are Variable Importance and Variable Worth, which can be seen in the Segment Profile node.

 

I have several questions regarding those results:

1. What is the Variable Importance?

2. How do we interpret Variable Importance? For example, if an input variable has quite high importance, does that mean the variable is good?

3. How does Enterprise Miner create the variable importance?

4. What is the difference between Variable Importance and Variable Worth?

 

Thank you

Accepted Solutions
DougWielenga
SAS Employee
  1. What is the Variable Importance?

 

Response - Variable Importance is calculated using the SAS decision tree methodology. It attempts to evaluate the overall value or importance of a variable over the fitted tree. The variable used to split the root node affects every observation, while variables that split nodes lower in the tree affect fewer observations. Variable Importance depends on both the number of observations a split affects and the purity of the resulting split.
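
To make the idea concrete, here is a minimal Python sketch of a generic tree-importance calculation. It is not the SAS formula (that one is given in the Reference Guide); it only illustrates how the share of observations reaching a split and the purity gain of that split can combine into one score per variable. The Gini impurity, the function names, and the toy tree are all assumptions made for illustration.

```python
from collections import defaultdict

def gini(counts):
    """Gini impurity of a node, given its class counts."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts)

def split_gain(parent, children):
    """Impurity reduction of one split, with children weighted by size."""
    n = sum(parent)
    weighted_child = sum(sum(ch) / n * gini(ch) for ch in children)
    return gini(parent) - weighted_child

def variable_importance(splits, n_total):
    """Sum (share of observations) * (purity gain) per variable,
    then scale so the top variable has importance 1.0."""
    raw = defaultdict(float)
    for var, parent, children in splits:
        weight = sum(parent) / n_total  # fraction of data reaching this node
        raw[var] += weight * split_gain(parent, children)
    top = max(raw.values())
    return {var: score / top for var, score in raw.items()}

# Hypothetical two-split tree on 100 observations with a binary target.
# Each entry: (variable, class counts in the parent, class counts per child).
splits = [
    ("income", [50, 50], [[40, 10], [10, 40]]),  # root split, all 100 obs
    ("age",    [40, 10], [[30, 0], [10, 10]]),   # lower split, 50 obs
]
importance = variable_importance(splits, n_total=100)
print(sorted(importance, key=importance.get, reverse=True))
```

In this toy tree the root variable ranks first partly because its split touches every observation, which is the effect described above.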

 

  2. How do we interpret Variable Importance? For example, if an input variable has quite high importance, does that mean the variable is good?

 

Response - Variable Importance can be used to compare the variables used in a specific tree. Pruning or growing the tree would yield a different model and might change those values. Variable Importance can be used to rank the variables by their overall impact on the model, but the value itself carries no meaning, so it is not meaningful to compare Variable Importance across different models. Nor can you assess relative importance: a variable whose importance is twice that of another variable in the same model is not necessarily twice as important. If variable A has a higher Variable Importance than variable B, then variable A can be said to have a larger impact on the model; it does not matter how much larger A's value is than B's. You can inspect the fitted tree yourself to assess the relative impact of each variable, but the actual value of Variable Importance is only meant for ranking.

 

  3. How does Enterprise Miner create the variable importance?

 

Response - There is a section called "Variable Importance" in the Decision Tree Node Chapter in the SAS Enterprise Miner Reference Guide. This section gives the formula for the Variable Importance.

 

 

  4. What is the difference between Variable Importance and Variable Worth?

 

Response - The Segment Profile node can use two methods to determine which variables differentiate among segments. The first bins each input variable and identifies its maximum logworth, which is reported as Variable Worth. The second builds a Decision Tree to predict the segments from the inputs and uses the Tree methodology of assessing Variable Importance. When the depth of the decision tree used to differentiate among segments is 1, the logworth (Variable Worth) is used to rank the variables. Otherwise, an importance measure (Variable Importance) is used to rank the variables. Allowing the decision tree greater depth allows for interactions among the variables.
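
As a rough illustration of the logworth idea, the Python sketch below computes logworth as -log10(p-value) of a Pearson chi-square test for a binned input crossed with a segment flag, and takes the maximum over candidate binnings. It is restricted to 2x2 tables so the p-value can be computed with the standard library alone, and it ignores any p-value adjustments Enterprise Miner may apply. The function names and counts are hypothetical.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row = [a + b, c + d]
    col = [a + c, b + d]
    stat = 0.0
    for i, obs_row in enumerate(table):
        for j, obs in enumerate(obs_row):
            expected = row[i] * col[j] / n
            stat += (obs - expected) ** 2 / expected
    return stat

def logworth_2x2(table):
    """logworth = -log10(p-value); 1 degree of freedom for a 2x2 table."""
    stat = chi_square_2x2(table)
    # Chi-square survival function with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    if p_value == 0.0:  # underflow for extremely strong associations
        return math.inf
    return -math.log10(p_value)

def max_logworth(candidate_tables):
    """Best logworth over the candidate binnings of one input."""
    return max(logworth_2x2(t) for t in candidate_tables)

# Hypothetical binnings of one input: rows = bin (low/high),
# columns = in segment / not in segment.
weak_cut = [[30, 20], [20, 30]]
strong_cut = [[40, 10], [10, 40]]
print(logworth_2x2(weak_cut), logworth_2x2(strong_cut))
print(max_logworth([weak_cut, strong_cut]))
```

A larger logworth means a smaller p-value, so the binning that separates the segment most sharply supplies the variable's worth in this sketch.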


4 REPLIES 4
Reeza
Super User

Here's a video that talks about the output from Segment Profiling. 

Hope it helps.

 

http://support.sas.com/training/video/tip20.html

shilpaISBCBA
Fluorite | Level 6

The decision tree split search is also based on maximum logworth, isn't it? In that case, why do I see a different order in Variable Worth (in the Segment Profile node) vs. Variable Importance (in the Decision Tree node)?

Could you please explain this difference?

DougWielenga
SAS Employee

Typically you are determining Variable Importance in a Tree based on the relationship to a Target variable (Role=Target), but the Variable Importance in the Segment Profile node is based on the relationship to the Segment variable (Role=Segment). If you specify a Target variable in the Decision Tree node and follow the Decision Tree node with a Segment Profile node, the Decision Tree will report Variable Importance based on the relationship to the Target variable, while the Segment Profile node ignores the Target variable role and reports Variable Importance based on the relationship to the Segment role. In the example above, _NODE_ (which stores the terminal node of the Decision Tree for each observation) is assigned the Segment role. If the Decision Tree has k terminal nodes, the Segment variable _NODE_ will have k levels. The Variable Importance in the Segment Profile node is then based on fitting a decision tree model to the k-level variable _NODE_ rather than to the Target variable itself.
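
A small Python sketch can show why the two rankings need not agree. It scores each input by the Gini gain of a single split, once against a target column and once against a _NODE_-style segment column. This is a simplified stand-in, not what either Enterprise Miner node computes, and the data, column names, and helper functions are made up for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gain(rows, input_name, label_name):
    """Gini reduction from splitting on every level of one input."""
    n = len(rows)
    weighted = 0.0
    for value in {r[input_name] for r in rows}:
        subset = [r[label_name] for r in rows if r[input_name] == value]
        weighted += len(subset) / n * gini(subset)
    return gini([r[label_name] for r in rows]) - weighted

def rank_inputs(rows, inputs, label_name):
    """Inputs ordered from most to least associated with the label."""
    return sorted(inputs, key=lambda v: gain(rows, v, label_name), reverse=True)

# Hypothetical data: "node" mimics the terminal node of a tree that
# split on x1 first and then on x2 within the x1 == 1 branch.
rows = (
    [{"x1": 0, "x2": 0, "target": 0, "node": 1},
     {"x1": 0, "x2": 1, "target": 0, "node": 1}]
    + [{"x1": 1, "x2": 0, "target": 1, "node": 2}] * 5
    + [{"x1": 1, "x2": 1, "target": 1, "node": 3}] * 3
    + [{"x1": 1, "x2": 1, "target": 0, "node": 3}] * 2
)

by_target = rank_inputs(rows, ["x1", "x2"], "target")
by_node = rank_inputs(rows, ["x1", "x2"], "node")
print("ranked against target:", by_target)
print("ranked against node:  ", by_node)
```

Here x1 ranks first against the target while x2 ranks first against the node variable, because the quantity being predicted has changed even though the inputs have not.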

 

Hope this helps!

Doug
