Building models with SAS Enterprise Miner, SAS Factory Miner, SAS Visual Data Mining and Machine Learning or just with programming

Variable Importance and Variable Worth in Clustering

New Contributor
Posts: 2

Variable Importance and Variable Worth in Clustering

Among the results of the clustering node are Variable Importance and Variable Worth, which can be viewed in the Segment Profile node.

 

I have several questions regarding those results:

1. What is the Variable Importance?

2. How do we interpret the Variable Importance? For example, if an input variable has quite high importance, does that mean the variable is good?

3. How does Enterprise Miner create the variable importance?

4. What is the difference between Variable Importance and Variable Worth?

 

Thank you


Accepted Solutions
Solution
‎07-07-2017 01:01 PM
SAS Employee
Posts: 121

Re: Variable Importance and Variable Worth in Clustering

1. What is the Variable Importance?

Response - Variable Importance is calculated using the SAS decision tree methodology. It attempts to evaluate the overall value or importance of the variable over the fitted tree. The variable used to split the root node impacts every observation, while variables that split nodes lower in the tree impact fewer observations. Variable Importance depends on both the number of observations a split affects and the purity of the resulting split.
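To make the two ingredients above concrete (how many observations a split touches, and how pure the result is), here is a minimal Python sketch. It is an illustration only and is not the Enterprise Miner formula: the Gini impurity, the function names, the toy tree, and the inputs `x1` and `x2` are all assumptions made up for this example.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_contribution(parent, left, right):
    """Observations reaching the node times the impurity reduction of its split."""
    n = len(parent)
    child_impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
    return n * (gini(parent) - child_impurity)

# Hypothetical two-level tree that predicts segments "A", "B", "C".
root = ["A"] * 50 + ["B"] * 30 + ["C"] * 20
root_left, root_right = ["A"] * 50, ["B"] * 30 + ["C"] * 20  # root split on x1
low_left, low_right = ["B"] * 30, ["C"] * 20                 # lower split on x2

importance = {
    "x1": split_contribution(root, root_left, root_right),
    "x2": split_contribution(root_right, low_left, low_right),
}
```

In this toy tree both splits produce pure children, yet `x1` scores higher than `x2` because its split at the root touches all 100 observations while the lower split touches only 50.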

 

2. How do we interpret the Variable Importance? For example, if an input variable has quite high importance, does that mean the variable is good?

Response - Variable Importance can be used to compare the variables used in a specific tree. Pruning or growing the tree would yield a different model and might change those values. Variable Importance can be used to rank the variables by their overall impact on the model, but the value itself carries no meaning, so it is not meaningful to compare Variable Importance across different models. Nor can you assess relative importance: a variable whose importance is twice that of another variable in the same model is not necessarily twice as important. If variable A has a higher Variable Importance than variable B, then variable A can be said to have a larger impact on the model; how much larger A's value is than B's does not matter. You can inspect the fitted tree yourself to assess the relative impact of each variable, but the actual value of Variable Importance is meant only for ranking.
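As a small illustration of "ranking only", the Python sketch below orders variables by importance and shows that an order-preserving rescaling changes the numbers but not the ranking. The variable names and values are hypothetical and do not come from any Enterprise Miner output.

```python
def rank_variables(importance):
    """Return variable names ordered from highest to lowest importance."""
    return sorted(importance, key=importance.get, reverse=True)

# Hypothetical importance values from one fitted tree.
importance = {"income": 1.0, "age": 0.5, "tenure": 0.1}
ranking = rank_variables(importance)

# An order-preserving rescaling (square root here) changes the ratios
# between values but leaves the ranking untouched.
rescaled = {name: value ** 0.5 for name, value in importance.items()}
same_order = rank_variables(rescaled) == ranking
```

Here `income` ranks above `age`, but the 2:1 ratio between their values should not be read as "twice as important".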

 

3. How does Enterprise Miner create the variable importance?

 

Response - There is a section called "Variable Importance" in the Decision Tree Node Chapter in the SAS Enterprise Miner Reference Guide. This section gives the formula for the Variable Importance.

 

 

4. What is the difference between Variable Importance and Variable Worth?

Response - The Segment Profile node can use two methods to determine which variables differentiate among segments. The first bins each input variable and identifies its maximum logworth, which is known as Variable Worth. The second builds a decision tree to predict the segments from the inputs and uses the tree methodology for assessing Variable Importance. When the depth of that decision tree is 1, the logworth (Variable Worth) is used to rank the variables. Otherwise, an importance measure (Variable Importance) is used to rank the variables. Allowing the decision tree to have greater depth allows for interactions among the variables.
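For readers unfamiliar with logworth, it is commonly defined as -log10 of the p-value of a test of association between a candidate split and the target. The Python sketch below is a simplified illustration under stated assumptions, not the Segment Profile node's actual procedure: it assumes a segment-versus-rest target, a plain Pearson chi-square test on a 2x2 table with no p-value adjustments, and hypothetical cut points and data.

```python
import math

def logworth_2x2(a, b, c, d):
    """-log10(p) of a Pearson chi-square test on the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0  # degenerate table: the split separates nothing
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square upper tail, 1 degree of freedom
    return -math.log10(p) if p > 0 else float("inf")

def max_logworth(values, in_segment, cutpoints):
    """Largest logworth over the candidate cut points of one binned input."""
    best = 0.0
    pairs = list(zip(values, in_segment))
    for cut in cutpoints:
        a = sum(1 for v, s in pairs if v <= cut and s)
        b = sum(1 for v, s in pairs if v <= cut and not s)
        c = sum(1 for v, s in pairs if v > cut and s)
        d = sum(1 for v, s in pairs if v > cut and not s)
        best = max(best, logworth_2x2(a, b, c, d))
    return best

# Hypothetical data: 20 observations of one input and two segment indicators.
values = list(range(1, 21))
separating = [v <= 10 for v in values]     # segment membership tied to the input
unrelated = [v % 2 == 0 for v in values]   # segment membership unrelated to the input
cutpoints = [5, 10, 15]

worth_separating = max_logworth(values, separating, cutpoints)
worth_unrelated = max_logworth(values, unrelated, cutpoints)
```

An input that cleanly separates the segment from the rest gets a much larger logworth than one that does not, which is why the value works for ranking inputs when only a single split per variable is considered.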



All Replies
Super User
Posts: 17,819

Re: Variable Importance and Variable Worth in Clustering

Here's a video that talks about the output from Segment Profiling. 

Hope it helps.

 

http://support.sas.com/training/video/tip20.html

