HPforest variable importance

New Contributor
Posts: 4

Hi all,

 

I’m looking for an explanation of how the following HPForest variable importance metrics are calculated:

Train: Gini Reduction
Train: Margin Reduction
OOB: Gini Reduction
OOB: Margin Reduction

Is there an HPForest user manual that can be shared? There is nothing on this in the EM help.

 

Many thanks.


Accepted Solutions
Solution
‎11-20-2015 09:38 AM
SAS Super FREQ
Posts: 3,475

Re: HPforest variable importance

The documentation for HPFOREST is in SAS Enterprise Miner 14.1: High-Performance Procedures. The section titled "Measuring Variable Importance" discusses Gini reduction, margin importance, and other methods.

 

You can get to the EM doc from support.sas.com. The web page says that the doc is a "secure document" that is "provided in the product and on a secure site", and it gives a link explaining how to access the secure site.
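
For readers who want a feel for what "Gini reduction" measures before opening the manual, here is a minimal Python sketch of the textbook quantity for a single split: the Gini impurity of the parent node minus the size-weighted impurity of its two children. This is an illustration only, not HPFOREST's implementation; the function names and the toy labels are made up for the example, and how the procedure aggregates this over trees or evaluates it on out-of-bag data is covered in the documentation cited above.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_reduction(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children


# Hypothetical toy node: 4 events and 4 non-events, split into two branches.
parent = [1, 1, 1, 1, 0, 0, 0, 0]
left, right = [1, 1, 1, 0], [1, 0, 0, 0]
reduction = gini_reduction(parent, left, right)
print(reduction)
```

A variable's importance is then typically some total of these reductions over the splits that use the variable, but the exact accumulation is an assumption here and should be checked against the manual.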



All Replies
SAS Employee
Posts: 122

Re: HPforest variable importance

ShaneMc,

If you have the product, you can also access it from the product's Help menu. Best Regards. Jason Xin
New Contributor
Posts: 4

Re: HPforest variable importance

Thanks Jason. I'm using EM 13.2, and this level of detail is not available in the Help menu. The HP Procedures document is super, though.

SAS Employee
Posts: 122

Re: HPforest variable importance

I also wrote a blog post on this almost 3 years ago. Here is the link:
http://analytics-in-writing.blogspot.com/search?updated-min=2012-01-01T00:00:00-08:00&updated-max=20...

Best Regards
Jason Xin
Frequent Learner
Posts: 1

Re: HPforest variable importance

Hi Jason, I just started using HPforest and quickly went through the SAS documentation. I still have a few questions:

 

a) Does VARS_TO_TRY=n mean that SAS randomly selects n of the N input variables each time it splits? And are these n variables different from one split to the next?

b) What does a negative Loss Reduction Gini number mean? Is there a measure that tells us which variables are not important for the model, like a p-value in logistic regression?

c) Is there any consensus on whether RF or logistic regression is the better prediction approach?

 

Many thanks!

 

Hongguang

SAS Employee
Posts: 122

Re: HPforest variable importance

1. Yes, VARS_TO_TRY=n means that SAS HPFOREST (and really any package that wants to legitimately call itself RF) randomly picks n of the input variables supplied by the user as candidates for splitting. And yes, the same value of n applies at every branch split. The rationale: split criteria tend to become more and more 'ad hoc' as a larger and larger number of input variables is tested for splitting, regardless of how one adjusts (Kass or otherwise). So one revolutionary thing about RF is that it does not run a 'split significance test' on all input variables; it just picks a smaller number. The rule of thumb is the square root of the total. After the n variables are randomly picked, the split test is run against them. One direction now being taken is to make split criteria more simulative.

2. A negative Gini reduction intuitively means you should drop the variable, since it is not a significant enough contributor. It is common practice to run RF once, drop the variables with a negative Gini reduction, and re-run RF. In that way RF is used both to select variables and to build the model.

3. If you really believe there is such a thing as science in data science or statistics, or that there should be, then there is nothing one should or can generalize in favor of one method over another. Since when has declaring one method universally better than another become the mission of data science, or of any science at all? My suggestion to you, my friend, is to focus on the work at hand and on delivering value to those who hire you and need your work. Let fashion be fashion. No matter where the central tendency is going, study the data first. Spend most of your time studying the data, not the method.

Thank you for using SAS.

Best Regards
Jason Xin
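
To make points 1 and 2 above concrete, here is a small Python sketch. It is not HPFOREST code: the function names, the variable names `x1` to `x16`, and the importance values are all hypothetical, and rounding the square root down is an assumption made for the example. The first function draws the candidate variables for one split (in a standard random forest a fresh draw is made at each split); the second keeps only the variables whose importance is positive, as in the "run once, drop, re-run" practice described above.

```python
import math
import random


def pick_split_candidates(variables, vars_to_try=None, rng=random):
    """Randomly choose the variables that are tested for one split.

    If vars_to_try is not given, fall back to the square-root rule of thumb
    (rounded down here, which is an assumption of this sketch).
    """
    if vars_to_try is None:
        vars_to_try = max(1, math.isqrt(len(variables)))
    return rng.sample(variables, vars_to_try)


def keep_positive_importance(importance):
    """Return the names of variables whose importance is greater than zero."""
    return [name for name, value in importance.items() if value > 0]


# 16 hypothetical inputs, so the rule of thumb gives 4 candidates per split.
inputs = [f"x{i}" for i in range(1, 17)]
rng = random.Random(0)
first_split = pick_split_candidates(inputs, rng=rng)
second_split = pick_split_candidates(inputs, rng=rng)  # a separate random draw

# Made-up importance values from a first run; x2 would be dropped before re-running.
oob_gini = {"x1": 0.031, "x2": -0.002, "x3": 0.0004}
selected = keep_positive_importance(oob_gini)
print(first_split, second_split, selected)
```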