ShaneMc
Fluorite | Level 6

Hi all,

 

I’m looking for an explanation of how the following HPForest variable importance metrics are calculated:

 

Train: Gini Reduction

Train: Margin Reduction

OOB: Gini Reduction

OOB: Margin Reduction

 

Is there an HPForest user manual that can be shared? There is nothing on this in the EM help.

 

Many thanks.

1 ACCEPTED SOLUTION

Accepted Solutions
Rick_SAS
SAS Super FREQ

The doc for HPFOREST is in the document SAS Enterprise Miner 14.1: High-Performance Procedures. The section titled "Measuring Variable Importance" discusses Gini reduction, margin importance, and other methods.
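For readers without access to that document, here is a minimal sketch of the generic, textbook Gini reduction for a single split: the parent node's Gini impurity minus the size-weighted impurity of its two children. This is an illustration in Python, not HPFOREST's own code; the function names `gini` and `gini_reduction` are hypothetical, and how HPFOREST aggregates these values over trees, or evaluates them on out-of-bag rows, is described only in the SAS document above.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_reduction(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two child nodes."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children


# Hypothetical node: 8 rows with a binary target, split into two branches.
left_rows = [0, 0, 0, 1]
right_rows = [1, 1, 1, 0]
parent_rows = left_rows + right_rows

print(gini(parent_rows))                                    # 0.5
print(gini_reduction(parent_rows, left_rows, right_rows))   # 0.125
```

A variable's importance under this kind of measure is typically the sum of such reductions over every split that uses the variable, but treat that as an assumption here and check the HPFOREST document for the exact definition.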

 

You can reach the EM doc from support.sas.com. The web page says that the doc is a "secure document" that is "provided in the product and on a secure site", and it gives a link explaining how to access the secure site.


6 REPLIES 6
JasonXin
SAS Employee
ShaneMc,

If you have the product, you can also access it from the product's Help menu. Best Regards. Jason Xin
ShaneMc
Fluorite | Level 6

Thanks Jason. I'm using EM 13.2, and this level of detail is not available in the Help menu. The HP Procedures document is super, though.

JasonXin
SAS Employee
I also wrote a blog post almost 3 years ago. Here is the link:
http://analytics-in-writing.blogspot.com/search?updated-min=2012-01-01T00:00:00-08:00&updated-max=20...

Best Regards
Jason Xin
hsun
Calcite | Level 5

Hi Jason, I just started to use HPforest and quickly went through the SAS documentation. There are still a few questions on my mind:

 

a) Does VARS_TO_TRY=n mean that SAS randomly selects n variables from all N variables each time it splits? And are these n variables not the same across different splits?

b) What does a negative Loss Reduction Gini number mean? Do we have some measure to tell us that some of the variables are not important for the model, like p-value in a Logistic Regression?

c) Any consensus regarding the better prediction approach between RF and Logistic Regression?

 

Many thanks!

 

Hongguang

JasonXin
SAS Employee
1. Yes, VARS_TO_TRY=n means that SAS HPFOREST (and really any package that wants to legitimately call itself RF) randomly picks n of the input variables supplied by the user as candidates for splitting. Yes, the same number n applies at every branch split. The rationale is that split criteria tend to become more and more 'ad hoc' as a larger and larger number of input variables is tested for splitting, regardless of how one adjusts (Kass or otherwise). So one revolutionary thing about RF is that it does not run a 'split significance test' on all of the input variables it is fed. It just picks a smaller number. The rule of thumb is the square root of the total. After the n variables are randomly picked, the split test is run against them. One direction now being taken is to make split criteria more simulative.

2. A negative Gini reduction intuitively means that you should drop the variable, since it is not a significant enough contributor. It is common practice to run RF once, drop the variables with a negative Gini reduction, and re-run RF. That way RF is used both to select variables and to build the model.

3. If you really believe there is such a thing as science in data science or statistics, or that there should be, then there is nothing one should or can generalize about one method versus another. Since when has declaring one method universally better than another been the mission of data science, or of any science at all? My suggestion to you, my friend, is to focus on the work at hand and on delivering value to those who hire you and need your work. Let fashion be fashion. No matter which way the central tendency is going, study the data first. Spend most of your time studying the data, not the method.

Thank you for using SAS. Best Regards. Jason Xin
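The candidate-selection step described in point 1 can be sketched roughly as follows. This is a generic random-forest illustration in Python, not HPFOREST code: the function `candidate_variables` and the variable names `x1` to `x16` are hypothetical, and the sketch assumes, as in standard random forests, that a fresh subset of the same size is drawn for each split.

```python
import math
import random


def candidate_variables(all_vars, vars_to_try=None, rng=random):
    """Draw a random subset of input variables to test at one split.

    If vars_to_try is not given, fall back to the square-root rule of thumb.
    """
    if vars_to_try is None:
        vars_to_try = max(1, round(math.sqrt(len(all_vars))))
    # Sample without replacement, so a variable appears at most once per split.
    return rng.sample(all_vars, vars_to_try)


# Hypothetical inputs: 16 variables, so the rule of thumb gives n = 4.
inputs = [f"x{i}" for i in range(1, 17)]
rng = random.Random(0)  # seeded only to make the illustration repeatable

for node in range(3):
    # The count n is the same at every split; the variables drawn usually differ.
    print(node, candidate_variables(inputs, rng=rng))
```

Only the variables returned for a given node would then be tested by the split criterion, which is what keeps the number of split tests per node small.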
