<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: HPforest variable importance in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/HPforest-variable-importance/m-p/308644#M4638</link>
    <description>1. Yes, VARS_TO_TRY=n means that SAS HPFOREST (as should any package that legitimately calls itself RF) randomly picks n of the input variables supplied by the user as candidates for a split. Yes, the same value of n applies to every branch split; the variables themselves are drawn at random each time. The rationale: split criteria tend to become more and more 'ad hoc' as a larger and larger number of input variables is tested for splitting, regardless of how one adjusts (Kass or otherwise). So one revolutionary thing about RF is that it does not run a 'split significance test' on all input variables. It just picks a smaller number. The rule of thumb is the square root of the total. Once the n variables are randomly picked, the split test is run against them. One direction now under way is to make split criteria more simulative.

2. Intuitively, a negative Gini reduction means you should drop the variable, since it is not a significant enough contributor. A common practice is to run RF once, drop the variables with negative Gini reduction, and re-run RF. That way RF is used both to select variables and to build the model.

3. If you really believe there is such a thing as science in data science or statistics, or that there should be, then there is nothing one can or should generalize in favor of one method over another. Since when did declaring one method universally better than another become the mission of data science, or of any science at all? My suggestion to you, my friend, is to focus on the work at hand and on delivering value to those who hire you and need your work. Let fashion be fashion. Whichever way the central tendency is going, study the data first. Spend most of your time studying the data, not the method. Thank you for using SAS. Best Regards. Jason Xin</description>
    <pubDate>Wed, 02 Nov 2016 01:42:16 GMT</pubDate>
    <dc:creator>JasonXin</dc:creator>
    <dc:date>2016-11-02T01:42:16Z</dc:date>
    <item>
      <title>HPforest variable importance</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/HPforest-variable-importance/m-p/235648#M3363</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I’m looking for an explanation of how the following HPForest variable importance metrics are calculated:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;TABLE width="169"&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD width="169"&gt;
&lt;P&gt;Train: Gini Reduction&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD width="169"&gt;
&lt;P&gt;Train: Margin Reduction&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD width="169"&gt;
&lt;P&gt;OOB: Gini Reduction&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD width="169"&gt;
&lt;P&gt;OOB: Margin Reduction&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Is there an HPForest user manual that can be shared? There is nothing on this in the EM help.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Many thanks.&lt;/P&gt;</description>
      <pubDate>Fri, 20 Nov 2015 12:39:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/HPforest-variable-importance/m-p/235648#M3363</guid>
      <dc:creator>ShaneMc</dc:creator>
      <dc:date>2015-11-20T12:39:39Z</dc:date>
    </item>
    <item>
      <title>Re: HPforest variable importance</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/HPforest-variable-importance/m-p/235660#M3364</link>
      <description>&lt;P&gt;The doc for HPFOREST is in the document &lt;EM&gt;SAS Enterprise Miner 14.1: High-Performance Procedures&lt;/EM&gt;. The section titled "Measuring Variable Importance" discusses Gini reduction, margin importance, and other methods.&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;&amp;nbsp;&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;You can &lt;A href="http://support.sas.com/documentation/onlinedoc/miner/" target="_self"&gt;see the EM doc from support.sas.com&lt;/A&gt;. The web page says that the doc is a "secure document" that is "provided in the product and on a secure site" and it gives a link for how to access the secure site.&lt;/P&gt;</description>
      <pubDate>Fri, 20 Nov 2015 13:57:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/HPforest-variable-importance/m-p/235660#M3364</guid>
      <dc:creator>Rick_SAS</dc:creator>
      <dc:date>2015-11-20T13:57:52Z</dc:date>
    </item>
    <item>
      <title>Re: HPforest variable importance</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/HPforest-variable-importance/m-p/235675#M3366</link>
      <description>ShaneMc,&lt;BR /&gt;&lt;BR /&gt;If you have the product, you can also access it from the product's Help menu. Best Regards. Jason Xin</description>
      <pubDate>Fri, 20 Nov 2015 14:36:11 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/HPforest-variable-importance/m-p/235675#M3366</guid>
      <dc:creator>JasonXin</dc:creator>
      <dc:date>2015-11-20T14:36:11Z</dc:date>
    </item>
    <item>
      <title>Re: HPforest variable importance</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/HPforest-variable-importance/m-p/235677#M3367</link>
      <description>&lt;P&gt;Thanks Jason, using EM 13.2 - this level of detail is not available in the help menu. The HP Procedures document is super though.&lt;/P&gt;</description>
      <pubDate>Fri, 20 Nov 2015 14:42:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/HPforest-variable-importance/m-p/235677#M3367</guid>
      <dc:creator>ShaneMc</dc:creator>
      <dc:date>2015-11-20T14:42:23Z</dc:date>
    </item>
    <item>
      <title>Re: HPforest variable importance</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/HPforest-variable-importance/m-p/235681#M3368</link>
      <description>I also wrote a blog, almost 3 years ago. Here is the link&lt;BR /&gt;&lt;A href="http://analytics-in-writing.blogspot.com/search?updated-min=2012-01-01T00:00:00-08:00&amp;amp;updated-max=2013-01-01T00:00:00-08:00&amp;amp;max-results=7" target="_blank"&gt;http://analytics-in-writing.blogspot.com/search?updated-min=2012-01-01T00:00:00-08:00&amp;amp;updated-max=2013-01-01T00:00:00-08:00&amp;amp;max-results=7&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;Best Regards&lt;BR /&gt;Jason Xin</description>
      <pubDate>Fri, 20 Nov 2015 14:51:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/HPforest-variable-importance/m-p/235681#M3368</guid>
      <dc:creator>JasonXin</dc:creator>
      <dc:date>2015-11-20T14:51:16Z</dc:date>
    </item>
    <item>
      <title>Re: HPforest variable importance</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/HPforest-variable-importance/m-p/308629#M4637</link>
      <description>&lt;P&gt;Hi Jason, I just started to use HPforest and quickly went through the SAS documentation. There are still a few questions in my mind:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;a) Does VARS_TO_TRY=n mean that SAS randomly selects n variables from all the N variables each time it splits? And are these n variables not the same among different splits?&lt;/P&gt;&lt;P&gt;b) What does a negative Loss Reduction Gini number mean? Do we have some measure to tell us that some of the variables are not important for the model, like the p-value in a Logistic Regression?&lt;/P&gt;&lt;P&gt;c) Is there any consensus regarding the better prediction approach between RF and Logistic Regression?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Many thanks!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Hongguang&lt;/P&gt;</description>
      <pubDate>Tue, 01 Nov 2016 21:32:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/HPforest-variable-importance/m-p/308629#M4637</guid>
      <dc:creator>hsun</dc:creator>
      <dc:date>2016-11-01T21:32:17Z</dc:date>
    </item>
    <item>
      <title>Re: HPforest variable importance</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/HPforest-variable-importance/m-p/308644#M4638</link>
      <description>1. Yes, VARS_TO_TRY=n means that SAS HPFOREST (as should any package that legitimately calls itself RF) randomly picks n of the input variables supplied by the user as candidates for a split. Yes, the same value of n applies to every branch split; the variables themselves are drawn at random each time. The rationale: split criteria tend to become more and more 'ad hoc' as a larger and larger number of input variables is tested for splitting, regardless of how one adjusts (Kass or otherwise). So one revolutionary thing about RF is that it does not run a 'split significance test' on all input variables. It just picks a smaller number. The rule of thumb is the square root of the total. Once the n variables are randomly picked, the split test is run against them. One direction now under way is to make split criteria more simulative.
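
A minimal sketch of the candidate sampling described in point 1, written in Python rather than SAS. It is not the HPFOREST implementation: `pick_split_candidates`, the variable names x1..x16 and the fixed seed are all hypothetical, and the square-root default is only the rule of thumb mentioned in the post.

```python
import math
import random


def pick_split_candidates(variables, vars_to_try=None, rng=None):
    """Return the random subset of variables examined at one split."""
    rng = rng or random.Random()
    if vars_to_try is None:
        # Rule of thumb: (floor of the) square root of the total count.
        vars_to_try = max(1, math.isqrt(len(variables)))
    # Sample without replacement, so no variable is tested twice per split.
    return rng.sample(variables, vars_to_try)


variables = [f"x{i}" for i in range(1, 17)]  # 16 hypothetical inputs
rng = random.Random(0)  # fixed seed only to make the demo repeatable

# Three splits: the count stays at 4, the chosen variables are redrawn.
for split in range(3):
    print(split, pick_split_candidates(variables, rng=rng))
```

Each call draws a fresh subset, so the count stays at n while the variables examined can differ from split to split.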

2. Intuitively, a negative Gini reduction means you should drop the variable, since it is not a significant enough contributor. A common practice is to run RF once, drop the variables with negative Gini reduction, and re-run RF. That way RF is used both to select variables and to build the model.
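
The drop-and-re-run practice in point 2 can be sketched as a simple filter, shown here in Python. The importance values are made up for illustration and `keep_nonnegative` is a hypothetical helper; in practice the numbers would come from the variable importance table of the first forest run.

```python
def keep_nonnegative(gini_reduction):
    """Return the variables whose Gini reduction is not negative."""
    return [name for name, value in gini_reduction.items() if value >= 0]


# Hypothetical Gini reduction values from a first forest run.
first_run = {"age": 0.0123, "income": 0.0045, "zip_code": -0.0007, "tenure": 0.0}

# Variables that would be passed as inputs to the second forest run.
selected = keep_nonnegative(first_run)
print(selected)
```

Whether to keep variables sitting exactly at zero is a judgment call; this sketch keeps them and drops only the strictly negative ones.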

3. If you really believe there is such a thing as science in data science or statistics, or that there should be, then there is nothing one can or should generalize in favor of one method over another. Since when did declaring one method universally better than another become the mission of data science, or of any science at all? My suggestion to you, my friend, is to focus on the work at hand and on delivering value to those who hire you and need your work. Let fashion be fashion. Whichever way the central tendency is going, study the data first. Spend most of your time studying the data, not the method. Thank you for using SAS. Best Regards. Jason Xin</description>
      <pubDate>Wed, 02 Nov 2016 01:42:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/HPforest-variable-importance/m-p/308644#M4638</guid>
      <dc:creator>JasonXin</dc:creator>
      <dc:date>2016-11-02T01:42:16Z</dc:date>
    </item>
  </channel>
</rss>

