<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Variable Ranking in Random Forest in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/Variable-Ranking-in-Random-Forest/m-p/302472#M4490</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I have a dataset of 957 predictors and a binary target. Many predictors are highly correlated, so I want to run a random forest model to select a subset of predictors for further modeling. I set the 'Variable Selection' option of the HP Forest node to 'Yes' and then attached another HP Forest node to it so I can use only the selected variables for modeling.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;IMG src="https://communities.sas.com/t5/image/serverpage/image-id/5150i45FA2FB19F071233/image-size/original?v=v2&amp;amp;px=-1" border="0" alt="Capture.PNG" title="Capture.PNG" /&gt;&lt;/P&gt;&lt;P&gt;However, I am not sure how the variables are selected. First, when I look at the 'Variable Importance' tab of the 1st HP Forest node, the variables can be ranked by their 'Number of Splitting Rules' and 'Gini Reduction'. As I understand it, a value of 0 on both statistics means that the variable is not important. Does it also mean that the random forest doesn't use those variables at all (since no splitting rules are based on them)?&lt;/P&gt;&lt;P&gt;Second, when I check the 'Variable Importance' tab of the 2nd HP Forest node, I see there are only 586 variables, but many of them are variables that had '0' importance (based on Number of Splitting Rules and Gini Reduction) in the 'Variable Importance' ranking of the 1st HP Forest node. Can someone explain how HP Forest does variable selection?&lt;/P&gt;</description>
    <pubDate>Tue, 04 Oct 2016 21:13:44 GMT</pubDate>
    <dc:creator>YuanNiu</dc:creator>
    <dc:date>2016-10-04T21:13:44Z</dc:date>
    <item>
      <title>Variable Ranking in Random Forest</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Variable-Ranking-in-Random-Forest/m-p/302472#M4490</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I have a dataset of 957 predictors and a binary target. Many predictors are highly correlated, so I want to run a random forest model to select a subset of predictors for further modeling. I set the 'Variable Selection' option of the HP Forest node to 'Yes' and then attached another HP Forest node to it so I can use only the selected variables for modeling.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;IMG src="https://communities.sas.com/t5/image/serverpage/image-id/5150i45FA2FB19F071233/image-size/original?v=v2&amp;amp;px=-1" border="0" alt="Capture.PNG" title="Capture.PNG" /&gt;&lt;/P&gt;&lt;P&gt;However, I am not sure how the variables are selected. First, when I look at the 'Variable Importance' tab of the 1st HP Forest node, the variables can be ranked by their 'Number of Splitting Rules' and 'Gini Reduction'. As I understand it, a value of 0 on both statistics means that the variable is not important. Does it also mean that the random forest doesn't use those variables at all (since no splitting rules are based on them)?&lt;/P&gt;&lt;P&gt;Second, when I check the 'Variable Importance' tab of the 2nd HP Forest node, I see there are only 586 variables, but many of them are variables that had '0' importance (based on Number of Splitting Rules and Gini Reduction) in the 'Variable Importance' ranking of the 1st HP Forest node. Can someone explain how HP Forest does variable selection?&lt;/P&gt;</description>
      <pubDate>Tue, 04 Oct 2016 21:13:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Variable-Ranking-in-Random-Forest/m-p/302472#M4490</guid>
      <dc:creator>YuanNiu</dc:creator>
      <dc:date>2016-10-04T21:13:44Z</dc:date>
    </item>
    <item>
      <title>Re: Variable Ranking in Random Forest</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Variable-Ranking-in-Random-Forest/m-p/303159#M4504</link>
      <description>&lt;P&gt;This is documented in detail in the &lt;STRONG&gt;SAS Enterprise Miner 14.1 High-Performance Procedures&lt;/STRONG&gt; documentation, available from the Secure documentation link at: &lt;A href="http://go.documentation.sas.com/?docsetId=emhpprcref&amp;amp;docsetTarget=emhpprcref_hpforest_details28.htm&amp;amp;docsetVersion=14.2&amp;amp;locale=en" target="_self"&gt;http://go.documentation.sas.com/?docsetId=emhpprcref&amp;amp;docsetTarget=emhpprcref_hpforest_details28.htm&amp;amp;docsetVersion=14.2&amp;amp;locale=en&lt;/A&gt; (see the paragraph above the link for information on obtaining access to this secure site). There is a whole section under &lt;STRONG&gt;Details&lt;/STRONG&gt; in the HPFOREST chapter on measuring variable importance. Hope that helps!&lt;/P&gt;
&lt;P&gt;Wendy&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 29 Sep 2017 11:14:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Variable-Ranking-in-Random-Forest/m-p/303159#M4504</guid>
      <dc:creator>WendyCzika</dc:creator>
      <dc:date>2017-09-29T11:14:32Z</dc:date>
    </item>
    <item>
      <title>Re: Variable Ranking in Random Forest</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Variable-Ranking-in-Random-Forest/m-p/397701#M6052</link>
      <description>&lt;P&gt;Dear Wendy,&lt;BR /&gt;&lt;BR /&gt;I can't access the link you provided, &lt;A href="http://supportprod.unx.sas.com/documentation/onlinedoc/miner/index.html" target="_self" rel="nofollow noopener noreferrer"&gt;http://supportprod.unx.sas.com/documentation/onlinedoc/miner/index.html&lt;/A&gt;. Is there any other link that explains best practices for using Random Forest for variable selection? Thanks.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Nelson&lt;/P&gt;</description>
      <pubDate>Thu, 21 Sep 2017 07:29:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Variable-Ranking-in-Random-Forest/m-p/397701#M6052</guid>
      <dc:creator>nelson_lee</dc:creator>
      <dc:date>2017-09-21T07:29:31Z</dc:date>
    </item>
    <item>
      <title>Re: Variable Ranking in Random Forest</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Variable-Ranking-in-Random-Forest/m-p/397969#M6054</link>
      <description>Try this new link:&lt;BR /&gt;&lt;BR /&gt;&lt;A href="http://documentation.sas.com/?docsetId=emhpprcref&amp;amp;docsetTarget=emhpprcref_hpforest_details28.htm&amp;amp;docsetVersion=14.2&amp;amp;locale=en" target="_blank"&gt;http://documentation.sas.com/?docsetId=emhpprcref&amp;amp;docsetTarget=emhpprcref_hpforest_details28.htm&amp;amp;docsetVersion=14.2&amp;amp;locale=en&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Fri, 22 Sep 2017 01:07:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Variable-Ranking-in-Random-Forest/m-p/397969#M6054</guid>
      <dc:creator>WendyCzika</dc:creator>
      <dc:date>2017-09-22T01:07:17Z</dc:date>
    </item>
  </channel>
</rss>

