<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Using SAS to find the most important variables out of a large number of variables in New SAS User</title>
    <link>https://communities.sas.com/t5/New-SAS-User/Using-SAS-to-find-the-most-important-variables-out-of-a-large/m-p/578129#M13301</link>
    <description>&lt;P&gt;Hello All,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have a dataset with over 500k observations and about 500 variables. About 20 variables are categorical, nominal, or datetime and the rest are numeric. I want to build a model using this dataset. My dependent variable is binary (1 or 0), but there are too many independent variables. I want to reduce&amp;nbsp;the number of independent variables to&amp;nbsp;about 30. Someone suggested I use a random forest and an importance plot to find the 30 most important variables. I have never used a random forest before, but I have a basic understanding of the theory behind it.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Edit: Someone also suggested chi-square for feature selection.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Could you please show me an efficient way to find the 30 most important variables? I am using SAS Enterprise 7.1.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any help would be great.&lt;/P&gt;</description>
    <pubDate>Wed, 31 Jul 2019 17:24:51 GMT</pubDate>
    <dc:creator>de95</dc:creator>
    <dc:date>2019-07-31T17:24:51Z</dc:date>
    <item>
      <title>Using SAS to find the most important variables out of a large number of variables</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Using-SAS-to-find-the-most-important-variables-out-of-a-large/m-p/578129#M13301</link>
      <description>&lt;P&gt;Hello All,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have a dataset with over 500k observations and about 500 variables. About 20 variables are categorical, nominal, or datetime and the rest are numeric. I want to build a model using this dataset. My dependent variable is binary (1 or 0), but there are too many independent variables. I want to reduce&amp;nbsp;the number of independent variables to&amp;nbsp;about 30. Someone suggested I use a random forest and an importance plot to find the 30 most important variables. I have never used a random forest before, but I have a basic understanding of the theory behind it.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Edit: Someone also suggested chi-square for feature selection.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Could you please show me an efficient way to find the 30 most important variables? I am using SAS Enterprise 7.1.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any help would be great.&lt;/P&gt;</description>
      <pubDate>Wed, 31 Jul 2019 17:24:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Using-SAS-to-find-the-most-important-variables-out-of-a-large/m-p/578129#M13301</guid>
      <dc:creator>de95</dc:creator>
      <dc:date>2019-07-31T17:24:51Z</dc:date>
    </item>
    <item>
      <title>Re: Using SAS to find the most important variables out of a large number of variables</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Using-SAS-to-find-the-most-important-variables-out-of-a-large/m-p/578348#M13342</link>
      <description>&lt;P&gt;I would recommend using PROC HPGENSELECT.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;or&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;PROC PLS with the MISSING=EM option (which handles missing values better).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
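&lt;P&gt;A minimal sketch of both suggestions, assuming a data set WORK.MYDATA with a binary target Y, numeric predictors X1-X480, and categorical predictors C1-C20. All of these names are hypothetical placeholders, and stepwise selection is just one possible choice of method, so check the options against the documentation for your release:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Hypothetical names: work.mydata, y, x1-x480, c1-c20 */

/* Option 1: stepwise selection for a binary response */
proc hpgenselect data=work.mydata;
   class c1-c20;                                   /* categorical predictors */
   model y(event='1') = x1-x480 c1-c20 / dist=binary;
   selection method=stepwise;                      /* effects enter and leave one at a time */
run;

/* Option 2: PLS, imputing missing values with the EM algorithm */
proc pls data=work.mydata missing=em;
   class c1-c20;
   model y = x1-x480 c1-c20;                       /* y is treated as a numeric response here */
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;In the first step, the selection summary lists the effects in the order they entered the model, which can serve as a rough ranking when trimming the list to about 30.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;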
&lt;P&gt;If your variables have many missing values, try PROC PLS (HPGENSELECT would drop observations with missing values).&lt;/P&gt;</description>
      <pubDate>Thu, 01 Aug 2019 11:48:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Using-SAS-to-find-the-most-important-variables-out-of-a-large/m-p/578348#M13342</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2019-08-01T11:48:17Z</dc:date>
    </item>
  </channel>
</rss>

