<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Select Most Important variables before a Linear Regression, Please Help Thank You in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/Select-Most-Important-variables-before-a-Linear-Regression/m-p/138278#M1297</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;If you want to select variables, check PROC GLMSELECT.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Tue, 27 Jan 2015 10:16:22 GMT</pubDate>
    <dc:creator>Ksharp</dc:creator>
    <dc:date>2015-01-27T10:16:22Z</dc:date>
    <item>
      <title>Select Most Important variables before a Linear Regression, Please Help Thank You</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Select-Most-Important-variables-before-a-Linear-Regression/m-p/138271#M1290</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;&lt;SPAN style="color: #0000ff;"&gt;Hi All,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="color: #0000ff;"&gt;I would like to build a linear regression model, and I need to select the most important variables (those highly correlated with my target). Does anyone know a great technique (not decision trees)? I am using Base SAS for data preparation, and I have around 1000 variables to start with. I want to reduce the number of variables and select the most important ones before I enter them into PROC REG.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="color: #0000ff;"&gt;Your help would be much appreciated.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="color: #0000ff;"&gt;Many Thanks&lt;/SPAN&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 26 Jan 2015 15:01:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Select-Most-Important-variables-before-a-Linear-Regression/m-p/138271#M1290</guid>
      <dc:creator>Kanyange</dc:creator>
      <dc:date>2015-01-26T15:01:39Z</dc:date>
    </item>
    <item>
      <title>Re: Select Most Important variables before a Linear Regression, Please Help Thank You</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Select-Most-Important-variables-before-a-Linear-Regression/m-p/138272#M1291</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;You could use stepwise regression (I wonder what the stats experts will come up with). For example:&lt;/P&gt;&lt;P&gt;Data R_Input (Drop=i j);&lt;BR /&gt;&amp;nbsp; Array X{*} X1-X1000;&lt;BR /&gt;&amp;nbsp; Do j=1 To 120;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; X1=Ranuni(1);&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Y=X1*3+2+Ranuni(1)-0.5; * if SAS finds X1, it works :-);&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Do i=2 To 1000;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; X{i}=Ranuni(1);&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; End;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Output;&lt;BR /&gt;&amp;nbsp; End;&lt;BR /&gt;Run;&lt;/P&gt;&lt;P&gt;Proc Reg Data=R_Input;&lt;BR /&gt;&amp;nbsp; Model Y = X1-X1000 / Selection=Stepwise SlEntry=0.1 SLStay=0.15;&lt;BR /&gt;Run;&lt;BR /&gt;Quit;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 26 Jan 2015 15:53:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Select-Most-Important-variables-before-a-Linear-Regression/m-p/138272#M1291</guid>
      <dc:creator>user24feb</dc:creator>
      <dc:date>2015-01-26T15:53:16Z</dc:date>
    </item>
    <item>
      <title>Re: Select Most Important variables before a Linear Regression, Please Help Thank You</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Select-Most-Important-variables-before-a-Linear-Regression/m-p/138273#M1292</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Thank you, but I don't want to use PROC REG at this stage, as processing 1000 variables will take a long time. Is there a quicker way?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 26 Jan 2015 15:58:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Select-Most-Important-variables-before-a-Linear-Regression/m-p/138273#M1292</guid>
      <dc:creator>Kanyange</dc:creator>
      <dc:date>2015-01-26T15:58:44Z</dc:date>
    </item>
    <item>
      <title>Re: Select Most Important variables before a Linear Regression, Please Help Thank You</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Select-Most-Important-variables-before-a-Linear-Regression/m-p/138274#M1293</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;[1] Examine the correlation coefficient of each variable with the dependent variable. Keep those whose absolute correlation exceeds a threshold, say |r| &amp;gt; 0.5 or some other reasonable value.&lt;/P&gt;&lt;P&gt;[2] Suppose you have chosen X1, X2, X3, ... X10. Check the pairwise correlations among them, and keep in your model only those that are weakly correlated with each other (to avoid collinearity).&lt;/P&gt;&lt;P&gt;[3] Continue in this way until you have a manageable number of independent variables for your final model.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 26 Jan 2015 16:05:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Select-Most-Important-variables-before-a-Linear-Regression/m-p/138274#M1293</guid>
      <dc:creator>KachiM</dc:creator>
      <dc:date>2015-01-26T16:05:16Z</dc:date>
    </item>
    <item>
      <title>Re: Select Most Important variables before a Linear Regression, Please Help Thank You</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Select-Most-Important-variables-before-a-Linear-Regression/m-p/138275#M1294</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;PRE __jive_macro_name="quote" class="jive_text_macro jive_macro_quote"&gt;
&lt;P&gt;Kanyange wrote:&lt;/P&gt;
&lt;P&gt; &lt;/P&gt;
&lt;P&gt;&lt;SPAN style="color: #0000ff;"&gt;I would like to build a linear regression model, and I need to select the most important variables (those highly correlated with my target). Does anyone know a great technique (not decision trees)? I am using Base SAS for data preparation, and I have around 1000 variables to start with. I want to reduce the number of variables and select the most important ones before I enter them into PROC REG.&lt;/SPAN&gt;&lt;/P&gt;
&lt;/PRE&gt;&lt;P&gt;A "great technique"??&lt;/P&gt;&lt;P&gt;Well, I offer a suggestion, and I will let others decide whether it is "great" or not.&lt;/P&gt;&lt;P&gt;Your situation is exactly what Partial Least Squares (PLS) regression was designed for. PROC PLS does this.&lt;/P&gt;&lt;P&gt;However, your thought process needs to be adjusted. There really is no way to select the "most important" variables when they are all correlated with each other as well as with the response variable. This is logically impossible, and thus no statistical method can unambiguously pick out the "most important" variables in this situation. What PLS does is find linear combinations of your variables that are highly correlated with the response, and then it is up to you to use and/or interpret these linear combinations. Please note: this is not a "variable reduction" method, but it is the technique that fits your situation perfectly.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 26 Jan 2015 16:24:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Select-Most-Important-variables-before-a-Linear-Regression/m-p/138275#M1294</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2015-01-26T16:24:58Z</dc:date>
    </item>
    <item>
      <title>Re: Select Most Important variables before a Linear Regression, Please Help Thank You</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Select-Most-Important-variables-before-a-Linear-Regression/m-p/138276#M1295</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;As PaigeMiller noted above, there is a way to find a "few" representations of a large set of explanatory variables, which I think is very common in financial econometrics. You've probably found it on the internet already, but here is a simple example (you can't see the full effect, because X1 and X2 have no correlated variables):&lt;/P&gt;&lt;P&gt;Data R_Input (Drop=i j);&lt;BR /&gt;&amp;nbsp; Array X{*} X1-X1000;&lt;BR /&gt;&amp;nbsp; Do j=1 To 140;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; X1=Ranuni(1);&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; X2=Ranuni(1);&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; * Y is left missing for the last 20 rows, so they only receive predictions;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; If j le 120 Then Y=X1*3-X2*0.4+2+Ranuni(1)-0.5;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Else Call Missing (y);&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Do i=3 To 1000;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; X{i}=Ranuni(1);&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; End;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Output;&lt;BR /&gt;&amp;nbsp; End;&lt;BR /&gt;Run;&lt;/P&gt;&lt;P&gt;Proc PLS Data=R_Input Outmodel=Estimation Method=PLS CV=Split;&lt;BR /&gt;&amp;nbsp; Model Y = X1-X1000;&lt;BR /&gt;&amp;nbsp; Output Out=Estimate Predicted=Y_Hat;&lt;BR /&gt;Run;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 26 Jan 2015 17:28:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Select-Most-Important-variables-before-a-Linear-Regression/m-p/138276#M1295</guid>
      <dc:creator>user24feb</dc:creator>
      <dc:date>2015-01-26T17:28:03Z</dc:date>
    </item>
    <item>
      <title>Re: Select Most Important variables before a Linear Regression, Please Help Thank You</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Select-Most-Important-variables-before-a-Linear-Regression/m-p/138277#M1296</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Actually, I don't know whether financial econometrics uses PLS regularly or not, but it is used in lots of fields, including sociology, biology, chemistry, physics, spectroscopy, manufacturing, food science, and probably a bunch of others.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 26 Jan 2015 19:18:30 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Select-Most-Important-variables-before-a-Linear-Regression/m-p/138277#M1296</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2015-01-26T19:18:30Z</dc:date>
    </item>
    <item>
      <title>Re: Select Most Important variables before a Linear Regression, Please Help Thank You</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Select-Most-Important-variables-before-a-Linear-Regression/m-p/138278#M1297</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;If you want to select variables, check PROC GLMSELECT.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 27 Jan 2015 10:16:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Select-Most-Important-variables-before-a-Linear-Regression/m-p/138278#M1297</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2015-01-27T10:16:22Z</dc:date>
    </item>
  </channel>
</rss>

