<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: What does R square mean in variable selection? in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/What-does-R-square-mean-in-variable-selection/m-p/768521#M8888</link>
    <description>Thanks for the reply. I understand the meaning of R square. I was asking how R square is calculated in the first step. According to your manual, the method has 2 steps (3 for a binary target). So in the first step, does SAS run a linear regression of the target on each input separately, and then keep the inputs whose R square is above the threshold?</description>
    <pubDate>Mon, 20 Sep 2021 08:54:59 GMT</pubDate>
    <dc:creator>ycenycute</dc:creator>
    <dc:date>2021-09-20T08:54:59Z</dc:date>
    <item>
      <title>What does R square mean in variable selection?</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/What-does-R-square-mean-in-variable-selection/m-p/768417#M8886</link>
      <description>&lt;P&gt;In SAS Enterprise Miner, the Variable Selection node offers an R-square method. I checked this &lt;A href="https://lexjansen.com/nesug/nesug07/sa/sa17.pdf" target="_self"&gt;document&lt;/A&gt;, which gives a good explanation of how the R-square method works. What I am wondering is how R-square is calculated in the first step. Does SAS run a regression of the target variable on each input separately and then take the R-square of each of those regressions?&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 18 Sep 2021 11:25:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/What-does-R-square-mean-in-variable-selection/m-p/768417#M8886</guid>
      <dc:creator>ycenycute</dc:creator>
      <dc:date>2021-09-18T11:25:09Z</dc:date>
    </item>
    <item>
      <title>Re: What does R square mean in variable selection?</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/What-does-R-square-mean-in-variable-selection/m-p/768421#M8887</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/393358"&gt;@ycenycute&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;R-square(d) measures the strength of the relationship between your model (your input / independent variables &amp;amp; the functional form of the model) and the dependent variable on a convenient 0 – 100% scale.&lt;/P&gt;
&lt;P&gt;It measures the proportion of the total variance in your dependent variable that is explained by the model; other things being equal, the more, the better.&lt;BR /&gt;R-Squared is ubiquitous in statistics, which is also why people often stop looking at it critically (a high R-Squared is not always good news).&lt;/P&gt;
&lt;P&gt;The main disadvantage of R-Squared is that it never decreases when you add an input to your model, and in practice it almost always increases (even if that input does not contribute significantly to the power of the model and only explains a bit of noise).&lt;/P&gt;
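&lt;P&gt;A minimal sketch of that last point in plain Python (not SAS code, and not what Enterprise Miner runs internally). The data are made up and the helper names &lt;CODE&gt;pearson&lt;/CODE&gt;, &lt;CODE&gt;r2_one&lt;/CODE&gt; and &lt;CODE&gt;r2_two&lt;/CODE&gt; are hypothetical. It assumes ordinary least squares with an intercept, where the R-squared of a one-input model equals the squared Pearson correlation and the two-input R-squared has a closed form in the three pairwise correlations.&lt;/P&gt;
&lt;PRE&gt;
```python
def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_vv = sum((b - mean_v) ** 2 for b in v)
    return s_uv / (s_uu * s_vv) ** 0.5


def r2_one(y, x):
    """R-squared of the least-squares line y = a + b*x."""
    return pearson(y, x) ** 2


def r2_two(y, x1, x2):
    """R-squared of y = a + b1*x1 + b2*x2, via the textbook correlation formula."""
    r1, r2, r12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    return (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)


# Made-up data: y depends on x; "noise" is arbitrary, with no intended link to y.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
noise = [0.3, -1.2, 0.8, 0.1, -0.5, 1.1, -0.9, 0.4]

base = r2_one(y, x)
bigger = r2_two(y, x, noise)
print(f"R-squared with x only:      {base:.6f}")
print(f"R-squared with x and noise: {bigger:.6f}")
```
&lt;/PRE&gt;
&lt;P&gt;The second value cannot be smaller than the first, whatever numbers you put in the noise column (as long as it is not perfectly correlated with x).&lt;/P&gt;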
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Anyway, how does the R-square selection method in the Variable Selection node of Enterprise Miner work?&lt;/P&gt;
&lt;P&gt;Read it here:&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;SAS® Enterprise Miner™ &lt;STRONG&gt;15.1&lt;/STRONG&gt;: Reference Help&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Variable Selection Node&lt;BR /&gt;&lt;A href="https://go.documentation.sas.com/doc/en/emref/15.1/n1m7rvh6yyb3mmn0zavezsher4ml.htm" target="_blank"&gt;https://go.documentation.sas.com/doc/en/emref/15.1/n1m7rvh6yyb3mmn0zavezsher4ml.htm&lt;/A&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;In short, in the Forward Stepwise Regression, ... at each successive step, an additional input variable is chosen that provides the largest incremental increase in the model R**2.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Forward Stepwise means you start with zero inputs in the model and then you add the one that provides the biggest R**2 in the simple model (the model with one input), then you add a 2nd variable (the one that&amp;nbsp;provides the largest incremental increase in the model R**2) and so on ... until stopping criteria are met.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;I suggest you come back to us with whatever is still unclear to you in the doc.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Kind regards,&lt;BR /&gt;Koen&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 18 Sep 2021 14:30:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/What-does-R-square-mean-in-variable-selection/m-p/768421#M8887</guid>
      <dc:creator>sbxkoenk</dc:creator>
      <dc:date>2021-09-18T14:30:16Z</dc:date>
    </item>
    <item>
      <title>Re: What does R square mean in variable selection?</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/What-does-R-square-mean-in-variable-selection/m-p/768521#M8888</link>
      <description>Thanks for the reply. I understand the meaning of R square. I was asking how R square is calculated in the first step. According to your manual, the method has 2 steps (3 for a binary target). So in the first step, does SAS run a linear regression of the target on each input separately, and then keep the inputs whose R square is above the threshold?</description>
      <pubDate>Mon, 20 Sep 2021 08:54:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/What-does-R-square-mean-in-variable-selection/m-p/768521#M8888</guid>
      <dc:creator>ycenycute</dc:creator>
      <dc:date>2021-09-20T08:54:59Z</dc:date>
    </item>
    <item>
      <title>Re: What does R square mean in variable selection?</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/What-does-R-square-mean-in-variable-selection/m-p/768583#M8889</link>
      <description>&lt;P&gt;Some variable selection algorithms (often known as "stepwise") go like this:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Step 1: fit a separate one-variable model for each of the x variables and compute its R-squared, then select the variable with the highest R-squared to be the first variable included in the model. (For example, if X7 has the highest R-squared, the model is now Y = X7.)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Step 2: compute the R-squared for all of the possible models with X7 and ONE other variable. The variable in the model with the highest R-squared becomes the second variable included. (For example, if X2 gives the highest R-squared in this step, the model is now Y = X7 X2.)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Continue until the increase in R-squared is less than some pre-specified threshold, or until the variable added isn't statistically significant, or ... there are all sorts of variations of this algorithm.&lt;/P&gt;
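&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The steps above can be sketched in plain Python as follows. This only illustrates the generic forward-selection idea; it is not SAS code and not what Enterprise Miner runs internally. The function names &lt;CODE&gt;r_squared&lt;/CODE&gt; and &lt;CODE&gt;forward_select&lt;/CODE&gt;, the &lt;CODE&gt;min_gain&lt;/CODE&gt; stopping threshold of 0.01 and the toy data are all assumptions made for the example, and the inputs are assumed not to be perfectly collinear.&lt;/P&gt;
&lt;PRE&gt;
```python
def r_squared(columns, y):
    """R-squared of a least-squares fit of y on the given columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    aug = [[sum(row[a] * row[b] for row in design) for b in range(p)]
           + [sum(row[a] * yi for row, yi in zip(design, y))]
           for a in range(p)]
    # Gauss-Jordan elimination with partial pivoting (assumes no exact collinearity).
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(p):
            if r != c:
                factor = aug[r][c] / aug[c][c]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[c])]
    coef = [aug[a][p] / aug[a][a] for a in range(p)]
    fitted = [sum(b * v for b, v in zip(coef, row)) for row in design]
    mean_y = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - sse / sst


def forward_select(inputs, y, min_gain=0.01):
    """Add one input at a time, always the one that gives the highest R-squared."""
    selected, history, current_r2 = [], [], 0.0
    remaining = list(inputs)
    while remaining:
        # R-squared of every model made of the selected inputs plus ONE candidate.
        scores = {name: r_squared([inputs[s] for s in selected + [name]], y)
                  for name in remaining}
        best = max(scores, key=scores.get)
        if not scores[best] - current_r2 > min_gain:
            break  # the best candidate adds too little R-squared: stop
        selected.append(best)
        remaining.remove(best)
        current_r2 = scores[best]
        history.append((best, current_r2))
    return history


# Toy data (hypothetical): Y is roughly 2*X7 + 3*X2; X3 is unrelated to Y.
data = {
    "X2": [1, -1, 1, -1, 1, -1, 1, -1],
    "X3": [2, 1, 2, 1, 1, 2, 1, 2],
    "X7": [1, 2, 3, 4, 5, 6, 7, 8],
}
y = [5.1, 0.9, 8.9, 5.1, 13.1, 8.9, 16.9, 13.1]

history = forward_select(data, y)
for name, r2 in history:
    print(name, round(r2, 3))
```
&lt;/PRE&gt;
&lt;P&gt;On this toy data X7 enters first, X2 second, and X3 is never added because it does not raise the R-squared by more than the threshold.&lt;/P&gt;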
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;NOTE: among all possible models with&amp;nbsp;&lt;EM&gt;k&lt;/EM&gt; terms, this algorithm is not guaranteed to find the one that has the highest R-squared.&lt;/P&gt;</description>
      <pubDate>Mon, 20 Sep 2021 17:02:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/What-does-R-square-mean-in-variable-selection/m-p/768583#M8889</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2021-09-20T17:02:05Z</dc:date>
    </item>
  </channel>
</rss>

