<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Variables redundancy and overfitting in SAS Academy for Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Variables-redundancy-and-overfitting/m-p/651949#M834</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/158810"&gt;@pvareschi&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The presence of redundant variables makes the model more complex than it needs to be, because each redundant variable adds another predictor. Complex models are more prone to overfitting because they run a higher risk of "learning" errors: redundant information can be noise that does not reflect the real-world relationship.&lt;/P&gt;
&lt;P&gt;Best,&lt;/P&gt;</description>
    <pubDate>Sat, 30 May 2020 09:15:01 GMT</pubDate>
    <dc:creator>ed_sas_member</dc:creator>
    <dc:date>2020-05-30T09:15:01Z</dc:date>
    <item>
      <title>Variables redundancy and overfitting</title>
      <link>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Variables-redundancy-and-overfitting/m-p/651945#M832</link>
      <description>&lt;P&gt;Re: Predictive Modeling Using Logistic Regression&lt;/P&gt;
&lt;P&gt;Would it be possible to clarify why the presence of redundant inputs may increase the risk of overfitting (see page 3-34 of the course text)?&lt;/P&gt;</description>
      <pubDate>Sat, 30 May 2020 08:49:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Variables-redundancy-and-overfitting/m-p/651945#M832</guid>
      <dc:creator>pvareschi</dc:creator>
      <dc:date>2020-05-30T08:49:24Z</dc:date>
    </item>
    <item>
      <title>Re: Variables redundancy and overfitting</title>
      <link>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Variables-redundancy-and-overfitting/m-p/651949#M834</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/158810"&gt;@pvareschi&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The presence of redundant variables makes the model more complex than it needs to be, because each redundant variable adds another predictor. Complex models are more prone to overfitting because they run a higher risk of "learning" errors: redundant information can be noise that does not reflect the real-world relationship.&lt;/P&gt;
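&lt;P&gt;One practical way to act on this is to screen out near-duplicate predictors before fitting. The Python sketch below is a deliberately simple, hypothetical illustration, not SAS code or a course recommendation. The function names and the 0.9 cutoff are assumptions. It keeps a predictor only if its absolute Pearson correlation with every predictor already kept is below the cutoff. It is a cruder stand-in for variable-clustering tools such as SAS PROC VARCLUS. It only catches pairwise redundancy, so a predictor that is a combination of several others will slip through.&lt;/P&gt;
&lt;PRE&gt;
```python
import statistics

def pearson(a, b):
    """Sample Pearson correlation of two equal-length lists of numbers.

    Assumes neither list is constant (that would divide by zero).
    """
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5

def drop_redundant(columns, cutoff=0.9):
    """Hypothetical greedy pairwise screen over a dict {name: values}.

    Predictors are visited in dict order, so earlier names take priority;
    a predictor is kept only if |r| with every already-kept predictor
    is below the cutoff.
    """
    kept = []
    for name, values in columns.items():
        if all(cutoff > abs(pearson(values, columns[k])) for k in kept):
            kept.append(name)
    return kept

if __name__ == "__main__":
    data = {
        "x1": [1, 2, 3, 4, 5],
        "x2": [2, 4, 6, 8, 10.5],  # almost exactly 2 * x1, so redundant
        "x3": [5, 3, 4, 1, 2],     # r = -0.8 with x1, so kept at cutoff 0.9
    }
    print(drop_redundant(data))  # prints ['x1', 'x3']
```
&lt;/PRE&gt;
&lt;P&gt;The dictionary order decides which member of a redundant pair survives. In practice you might list the predictors you would rather keep, such as the more interpretable ones, first.&lt;/P&gt;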
&lt;P&gt;Best,&lt;/P&gt;</description>
      <pubDate>Sat, 30 May 2020 09:15:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Variables-redundancy-and-overfitting/m-p/651949#M834</guid>
      <dc:creator>ed_sas_member</dc:creator>
      <dc:date>2020-05-30T09:15:01Z</dc:date>
    </item>
    <item>
      <title>Re: Variables redundancy and overfitting</title>
      <link>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Variables-redundancy-and-overfitting/m-p/651950#M835</link>
      <description>&lt;P&gt;Redundant variables can also cause the regression coefficients to swing wildly, sometimes to the extent that they end up with the wrong sign. This leads to unstable models and to coefficients that cannot be interpreted.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In more statistical terms, high correlation among the predictor variables inflates the variance of the coefficient estimates. The estimates can therefore vary widely from sample to sample and land far from the true values.&lt;/P&gt;
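&lt;P&gt;For ordinary least squares, the variance of coefficient j is sigma^2 / [(n - 1) Var(x_j) (1 - R_j^2)]. Here R_j^2 is the R-square from regressing x_j on the other predictors, and 1 / (1 - R_j^2) is the variance inflation factor (VIF). The Python sketch below illustrates this under assumed settings that are not from the thread: two standard-normal predictors with correlation rho, both true coefficients equal to 1, unit-variance noise, and n = 50. The function names are hypothetical, and this is not SAS code. The sketch refits OLS on many simulated samples and measures how widely the first coefficient's estimate spreads as rho grows.&lt;/P&gt;
&lt;PRE&gt;
```python
import random
import statistics

def fit_two_predictors(x1, x2, y):
    """OLS for y = b0 + b1*x1 + b2*x2, solved via the 2x2 normal equations
    on mean-centered data. Returns (b1, b2)."""
    m1, m2, my = statistics.fmean(x1), statistics.fmean(x2), statistics.fmean(y)
    c1 = [a - m1 for a in x1]
    c2 = [a - m2 for a in x2]
    cy = [a - my for a in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(a * a for a in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * b for a, b in zip(c1, cy))
    s2y = sum(a * b for a, b in zip(c2, cy))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2

def simulate_b1(rho, n=50, reps=1000, seed=1):
    """Standard deviation of the fitted b1 across simulated samples
    (assumed setup: true b1 = b2 = 1, noise sd = 1)."""
    rng = random.Random(seed)
    noise_scale = (1 - rho ** 2) ** 0.5  # keeps Var(x2) = 1
    estimates = []
    for _ in range(reps):
        x1 = [rng.gauss(0, 1) for _ in range(n)]
        # x2 has population correlation rho with x1
        x2 = [rho * a + noise_scale * rng.gauss(0, 1) for a in x1]
        y = [a + b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
        estimates.append(fit_two_predictors(x1, x2, y)[0])
    return statistics.stdev(estimates)

def vif(rho):
    """VIF for either predictor when there are exactly two predictors."""
    return 1 / (1 - rho ** 2)

if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.95):
        print(f"rho={rho}: VIF={vif(rho):.1f}, sd(b1)={simulate_b1(rho):.3f}")
```
&lt;/PRE&gt;
&lt;P&gt;At rho = 0.95 the VIF is about 10.3. The spread of the b1 estimates should therefore come out roughly sqrt(10.3), or about 3 times, wider than at rho = 0. That wider spread is the coefficient "swinging" described above.&lt;/P&gt;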
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This holds for most modeling techniques. It does not hold for Partial Least Squares, which can be used in the presence of redundant variables and is much less susceptible to these issues.&lt;/P&gt;</description>
      <pubDate>Sat, 30 May 2020 09:31:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Variables-redundancy-and-overfitting/m-p/651950#M835</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2020-05-30T09:31:09Z</dc:date>
    </item>
    <item>
      <title>Re: Variables redundancy and overfitting</title>
      <link>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Variables-redundancy-and-overfitting/m-p/651981#M839</link>
      <description>&lt;P&gt;&lt;span class="lia-unicode-emoji" title=":thumbs_up:"&gt;👍&lt;/span&gt; Thank you!&lt;/P&gt;</description>
      <pubDate>Sat, 30 May 2020 14:44:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Variables-redundancy-and-overfitting/m-p/651981#M839</guid>
      <dc:creator>pvareschi</dc:creator>
      <dc:date>2020-05-30T14:44:41Z</dc:date>
    </item>
    <item>
      <title>Re: Variables redundancy and overfitting</title>
      <link>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Variables-redundancy-and-overfitting/m-p/652298#M854</link>
      <description>&lt;P&gt;I highly recommend that you reduce redundancy among your predictor variables before you deal with their irrelevancy to the target variable. Including redundant variables increases the risk of overfitting. The model becomes overly complex, may be too sensitive to the peculiarities of the sample, and therefore may not generalize well to new data. Variable selection methods such as stepwise and backward elimination also perform worse when there is a high degree of multicollinearity among the predictor variables.&lt;/P&gt;</description>
      <pubDate>Mon, 01 Jun 2020 16:55:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Variables-redundancy-and-overfitting/m-p/652298#M854</guid>
      <dc:creator>sasmlp</dc:creator>
      <dc:date>2020-06-01T16:55:24Z</dc:date>
    </item>
  </channel>
</rss>

