<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to check overfitting in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/How-to-check-overfitting/m-p/388949#M5892</link>
    <description>&lt;P&gt;Are you partitioning your data?&lt;/P&gt;</description>
    <pubDate>Thu, 17 Aug 2017 19:56:26 GMT</pubDate>
    <dc:creator>WendyCzika</dc:creator>
    <dc:date>2017-08-17T19:56:26Z</dc:date>
    <item>
      <title>How to check overfitting</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/How-to-check-overfitting/m-p/388643#M5862</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I am using four different classifiers (Random Forest, SVM, Decision Tree, and Neural Network) on several datasets. On one of the datasets, all of the classifiers give 100% accuracy, which I do not understand; on the other datasets they give accuracies above 90%. Random Forest performs best on all datasets. Could anyone please suggest how I can check whether my algorithms are overfitting, and if they are, how I can overcome that?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;</description>
      <pubDate>Wed, 16 Aug 2017 22:16:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/How-to-check-overfitting/m-p/388643#M5862</guid>
      <dc:creator>geniusgenie</dc:creator>
      <dc:date>2017-08-16T22:16:03Z</dc:date>
    </item>
    <item>
      <title>Re: How to check overfitting</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/How-to-check-overfitting/m-p/388646#M5863</link>
      <description>Check the distribution of outcomes in the data. Is one dataset different from the others?</description>
      <pubDate>Wed, 16 Aug 2017 22:17:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/How-to-check-overfitting/m-p/388646#M5863</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2017-08-16T22:17:41Z</dc:date>
    </item>
    <item>
      <title>Re: How to check overfitting</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/How-to-check-overfitting/m-p/388650#M5864</link>
      <description>&lt;P&gt;Yes, all datasets are different.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 16 Aug 2017 22:29:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/How-to-check-overfitting/m-p/388650#M5864</guid>
      <dc:creator>geniusgenie</dc:creator>
      <dc:date>2017-08-16T22:29:52Z</dc:date>
    </item>
    <item>
      <title>Re: How to check overfitting</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/How-to-check-overfitting/m-p/388664#M5865</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/112474"&gt;@geniusgenie&lt;/a&gt; wrote:&lt;BR /&gt;
&lt;P&gt;Yes, all datasets are different.&amp;nbsp;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;That wasn't my question. Are the distributions of the outcome variable you're testing different across the datasets? And if so, could that be what's causing the issue?&lt;/P&gt;</description>
      <pubDate>Thu, 17 Aug 2017 00:05:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/How-to-check-overfitting/m-p/388664#M5865</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2017-08-17T00:05:51Z</dc:date>
    </item>
    <item>
      <title>Re: How to check overfitting</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/How-to-check-overfitting/m-p/388934#M5891</link>
      <description>Hi Reeza, yes, the distributions of the outcome variables are different across the datasets, and that is an issue for me.&lt;BR /&gt;</description>
      <pubDate>Thu, 17 Aug 2017 18:45:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/How-to-check-overfitting/m-p/388934#M5891</guid>
      <dc:creator>geniusgenie</dc:creator>
      <dc:date>2017-08-17T18:45:14Z</dc:date>
    </item>
    <item>
      <title>Re: How to check overfitting</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/How-to-check-overfitting/m-p/388949#M5892</link>
      <description>&lt;P&gt;Are you partitioning your data?&lt;/P&gt;</description>
      <pubDate>Thu, 17 Aug 2017 19:56:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/How-to-check-overfitting/m-p/388949#M5892</guid>
      <dc:creator>WendyCzika</dc:creator>
      <dc:date>2017-08-17T19:56:26Z</dc:date>
    </item>
    <item>
      <title>Re: How to check overfitting</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/How-to-check-overfitting/m-p/388982#M5895</link>
      <description>&lt;P&gt;Yes, I am partitioning the data.&lt;/P&gt;</description>
      <pubDate>Thu, 17 Aug 2017 22:12:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/How-to-check-overfitting/m-p/388982#M5895</guid>
      <dc:creator>geniusgenie</dc:creator>
      <dc:date>2017-08-17T22:12:39Z</dc:date>
    </item>
    <item>
      <title>Re: How to check overfitting</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/How-to-check-overfitting/m-p/389183#M5902</link>
      <description>&lt;P&gt;Check for target leakage in the dataset where you get 100% accuracy.&lt;/P&gt;&lt;P&gt;Above 90% is not unusual for the others.&lt;/P&gt;</description>
      <pubDate>Fri, 18 Aug 2017 18:38:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/How-to-check-overfitting/m-p/389183#M5902</guid>
      <dc:creator>mandata_ad</dc:creator>
      <dc:date>2017-08-18T18:38:40Z</dc:date>
    </item>
    <item>
      <title>Re: How to check overfitting</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/How-to-check-overfitting/m-p/390765#M5918</link>
      <description>&lt;P&gt;When you get 100% accuracy, you need to go back and check your input variables to make sure you have not inadvertently included a variable containing information that would not be available when scoring new data. For example, I could easily predict which accounts were going to default if there were a field indicating how much money was lost when the loan defaulted, but that information would never be available for new data.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You can also get very high classification ratings (although typically not 100%) when you have a rare event that happens only a small percentage of the time. Suppose your event happens 1% of the time; then you can say "nobody has the event" and be 99% correct with respect to misclassification, yet not have a model that is of any use. More details would be needed to speculate further on the misclassification aspect.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In data mining scenarios, you typically have sufficient data to use holdout (validation) data to demonstrate empirically that the model is useful. When you have more limited data, you are left with cross-validation options. When you have very limited data, you are left with assessing things based on your business knowledge. The less data there is, the more uncertainty you are likely to have.&lt;/P&gt;
&lt;P&gt;With regard to choosing the 'best' model, you need to incorporate your business objectives. You can choose a model based on many different statistics, yet none of them might actually be best suited to your situation, depending on the business objectives you are trying to accomplish. You need to identify your goals and assess how costly it is to misclassify someone, which can be complex if you have more than two levels. In the end, your choice of strategy should support the goals you had when you started building the model in the first place.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Hope this helps!&lt;/P&gt;
&lt;P&gt;Doug&lt;/P&gt;</description>
      <pubDate>Thu, 24 Aug 2017 21:19:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/How-to-check-overfitting/m-p/390765#M5918</guid>
      <dc:creator>DougWielenga</dc:creator>
      <dc:date>2017-08-24T21:19:04Z</dc:date>
    </item>
  </channel>
</rss>

