<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Oversample and Score classification example in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/Oversample-and-Score-classification-example/m-p/379496#M5647</link>
    <description>&lt;P&gt;Enterprise Miner 14.1&lt;/P&gt;&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I am following this example&amp;nbsp;&lt;A href="https://communities.sas.com/t5/SAS-Communities-Library/Tip-How-to-model-a-rare-target-using-an-oversample-approach-in/ta-p/223599?nobounce" target="_blank"&gt;https://communities.sas.com/t5/SAS-Communities-Library/Tip-How-to-model-a-rare-target-using-an-oversample-approach-in/ta-p/223599?nobounce&lt;/A&gt; to familiarize myself with oversampling. As an additional learning exercise, I connected a Score node to the Model Comparison node. My plan was to copy the original data set and the first Sample node, then score that copy. So, I added a copy of the original German Credit data set with a role of Score, copied the first Sample node (same seed, same sample size, and same event percentages, .05/.95), and ran the workflow.&lt;/P&gt;&lt;P&gt;Class Variable Summary Statistics&lt;BR /&gt;&lt;BR /&gt;Data Role=SCORE Output Type=CLASSIFICATION&lt;BR /&gt;&lt;BR /&gt;Variable | Numeric Value | Formatted Value | Frequency Count | Percent&lt;BR /&gt;I_good_bad | . | BAD | 204 | 34&lt;BR /&gt;I_good_bad | . | GOOD | 396 | 66&lt;BR /&gt;&lt;BR /&gt;Data Role=SCORE Output Type=MODELDECISION&lt;BR /&gt;&lt;BR /&gt;Variable | Numeric Value | Formatted Value | Frequency Count | Percent&lt;BR /&gt;D_good_bad | . | BAD | 226 | 37.6667&lt;BR /&gt;D_good_bad | . | GOOD | 374 | 62.3333&lt;/P&gt;&lt;P&gt;I had expected the results to be closer to the sample proportions (Bad .05 vs. Good .95), but they appear close to the original data set's proportions. When I look at the score code, I see the original data set's posterior probabilities with no adjustment:&lt;/P&gt;&lt;P&gt;Label P_good_badgood='Predicted: good_bad=good';&lt;BR /&gt;P_good_badgood = 0.7;&lt;BR /&gt;Label P_good_badbad='Predicted: good_bad=bad';&lt;BR /&gt;P_good_badbad = 0.3;&lt;/P&gt;&lt;P&gt;Am I approaching this problem incorrectly? Have I made an error in the setup, or just in my understanding? I've attached a copy of my workflow, renamed to .jpg; if you drop it into EM, you should be able to import it. Thanks!&lt;/P&gt;</description>
    <pubDate>Wed, 26 Jul 2017 18:19:25 GMT</pubDate>
    <dc:creator>jlh368</dc:creator>
    <dc:date>2017-07-26T18:19:25Z</dc:date>
    <item>
      <title>Oversample and Score classification example</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Oversample-and-Score-classification-example/m-p/379496#M5647</link>
      <description>&lt;P&gt;Enterprise Miner 14.1&lt;/P&gt;&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I am following this example&amp;nbsp;&lt;A href="https://communities.sas.com/t5/SAS-Communities-Library/Tip-How-to-model-a-rare-target-using-an-oversample-approach-in/ta-p/223599?nobounce" target="_blank"&gt;https://communities.sas.com/t5/SAS-Communities-Library/Tip-How-to-model-a-rare-target-using-an-oversample-approach-in/ta-p/223599?nobounce&lt;/A&gt; to familiarize myself with oversampling. As an additional learning exercise, I connected a Score node to the Model Comparison node. My plan was to copy the original data set and the first Sample node, then score that copy. So, I added a copy of the original German Credit data set with a role of Score, copied the first Sample node (same seed, same sample size, and same event percentages, .05/.95), and ran the workflow.&lt;/P&gt;&lt;P&gt;Class Variable Summary Statistics&lt;BR /&gt;&lt;BR /&gt;Data Role=SCORE Output Type=CLASSIFICATION&lt;BR /&gt;&lt;BR /&gt;Variable | Numeric Value | Formatted Value | Frequency Count | Percent&lt;BR /&gt;I_good_bad | . | BAD | 204 | 34&lt;BR /&gt;I_good_bad | . | GOOD | 396 | 66&lt;BR /&gt;&lt;BR /&gt;Data Role=SCORE Output Type=MODELDECISION&lt;BR /&gt;&lt;BR /&gt;Variable | Numeric Value | Formatted Value | Frequency Count | Percent&lt;BR /&gt;D_good_bad | . | BAD | 226 | 37.6667&lt;BR /&gt;D_good_bad | . | GOOD | 374 | 62.3333&lt;/P&gt;&lt;P&gt;I had expected the results to be closer to the sample proportions (Bad .05 vs. Good .95), but they appear close to the original data set's proportions. When I look at the score code, I see the original data set's posterior probabilities with no adjustment:&lt;/P&gt;&lt;P&gt;Label P_good_badgood='Predicted: good_bad=good';&lt;BR /&gt;P_good_badgood = 0.7;&lt;BR /&gt;Label P_good_badbad='Predicted: good_bad=bad';&lt;BR /&gt;P_good_badbad = 0.3;&lt;/P&gt;&lt;P&gt;Am I approaching this problem incorrectly? Have I made an error in the setup, or just in my understanding? I've attached a copy of my workflow, renamed to .jpg; if you drop it into EM, you should be able to import it. Thanks!&lt;/P&gt;</description>
      <pubDate>Wed, 26 Jul 2017 18:19:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Oversample-and-Score-classification-example/m-p/379496#M5647</guid>
      <dc:creator>jlh368</dc:creator>
      <dc:date>2017-07-26T18:19:25Z</dc:date>
    </item>
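For context on the adjustment the question is about: when a model is trained on an oversampled data set, the score code is expected to rescale each posterior probability back to the population priors before any decision is made. A minimal sketch of that rescaling in Python (the function name and the example priors are illustrative assumptions, not SAS-generated score code):

```python
def adjust_posterior(p_event, sample_prior, population_prior):
    """Rescale a posterior from an oversample-trained model back to
    the population (true) event prior, via the standard priors ratio."""
    # Weight the event posterior by population_prior / sample_prior,
    # and the non-event posterior by the complementary ratio.
    num = p_event * population_prior / sample_prior
    den = num + (1 - p_event) * (1 - population_prior) / (1 - sample_prior)
    return num / den

# Example: a posterior of 0.7 from a model trained on a 50/50 sample,
# when the true event rate in the scored population is 5%.
adjusted = adjust_posterior(0.7, sample_prior=0.5, population_prior=0.05)
```

If the sample prior equals the population prior, the function returns the posterior unchanged, which matches the observation in the thread that scoring with the same sample proportions leaves the probabilities unadjusted.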
    <item>
      <title>Re: Oversample and Score classification example</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Oversample-and-Score-classification-example/m-p/385123#M5677</link>
      <description>&lt;P&gt;I took a deeper dive into the example listed above, and I realize there are many inputs that affect the score percentages. The change I had questioned below, the scoring percentages being closer to the original data set's percentages, was the effect of the sample proportion. I adjusted the data partition percentages from Train/Validate 50/50 to 70/30 and noticed the change in the model; this change, in turn, affected the scoring proportions. I also saw the updated prior probabilities in the SAS Score Code node. In short, it was doing what it was supposed to do, and I learned a bit. Any suggestions on topics to follow up on from here?&lt;/P&gt;</description>
      <pubDate>Wed, 02 Aug 2017 21:07:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Oversample-and-Score-classification-example/m-p/385123#M5677</guid>
      <dc:creator>jlh368</dc:creator>
      <dc:date>2017-08-02T21:07:09Z</dc:date>
    </item>
  </channel>
</rss>

