<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Unbalanced data - miner in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/Unbalanced-data-miner/m-p/440909#M6761</link>
    <description>&lt;P&gt;Hi there&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am trying to build a classifier with SAS Enterprise Miner, and my issue is unbalanced data. My dataset has 109,194 records, of which 1,379 have target=1 and the remaining 107,815 have target=0, a 98.74%/1.26% ratio. My 30 predictors are all numeric.&lt;/P&gt;&lt;P&gt;I have tested three ways to handle this unbalanced data. First, I do not sample at all, as in the following diagram:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="method1 (raw)" style="width: 600px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/18879i5DF1BC45436F1C3B/image-size/large?v=v2&amp;amp;px=999" role="button" title="method1.png" alt="method1 (raw)" /&gt;&lt;span class="lia-inline-image-caption" onclick="event.preventDefault();"&gt;method1 (raw)&lt;/span&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Second, I oversample the minority class (target=1) so that it represents about 30% of the dataset, using the Sampling node with the Criterion property set to Level Based.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="method2 (over sampling)" style="width: 600px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/18880iA9CBAFCB8A07EDEA/image-size/large?v=v2&amp;amp;px=999" role="button" title="method2.png" alt="method2 (over sampling)" /&gt;&lt;span class="lia-inline-image-caption" onclick="event.preventDefault();"&gt;method2 (over sampling)&lt;/span&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Last, I do not oversample. Instead, I change the diagonal values in the Decision Weights tab of the Input Data node and set the weight for the rare event to the ratio of the common-event probability to the rare-event probability, namely 98.74/1.26 = 78.36.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="method3 (Decision Weights)" style="width: 600px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/18881i1CBDA1CB4F7B25A2/image-size/large?v=v2&amp;amp;px=999" role="button" title="method3.png" alt="method3 (Decision Weights)" /&gt;&lt;span class="lia-inline-image-caption" onclick="event.preventDefault();"&gt;method3 (Decision Weights)&lt;/span&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;The results are as follows:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Method1 results" style="width: 600px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/18882i73C7581B845852D1/image-size/large?v=v2&amp;amp;px=999" role="button" title="method1_results.png" alt="Method1 results" /&gt;&lt;span class="lia-inline-image-caption" onclick="event.preventDefault();"&gt;Method1 results&lt;/span&gt;&lt;/span&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Method2 results" style="width: 600px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/18883i45588B11BC94E5EC/image-size/large?v=v2&amp;amp;px=999" role="button" title="method2_results.png" alt="Method2 results" /&gt;&lt;span class="lia-inline-image-caption" onclick="event.preventDefault();"&gt;Method2 results&lt;/span&gt;&lt;/span&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Method3 results" style="width: 600px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/18884i2F09D5AE882F1FE5/image-size/large?v=v2&amp;amp;px=999" role="button" title="method3_results.png" alt="Method3 results" /&gt;&lt;span class="lia-inline-image-caption" onclick="event.preventDefault();"&gt;Method3 results&lt;/span&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;I do not find the results very convincing (and I am still confused as to why the false/true positive counts are non-integers for method 2). Am I doing anything wrong? I know a lot has been written about unbalanced data, but I cannot seem to find a way to apply any of it to my case. Thanks&lt;/P&gt;&lt;P&gt;Nicolas&lt;/P&gt;</description>
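<!--
The decision-weight approach (method 3 in the post above) can be read as moving the classification cutoff. Below is a minimal Python sketch. It assumes, without confirmation from the post, that the diagonal weights act as a profit matrix that is maximized. Under that assumption, a correct target=1 decision earns w1, a correct target=0 decision earns w0 = 1, and errors earn 0. Choosing decision 1 when w1 * p is at least w0 * (1 - p) then means predicting the rare event whenever its posterior probability exceeds w0 / (w0 + w1). The helper names are hypothetical.

```python
# Hypothetical helpers illustrating diagonal decision weights as a profit matrix.
# Assumption: profit = w1 for a correct target=1 decision, w0 for a correct
# target=0 decision, and 0 for either kind of error.

N_RARE = 1379      # records with target=1 (from the post)
N_COMMON = 107815  # records with target=0 (from the post)


def class_shares(n_rare: int, n_common: int) -> tuple[float, float]:
    """Return (rare share, common share) of the full data set."""
    total = n_rare + n_common
    return n_rare / total, n_common / total


def cutoff_from_weights(w1: float, w0: float = 1.0) -> float:
    """Posterior P(target=1) above which decision 1 has the larger expected profit.

    Expected profit of decision 1 is w1 * p; of decision 0 it is w0 * (1 - p).
    Setting them equal and solving for p gives p = w0 / (w0 + w1).
    """
    return w0 / (w0 + w1)


def decide(p_rare: float, w1: float, w0: float = 1.0) -> int:
    """Pick the decision (1 or 0) with the larger expected profit; ties go to 1."""
    return 1 if w1 * p_rare >= w0 * (1.0 - p_rare) else 0


if __name__ == "__main__":
    rare, common = class_shares(N_RARE, N_COMMON)
    print(f"rare share {rare:.4%}, common share {common:.4%}")
    print(f"common/rare ratio from exact counts: {common / rare:.2f}")
    w1 = 78.36  # weight used in the post (98.74 / 1.26)
    print(f"implied cutoff on P(target=1): {cutoff_from_weights(w1):.5f}")
```

With w1 = 78.36 the implied cutoff is 1/79.36, roughly 0.0126, close to the 1.26% base rate. The ratio computed from the exact counts (about 78.18) differs slightly from 78.36, which was derived from rounded percentages. That difference barely moves the cutoff.
-->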
    <pubDate>Wed, 28 Feb 2018 14:59:41 GMT</pubDate>
    <dc:creator>NicolasC</dc:creator>
    <dc:date>2018-02-28T14:59:41Z</dc:date>
    <item>
      <title>Unbalanced data - miner</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Unbalanced-data-miner/m-p/440909#M6761</link>
      <description>&lt;P&gt;Hi there&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am trying to build a classifier with SAS Enterprise Miner, and my issue is unbalanced data. My dataset has 109,194 records, of which 1,379 have target=1 and the remaining 107,815 have target=0, a 98.74%/1.26% ratio. My 30 predictors are all numeric.&lt;/P&gt;&lt;P&gt;I have tested three ways to handle this unbalanced data. First, I do not sample at all, as in the following diagram:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="method1 (raw)" style="width: 600px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/18879i5DF1BC45436F1C3B/image-size/large?v=v2&amp;amp;px=999" role="button" title="method1.png" alt="method1 (raw)" /&gt;&lt;span class="lia-inline-image-caption" onclick="event.preventDefault();"&gt;method1 (raw)&lt;/span&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Second, I oversample the minority class (target=1) so that it represents about 30% of the dataset, using the Sampling node with the Criterion property set to Level Based.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="method2 (over sampling)" style="width: 600px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/18880iA9CBAFCB8A07EDEA/image-size/large?v=v2&amp;amp;px=999" role="button" title="method2.png" alt="method2 (over sampling)" /&gt;&lt;span class="lia-inline-image-caption" onclick="event.preventDefault();"&gt;method2 (over sampling)&lt;/span&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Last, I do not oversample. Instead, I change the diagonal values in the Decision Weights tab of the Input Data node and set the weight for the rare event to the ratio of the common-event probability to the rare-event probability, namely 98.74/1.26 = 78.36.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="method3 (Decision Weights)" style="width: 600px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/18881i1CBDA1CB4F7B25A2/image-size/large?v=v2&amp;amp;px=999" role="button" title="method3.png" alt="method3 (Decision Weights)" /&gt;&lt;span class="lia-inline-image-caption" onclick="event.preventDefault();"&gt;method3 (Decision Weights)&lt;/span&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;The results are as follows:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Method1 results" style="width: 600px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/18882i73C7581B845852D1/image-size/large?v=v2&amp;amp;px=999" role="button" title="method1_results.png" alt="Method1 results" /&gt;&lt;span class="lia-inline-image-caption" onclick="event.preventDefault();"&gt;Method1 results&lt;/span&gt;&lt;/span&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Method2 results" style="width: 600px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/18883i45588B11BC94E5EC/image-size/large?v=v2&amp;amp;px=999" role="button" title="method2_results.png" alt="Method2 results" /&gt;&lt;span class="lia-inline-image-caption" onclick="event.preventDefault();"&gt;Method2 results&lt;/span&gt;&lt;/span&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Method3 results" style="width: 600px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/18884i2F09D5AE882F1FE5/image-size/large?v=v2&amp;amp;px=999" role="button" title="method3_results.png" alt="Method3 results" /&gt;&lt;span class="lia-inline-image-caption" onclick="event.preventDefault();"&gt;Method3 results&lt;/span&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;I do not find the results very convincing (and I am still confused as to why the false/true positive counts are non-integers for method 2). Am I doing anything wrong? I know a lot has been written about unbalanced data, but I cannot seem to find a way to apply any of it to my case. Thanks&lt;/P&gt;&lt;P&gt;Nicolas&lt;/P&gt;</description>
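<!--
Method 2 in the post above involves two pieces of arithmetic, sketched here in Python. The first is sizing a sample in which target=1 makes up 30%. This sketch assumes, without confirmation from the post, that the sample keeps every target=1 record and draws target=0 records at random, which in effect undersamples the common class. The second is the standard Bayes prior correction, which maps posterior probabilities fitted on the 30% sample back to the original 1.26% base rate. The correction uses per-record weights equal to the original class share divided by the sample class share. These weights are fractional, so any counts computed with them are non-integers. That is one possible explanation for the non-integer counts in method 2, not something verified here. All helper names are hypothetical.

```python
import random

N_RARE = 1379        # records with target=1 (from the post)
N_COMMON = 107815    # records with target=0 (from the post)
TARGET_SHARE = 0.30  # desired share of target=1 in the sample (from the post)


def common_needed(n_rare: int, share: float) -> int:
    """Number of target=0 records to keep so that target=1 is `share` of the sample."""
    return round(n_rare * (1.0 - share) / share)


def sample_indices(labels: list[int], share: float, seed: int = 0) -> list[int]:
    """Keep every target=1 index plus a random subset of target=0 indices (hypothetical helper)."""
    rare = [i for i, y in enumerate(labels) if y == 1]
    common = [i for i, y in enumerate(labels) if y == 0]
    k = min(len(common), common_needed(len(rare), share))
    rng = random.Random(seed)
    return sorted(rare + rng.sample(common, k))


def prior_weights(prior_rare: float, sample_rare: float) -> tuple[float, float]:
    """Per-record weights (rare, common) that restore the original class mix."""
    return prior_rare / sample_rare, (1.0 - prior_rare) / (1.0 - sample_rare)


def adjust_posterior(p: float, prior_rare: float, sample_rare: float) -> float:
    """Bayes prior correction: map P(target=1) fitted on the sample back to the original base rate."""
    w_rare, w_common = prior_weights(prior_rare, sample_rare)
    num = p * w_rare
    return num / (num + (1.0 - p) * w_common)


if __name__ == "__main__":
    prior = N_RARE / (N_RARE + N_COMMON)
    print("target=0 records to keep:", common_needed(N_RARE, TARGET_SHARE))
    print("record weights (rare, common):", prior_weights(prior, TARGET_SHARE))
    print("adjusted P(target=1) for a sample score of 0.5:",
          round(adjust_posterior(0.5, prior, TARGET_SHARE), 4))
```

With the post's counts, a 30% share means keeping about 3,218 of the 107,815 target=0 records. A score equal to the sample share (0.30) maps back exactly to the original base rate, which is a quick sanity check on the correction.
-->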
      <pubDate>Wed, 28 Feb 2018 14:59:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Unbalanced-data-miner/m-p/440909#M6761</guid>
      <dc:creator>NicolasC</dc:creator>
      <dc:date>2018-02-28T14:59:41Z</dc:date>
    </item>
    <item>
      <title>Re: Unbalanced data - miner</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Unbalanced-data-miner/m-p/440928#M6762</link>
      <description>&lt;P&gt;Hi Nicolas,&lt;/P&gt;
&lt;P&gt;Maybe this thread can help you while someone takes a second look at what you did?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://communities.sas.com/t5/SAS-Data-Mining-and-Machine/Oversampling-in-Enterprise-Miner-with-a-rare-event-fixed/td-p/161991" target="_blank"&gt;https://communities.sas.com/t5/SAS-Data-Mining-and-Machine/Oversampling-in-Enterprise-Miner-with-a-rare-event-fixed/td-p/161991&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;When I oversample, I usually test the model on a hold-out test data set that I saved somewhere else and didn't use for modeling. That gives me some confidence that I didn't fool myself &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;BR /&gt;Would that be an option for you?&lt;BR /&gt;&lt;BR /&gt;Best,&lt;BR /&gt;-Miguel&lt;/P&gt;</description>
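<!--
Miguel's hold-out idea can be sketched in Python. The data are split once, stratified by target so that both parts keep the 1.26% rate. The test part is never touched while sampling or modeling, and it is scored only at the end. The split fraction, seed, cutoff, and helper names are illustrative assumptions. The sketch reports precision and recall because overall accuracy is uninformative at this base rate: always predicting 0 already scores about 98.74%.

```python
import random


def stratified_split(labels: list[int], test_frac: float = 0.3,
                     seed: int = 1) -> tuple[list[int], list[int]]:
    """Return (train_idx, test_idx), sampling each class separately."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def confusion(y_true: list[int], p_hat: list[float], cutoff: float) -> dict[str, int]:
    """Counts of TP, FP, FN, TN when predicting 1 for p_hat at or above the cutoff."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for y, p in zip(y_true, p_hat):
        if p >= cutoff:
            counts["TP" if y == 1 else "FP"] += 1
        else:
            counts["FN" if y == 1 else "TN"] += 1
    return counts


def precision_recall(c: dict[str, int]) -> tuple[float, float]:
    """Precision and recall for the rare class; 0.0 when undefined."""
    pred_pos = c["TP"] + c["FP"]
    actual_pos = c["TP"] + c["FN"]
    precision = c["TP"] / pred_pos if pred_pos else 0.0
    recall = c["TP"] / actual_pos if actual_pos else 0.0
    return precision, recall


if __name__ == "__main__":
    labels = [1] * 1379 + [0] * 107815  # class counts from the original post
    train_idx, test_idx = stratified_split(labels)
    test_rate = sum(labels[i] for i in test_idx) / len(test_idx)
    print(len(train_idx), len(test_idx), f"{test_rate:.4%}")
```

Any oversampling or reweighting would be applied to the training indices only, so the test counts stay plain integers drawn from the original class mix.
-->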
      <pubDate>Wed, 28 Feb 2018 15:51:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Unbalanced-data-miner/m-p/440928#M6762</guid>
      <dc:creator>M_Maldonado</dc:creator>
      <dc:date>2018-02-28T15:51:00Z</dc:date>
    </item>
  </channel>
</rss>

