<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic For the extremely unbalanced data (less than 1% event incidence), what's the best method to deal? in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/For-the-extremely-unbalanced-data-less-than-1-event-incidence/m-p/559444#M7802</link>
    <description>&lt;P&gt;Dear all,&lt;/P&gt;&lt;P&gt;I am working on a project with extremely unbalanced data: the event incidence is below 1% (800 events out of 114,600 observations). What is the best way to handle this kind of data? Because the goal of the project is to provide rules that identify the bad event in future data, I am currently using a decision tree. However, the performance is very poor: the tree does not split at all and stays at the root node. Any suggestions on dealing with this kind of problem are welcome!&lt;/P&gt;&lt;P&gt;BTW: I am using SAS Enterprise Miner 14.2 on Linux.&lt;/P&gt;&lt;P&gt;Thank you!&lt;/P&gt;&lt;P&gt;Jade&lt;/P&gt;</description>
    <pubDate>Thu, 16 May 2019 18:46:54 GMT</pubDate>
    <dc:creator>Jade_SAS</dc:creator>
    <dc:date>2019-05-16T18:46:54Z</dc:date>
    <item>
      <title>For the extremely unbalanced data (less than 1% event incidence), what's the best method to deal?</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/For-the-extremely-unbalanced-data-less-than-1-event-incidence/m-p/559444#M7802</link>
      <description>&lt;P&gt;Dear all,&lt;/P&gt;&lt;P&gt;I am working on a project with extremely unbalanced data: the event incidence is below 1% (800 events out of 114,600 observations). What is the best way to handle this kind of data? Because the goal of the project is to provide rules that identify the bad event in future data, I am currently using a decision tree. However, the performance is very poor: the tree does not split at all and stays at the root node. Any suggestions on dealing with this kind of problem are welcome!&lt;/P&gt;&lt;P&gt;BTW: I am using SAS Enterprise Miner 14.2 on Linux.&lt;/P&gt;&lt;P&gt;Thank you!&lt;/P&gt;&lt;P&gt;Jade&lt;/P&gt;</description>
      <pubDate>Thu, 16 May 2019 18:46:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/For-the-extremely-unbalanced-data-less-than-1-event-incidence/m-p/559444#M7802</guid>
      <dc:creator>Jade_SAS</dc:creator>
      <dc:date>2019-05-16T18:46:54Z</dc:date>
    </item>
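<!--
The numbers in the question already hint at why a tree can stop at the root node. A minimal Python sketch, assuming subtrees are compared by misclassification rate (the thread does not say which assessment measure was used): with 800 events in 114,600 observations, every leaf that still has a non-event majority predicts "non-event", so a split that concentrates events, such as the hypothetical 10,000-observation leaf below, leaves the error count unchanged. The `misclassification` helper and the split are illustrative only.

```python
# Worked numbers for the class balance described in the question:
# 800 events out of 114,600 observations.
EVENTS = 800
TOTAL = 114_600


def misclassification(leaves):
    """Misclassification rate when each leaf predicts its majority class.

    `leaves` is a list of (n_obs, n_events) pairs.
    """
    errors = 0
    for n_obs, n_events in leaves:
        # A leaf predicts "event" only if events are the majority in it,
        # so its errors are the size of the minority class.
        errors += min(n_events, n_obs - n_events)
    total = sum(n_obs for n_obs, _ in leaves)
    return errors / total


incidence = EVENTS / TOTAL
root = [(TOTAL, EVENTS)]
# Hypothetical split: one leaf holds half the events (a 4% event rate),
# but both leaves still have a non-event majority.
split = [(10_000, 400), (TOTAL - 10_000, EVENTS - 400)]

print(f"event incidence: {incidence:.2%}")
print(f"root-only misclassification: {misclassification(root):.2%}")
print(f"split misclassification: {misclassification(split):.2%}")
```

Both trees make exactly 800 errors, so a criterion based on misclassification alone has no reason to prefer the split over the root node, even though the split isolates a group with almost six times the overall event rate.
-->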
    <item>
      <title>Re: For the extremely unbalanced data (less than 1% event incidence), what's the best method to deal</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/For-the-extremely-unbalanced-data-less-than-1-event-incidence/m-p/559453#M7803</link>
      <description>Without more information, my suggestion would be to switch to a case-control design: take your 800 events (the cases) and match them 1:N against the rest of the data set, with N controls per case. &lt;BR /&gt;&lt;BR /&gt;I would then bootstrap it, repeating that approach X times to see whether the model is stable. I don't know how to do this in EM and don't have access to it, but technically it could be done in Base SAS.</description>
      <pubDate>Thu, 16 May 2019 18:55:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/For-the-extremely-unbalanced-data-less-than-1-event-incidence/m-p/559453#M7803</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2019-05-16T18:55:54Z</dc:date>
    </item>
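<!--
A minimal Python sketch of the case-control plus bootstrap idea from the reply, under stated assumptions: the data are simulated (one hypothetical binary predictor `x` with a true log odds ratio of 1.0), controls are drawn at random rather than matched on covariates, and the "model" is the log odds ratio of `x`, i.e. the slope of a one-predictor logistic regression. Each repeat resamples the cases with replacement, draws N controls per case, and refits; the spread of the estimates across repeats is the stability check. All function names are illustrative.

```python
import math
import random


def make_population(n, seed=0):
    """Hypothetical data: one binary predictor x and a rare binary event y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.3 else 0
        # Assumed true model: logit P(y = 1) = -5.5 + 1.0 * x
        p = 1.0 / (1.0 + math.exp(5.5 - 1.0 * x))
        y = 1 if rng.random() < p else 0
        rows.append((x, y))
    return rows


def log_odds_ratio(rows):
    """Log odds ratio of y for x = 1 versus x = 0 from the 2x2 table.

    Without the 0.5 added to each cell (a Haldane-style correction for
    empty cells), this equals the maximum-likelihood slope of a logistic
    regression of y on the single binary predictor x.
    """
    a = b = c = d = 0.5
    for x, y in rows:
        if x == 1 and y == 1:
            a += 1
        elif x == 1:
            b += 1
        elif y == 1:
            c += 1
        else:
            d += 1
    return math.log((a * d) / (b * c))


def bootstrap_case_control(rows, n_controls, repeats, seed=1):
    """Resample the cases with replacement, draw n_controls random
    (unmatched) controls per case, refit, and collect the estimates."""
    cases = [r for r in rows if r[1] == 1]
    controls = [r for r in rows if r[1] == 0]
    rng = random.Random(seed)
    estimates = []
    for _ in range(repeats):
        boot_cases = rng.choices(cases, k=len(cases))
        k = min(len(controls), n_controls * len(boot_cases))
        sample = boot_cases + rng.sample(controls, k)
        estimates.append(log_odds_ratio(sample))
    return estimates


population = make_population(114_600)
estimates = bootstrap_case_control(population, n_controls=4, repeats=200)
mean = sum(estimates) / len(estimates)
sd = math.sqrt(sum((e - mean) ** 2 for e in estimates) / (len(estimates) - 1))
print(f"cases: {sum(y for _, y in population)}")
print(f"mean log OR: {mean:.2f}, spread (SD) across repeats: {sd:.2f}")
```

Sampling on the outcome leaves the odds ratio unchanged, so the mean estimate should land near the simulated 1.0; the intercept, and with it any predicted event probability, would need adjusting for the sampling fraction before scoring new data.
-->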
    <item>
      <title>Re: For the extremely unbalanced data (less than 1% event incidence), what's the best method to deal</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/For-the-extremely-unbalanced-data-less-than-1-event-incidence/m-p/561484#M7812</link>
      <description>&lt;P&gt;Thank you, Reeza!&lt;/P&gt;&lt;P&gt;Is there a reference paper for this procedure? Thank you!&lt;/P&gt;&lt;P&gt;Jade&lt;/P&gt;</description>
      <pubDate>Fri, 24 May 2019 17:24:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/For-the-extremely-unbalanced-data-less-than-1-event-incidence/m-p/561484#M7812</guid>
      <dc:creator>Jade_SAS</dc:creator>
      <dc:date>2019-05-24T17:24:18Z</dc:date>
    </item>
    <item>
      <title>Re: For the extremely unbalanced data (less than 1% event incidence), what's the best method to deal</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/For-the-extremely-unbalanced-data-less-than-1-event-incidence/m-p/561485#M7813</link>
      <description>Not really. You can use PROC PSMATCH to get the matches and then PROC PHREG for conditional logistic regression. The documentation has examples for each.</description>
      <pubDate>Fri, 24 May 2019 17:29:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/For-the-extremely-unbalanced-data-less-than-1-event-incidence/m-p/561485#M7813</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2019-05-24T17:29:44Z</dc:date>
    </item>
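<!--
To illustrate the matched analysis mentioned in the reply, here is a minimal Python sketch; it is not PROC PSMATCH or PROC PHREG. Assumptions: simulated data with a hypothetical discrete matching variable `z` (think of an age band), a covariate `x`, and a true log odds ratio of 0.8 for `x`; each case is matched exactly on `z` to N random controls without replacement (a simplified stand-in for propensity-score matching); and a one-covariate conditional logistic regression is fit by Newton-Raphson, where each matched set contributes exp(b * x_case) divided by the sum of exp(b * x_j) over the set.

```python
import math
import random
from collections import defaultdict


def simulate(n, seed=0):
    """Hypothetical data: rows of (z, x, y) with a rare event y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.randrange(10)               # discrete matching variable
        x = 0.3 * z + rng.gauss(0.0, 1.0)   # covariate correlated with z
        # Assumed true model: logit P(y = 1) = -8 + 0.2 * z + 0.8 * x
        p = 1.0 / (1.0 + math.exp(8.0 - 0.2 * z - 0.8 * x))
        rows.append((z, x, 1 if rng.random() < p else 0))
    return rows


def exact_match(cases, controls, n_controls, rng):
    """Match each case to n_controls random controls with the same z,
    without replacement. Returns a list of (case, [controls]) sets."""
    pools = defaultdict(list)
    for row in controls:
        pools[row[0]].append(row)
    for pool in pools.values():
        rng.shuffle(pool)
    matched_sets = []
    for case in cases:
        pool = pools[case[0]]
        if len(pool) < n_controls:
            continue  # stratum exhausted; this case stays unmatched
        matched_sets.append((case, [pool.pop() for _ in range(n_controls)]))
    return matched_sets


def conditional_logit(matched_sets, max_iter=50, tol=1e-10):
    """One-covariate conditional logistic regression by Newton-Raphson.
    Set s contributes log(exp(b*x_case) / sum_j exp(b*x_j))."""
    beta = 0.0
    for _ in range(max_iter):
        score = 0.0
        info = 0.0
        for case, ctrls in matched_sets:
            xs = [case[1]] + [c[1] for c in ctrls]
            top = max(beta * x for x in xs)  # guard against overflow
            w = [math.exp(beta * x - top) for x in xs]
            total = sum(w)
            m1 = sum(wi * x for wi, x in zip(w, xs)) / total
            m2 = sum(wi * x * x for wi, x in zip(w, xs)) / total
            score += case[1] - m1   # gradient of the log-likelihood
            info += m2 - m1 * m1    # negative second derivative
        if info == 0:
            break
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


rows = simulate(50_000)
cases = [r for r in rows if r[2] == 1]
controls = [r for r in rows if r[2] == 0]
matched_sets = exact_match(cases, controls, n_controls=4, rng=random.Random(1))
beta_x = conditional_logit(matched_sets)
print(f"matched sets: {len(matched_sets)}")
print(f"conditional log odds ratio for x: {beta_x:.2f} (simulated value 0.8)")
```

Because every member of a matched set shares the same `z`, its effect cancels out of the conditional likelihood, so the estimate should land near the simulated 0.8 even though `z` is never modeled. In SAS, the analogous fit uses PROC PHREG with the matched-set identifier in a STRATA statement.
-->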
    <item>
      <title>Re: For the extremely unbalanced data (less than 1% event incidence), what's the best method to deal</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/For-the-extremely-unbalanced-data-less-than-1-event-incidence/m-p/561493#M7814</link>
      <description>&lt;P&gt;Thank you, Reeza!&lt;/P&gt;</description>
      <pubDate>Fri, 24 May 2019 17:39:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/For-the-extremely-unbalanced-data-less-than-1-event-incidence/m-p/561493#M7814</guid>
      <dc:creator>Jade_SAS</dc:creator>
      <dc:date>2019-05-24T17:39:23Z</dc:date>
    </item>
  </channel>
</rss>

