<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How to handle rare events and imbalanced samples in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/How-to-handle-rare-events-and-imbalanced-samples/m-p/15238#M256061</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Frankly, I'd be worried about the stability of any logistic regression model that had only 16 outcomes.&amp;nbsp; Harrell states that one needs 10-20 outcomes per candidate predictor variable (i.e., per degree of freedom). &lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;[ &lt;/SPAN&gt;&lt;A class="jive-link-external-small" href="http://biostat.mc.vanderbilt.edu/wiki/Main/RmS#REGRESSION_MODELING_STRATEGIES"&gt;http://biostat.mc.vanderbilt.edu/wiki/Main/RmS#REGRESSION_MODELING_STRATEGIES&lt;/A&gt;&lt;SPAN&gt; ]&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;With that rule of thumb, you could only consider one variable.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Doc Muhlbaier&lt;/P&gt;&lt;P&gt;Duke&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Tue, 27 Sep 2011 01:46:55 GMT</pubDate>
    <dc:creator>Doc_Duke</dc:creator>
    <dc:date>2011-09-27T01:46:55Z</dc:date>
    <item>
      <title>How to handle rare events and imbalanced samples</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-handle-rare-events-and-imbalanced-samples/m-p/15237#M256060</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P style="margin-top: auto; margin-bottom: auto;"&gt;&lt;SPAN lang="EN-US" style="font-family: arial,helvetica,sans-serif; font-size: 12pt;"&gt;I am developing a prediction model with logistic regression in SAS Enterprise Miner. The original sample (N=342) has only 16 observations in the target “1” category, which corresponds to 4.7% (16/342) of the observations. To handle the imbalanced sample and the rare-events issue, in the sample general property panel I set the sample proportion of the level-based option to 50.0. Hence I end up with a sample containing 32 observations (16 for the target “1” category and 16 for the target “0” category), which I used to develop the prediction model. &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P style="margin-top: auto; margin-bottom: auto;"&gt;&lt;SPAN lang="EN-US" style="font-family: arial,helvetica,sans-serif; font-size: 12pt;"&gt;However, my PhD adviser is concerned because “the occurrence probability from the scoring is not the one that predicts correctly for the original distribution, it is the one that predicts correctly for the oversampled set”. For my PhD thesis, I must provide references showing that this “oversampling” approach is legitimate. &lt;/SPAN&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 23 Sep 2011 12:19:07 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-handle-rare-events-and-imbalanced-samples/m-p/15237#M256060</guid>
      <dc:creator>Mina</dc:creator>
      <dc:date>2011-09-23T12:19:07Z</dc:date>
    </item>
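    <!--
The adviser's concern in the post above can be illustrated with the standard prior-shift correction: a probability scored on a 50/50 resampled set is rescaled by the ratio of the original class share to the resampled class share. This is a minimal sketch in Python, not the poster's Enterprise Miner workflow; the function name and arguments are hypothetical, and it assumes the only difference between the two data sets is the class mix.

```python
def adjust_for_oversampling(p_sample, pop_rate, sample_rate):
    """Map a probability scored on the resampled data back to the original class mix.

    p_sample    : event probability produced by the model fit on the resampled data
    pop_rate    : event share in the original data
    sample_rate : event share in the resampled data
    (All three names are illustrative assumptions, not part of any SAS API.)
    """
    if not (0.0 < pop_rate < 1.0 and 0.0 < sample_rate < 1.0):
        raise ValueError("rates must lie strictly between 0 and 1")
    # Reweight each class by (original share) / (resampled share), then renormalize.
    event = p_sample * pop_rate / sample_rate
    non_event = (1.0 - p_sample) * (1.0 - pop_rate) / (1.0 - sample_rate)
    return event / (event + non_event)


pop_rate = 16 / 342   # event share in the original sample, from the post
sample_rate = 0.5     # event share after the 50/50 level-based sampling
adjusted = adjust_for_oversampling(0.5, pop_rate, sample_rate)
print(round(adjusted, 4))
```

A score of 0.5 on the balanced set maps back to the original event share of 16/342, which is why raw scores from the balanced model overstate the event probability in the original distribution.
    -->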
    <item>
      <title>How to handle rare events and imbalanced samples</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-handle-rare-events-and-imbalanced-samples/m-p/15238#M256061</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Frankly, I'd be worried about the stability of any logistic regression model that had only 16 outcomes.&amp;nbsp; Harrell states that one needs 10-20 outcomes per candidate predictor variable (i.e., per degree of freedom). &lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;[ &lt;/SPAN&gt;&lt;A class="jive-link-external-small" href="http://biostat.mc.vanderbilt.edu/wiki/Main/RmS#REGRESSION_MODELING_STRATEGIES"&gt;http://biostat.mc.vanderbilt.edu/wiki/Main/RmS#REGRESSION_MODELING_STRATEGIES&lt;/A&gt;&lt;SPAN&gt; ]&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;With that rule of thumb, you could only consider one variable.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Doc Muhlbaier&lt;/P&gt;&lt;P&gt;Duke&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 27 Sep 2011 01:46:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-handle-rare-events-and-imbalanced-samples/m-p/15238#M256061</guid>
      <dc:creator>Doc_Duke</dc:creator>
      <dc:date>2011-09-27T01:46:55Z</dc:date>
    </item>
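    <!--
The rule-of-thumb arithmetic in the reply above can be checked directly. This is a small Python sketch under two assumptions that are not stated in the post: the limiting count is taken as the smaller of the event and non-event counts, and fractional degrees of freedom are rounded down. The function name is hypothetical.

```python
def max_candidate_dof(n_events, n_non_events, outcomes_per_dof):
    """Upper bound on candidate degrees of freedom under an outcomes-per-df rule."""
    if outcomes_per_dof <= 0:
        raise ValueError("outcomes_per_dof must be positive")
    # Assumption: the rarer class sets the limit on model complexity.
    limiting = min(n_events, n_non_events)
    return limiting // outcomes_per_dof


n_total, n_events = 342, 16   # counts from the original question
lenient = max_candidate_dof(n_events, n_total - n_events, 10)
strict = max_candidate_dof(n_events, n_total - n_events, 20)
print(lenient, strict)  # prints "1 0"
```

With 16 events, the lenient end of the 10-20 range allows a single degree of freedom and the strict end allows none, which matches the reply's conclusion that only one variable could be considered.
    -->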
  </channel>
</rss>

