<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Fixing imbalanced data in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/Fixing-imabalance-data/m-p/746209#M8707</link>
    <description>&lt;P&gt;Here are three ways you could go:&lt;BR /&gt;1) Oversample to 1:1, 1:2, 1:3, or 1:4.&lt;BR /&gt;&lt;BR /&gt;or&lt;BR /&gt;2) Use exact logistic regression, but because your sample size is large, that could be computationally infeasible.&lt;BR /&gt;&lt;BR /&gt;or&lt;BR /&gt;3) Use penalized logistic regression with the FIRTH option:&lt;BR /&gt;proc logistic.......&lt;BR /&gt;model ............ / firth ;&lt;BR /&gt;run;&lt;/P&gt;</description>
    <pubDate>Mon, 07 Jun 2021 12:02:30 GMT</pubDate>
    <dc:creator>Ksharp</dc:creator>
    <dc:date>2021-06-07T12:02:30Z</dc:date>
    <item>
      <title>Fixing imbalanced data</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Fixing-imabalance-data/m-p/746194#M8705</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'm working on a binary classification model using logistic regression in SAS Base, but my data is extremely imbalanced. I need help balancing the data, or strategies for working with this kind of imbalanced data in SAS Base. See the screenshot below for my data.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Solly7_0-1623065192016.png" style="width: 834px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/60077i9EEC7FA5B705393B/image-dimensions/834x299?v=v2" width="834" height="299" role="button" title="Solly7_0-1623065192016.png" alt="Solly7_0-1623065192016.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 07 Jun 2021 11:27:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Fixing-imabalance-data/m-p/746194#M8705</guid>
      <dc:creator>Solly7</dc:creator>
      <dc:date>2021-06-07T11:27:13Z</dc:date>
    </item>
    <item>
      <title>Re: Fixing imbalanced data</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Fixing-imabalance-data/m-p/746196#M8706</link>
      <description>&lt;P&gt;Let's say you want to have twice as many 0s as 1s (so 1/3 of the data is now 1). You can randomly select records with 0 to be removed so that you have 4572 0s and 2286 1s. Or if you want 1/2 0s and 1/2 1s, you can modify the selection process to produce 2286 0s and 2286 1s.&lt;/P&gt;
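&lt;P&gt;A minimal sketch of the random selection described above, not a definitive implementation. It assumes the data set is named full_data (the name used later in this thread) and that the 0/1 response is a variable named target; target, the output data set names, and the seed are placeholders. It keeps all 2286 1s and draws a simple random sample of 4572 0s with PROC SURVEYSELECT:&lt;/P&gt;

```sas
/* Assumed names: full_data (input) and target (0/1 response). */
/* Split the data into events (1s) and non-events (0s).        */
data events nonevents;
    set full_data;
    if target = 1 then output events;
    else if target = 0 then output nonevents;
run;

/* Simple random sample of 4572 non-events, twice the 2286 events. */
/* The seed is arbitrary; fixing it makes the draw repeatable.     */
proc surveyselect data=nonevents out=nonevents_sample
        method=srs sampsize=4572 seed=12345;
run;

/* Stack all events with the sampled non-events. */
data balanced;
    set events nonevents_sample;
run;
```

&lt;P&gt;For the 1/2 0s and 1/2 1s case, change SAMPSIZE= to 2286.&lt;/P&gt;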
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The method is called "oversampling" (the rare 1s end up over-represented relative to the full data, even though it is the 0s that are removed), and here is a way to adjust for oversampled data in your logistic regression in SAS: &lt;A href="https://support.sas.com/kb/22/601.html" target="_blank" rel="noopener"&gt;https://support.sas.com/kb/22/601.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 07 Jun 2021 11:57:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Fixing-imabalance-data/m-p/746196#M8706</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2021-06-07T11:57:31Z</dc:date>
    </item>
    <item>
      <title>Re: Fixing imbalanced data</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Fixing-imabalance-data/m-p/746209#M8707</link>
      <description>&lt;P&gt;Here are three ways you could go:&lt;BR /&gt;1) Oversample to 1:1, 1:2, 1:3, or 1:4.&lt;BR /&gt;&lt;BR /&gt;or&lt;BR /&gt;2) Use exact logistic regression, but because your sample size is large, that could be computationally infeasible.&lt;BR /&gt;&lt;BR /&gt;or&lt;BR /&gt;3) Use penalized logistic regression with the FIRTH option:&lt;BR /&gt;proc logistic.......&lt;BR /&gt;model ............ / firth ;&lt;BR /&gt;run;&lt;/P&gt;</description>
      <pubDate>Mon, 07 Jun 2021 12:02:30 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Fixing-imabalance-data/m-p/746209#M8707</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2021-06-07T12:02:30Z</dc:date>
    </item>
    <item>
      <title>Re: Fixing imbalanced data</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Fixing-imabalance-data/m-p/746212#M8708</link>
      <description>Hi, thanks for your prompt response. Let's say I have sample data with 20000 samples, and let's call it full_data. Do I need to split full_data into training and testing sets and then oversample the training data? Or am I not understanding?</description>
      <pubDate>Mon, 07 Jun 2021 12:03:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Fixing-imabalance-data/m-p/746212#M8708</guid>
      <dc:creator>Solly7</dc:creator>
      <dc:date>2021-06-07T12:03:32Z</dc:date>
    </item>
    <item>
      <title>Re: Fixing imbalanced data</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Fixing-imabalance-data/m-p/746213#M8709</link>
      <description>&lt;P&gt;I would oversample first (reduce the imbalance), and then split that data randomly into training and validation.&lt;/P&gt;</description>
      <pubDate>Mon, 07 Jun 2021 12:05:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Fixing-imabalance-data/m-p/746213#M8709</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2021-06-07T12:05:12Z</dc:date>
    </item>
  </channel>
</rss>

