<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic A problem with oversampling and SMOTE in SAS Enterprise Miner/SAS in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/A-problem-with-oversampling-and-SMOTE-in-SAS-Enterprise-Miner/m-p/738782#M8661</link>
    <description>&lt;P&gt;Hi, I have a heavily imbalanced dataset with a binary target whose rare level is at around 1%: my training set has 20000 observations, 200 of which are rare events. I need a sample of ~40000 observations in which 50% are the rare event. I tried the standard oversampling with the Sample node in Enterprise Miner (see screenshot), as described here:&amp;nbsp;&lt;A href="https://support.sas.com/kb/24/205.html" target="_blank"&gt;https://support.sas.com/kb/24/205.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;But all I get is a sample of 400 that contains the original 200 rare events, so it is basically doing undersampling rather than oversampling.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="SAS EM.png" style="width: 999px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/58997iFD5C06C8031DAEBC/image-size/large?v=v2&amp;amp;px=999" role="button" title="SAS EM.png" alt="SAS EM.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;I would also like to use SMOTE rather than simple duplication, but I do not see the option in Enterprise Miner. I checked the other posts on SMOTE, including all the links here:&amp;nbsp;&lt;A href="https://communities.sas.com/t5/Statistical-Procedures/Assistance-with-SAS-code-for-SMOTE-and-adaptive-synthetic/m-p/257442" target="_blank"&gt;https://communities.sas.com/t5/Statistical-Procedures/Assistance-with-SAS-code-for-SMOTE-and-adaptive-synthetic/m-p/257442&lt;/A&gt;&amp;nbsp;but the sample SAS code is difficult to understand and apply.&lt;/P&gt;&lt;P&gt;Can anybody help me with these two issues?&lt;/P&gt;&lt;P&gt;PS. My dataset contains both numeric and character input (predictor) variables.&lt;/P&gt;</description>
    <pubDate>Tue, 04 May 2021 04:38:17 GMT</pubDate>
    <dc:creator>Alirezax</dc:creator>
    <dc:date>2021-05-04T04:38:17Z</dc:date>
    <item>
      <title>A problem with oversampling and SMOTE in SAS Enterprise Miner/SAS</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/A-problem-with-oversampling-and-SMOTE-in-SAS-Enterprise-Miner/m-p/738782#M8661</link>
      <description>&lt;P&gt;Hi, I have a heavily imbalanced dataset with a binary target whose rare level is at around 1%: my training set has 20000 observations, 200 of which are rare events. I need a sample of ~40000 observations in which 50% are the rare event. I tried the standard oversampling with the Sample node in Enterprise Miner (see screenshot), as described here:&amp;nbsp;&lt;A href="https://support.sas.com/kb/24/205.html" target="_blank"&gt;https://support.sas.com/kb/24/205.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;But all I get is a sample of 400 that contains the original 200 rare events, so it is basically doing undersampling rather than oversampling.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="SAS EM.png" style="width: 999px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/58997iFD5C06C8031DAEBC/image-size/large?v=v2&amp;amp;px=999" role="button" title="SAS EM.png" alt="SAS EM.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;I would also like to use SMOTE rather than simple duplication, but I do not see the option in Enterprise Miner. I checked the other posts on SMOTE, including all the links here:&amp;nbsp;&lt;A href="https://communities.sas.com/t5/Statistical-Procedures/Assistance-with-SAS-code-for-SMOTE-and-adaptive-synthetic/m-p/257442" target="_blank"&gt;https://communities.sas.com/t5/Statistical-Procedures/Assistance-with-SAS-code-for-SMOTE-and-adaptive-synthetic/m-p/257442&lt;/A&gt;&amp;nbsp;but the sample SAS code is difficult to understand and apply.&lt;/P&gt;&lt;P&gt;Can anybody help me with these two issues?&lt;/P&gt;&lt;P&gt;PS. My dataset contains both numeric and character input (predictor) variables.&lt;/P&gt;</description>
      <pubDate>Tue, 04 May 2021 04:38:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/A-problem-with-oversampling-and-SMOTE-in-SAS-Enterprise-Miner/m-p/738782#M8661</guid>
      <dc:creator>Alirezax</dc:creator>
      <dc:date>2021-05-04T04:38:17Z</dc:date>
    </item>
    <item>
      <title>Re: A problem with oversampling and SMOTE in SAS Enterprise Miner/SAS</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/A-problem-with-oversampling-and-SMOTE-in-SAS-Enterprise-Miner/m-p/739482#M8662</link>
      <description>&lt;P&gt;"Oversampling" is a misnomer here: the Sample node is actually undersampling, as you've experienced.&lt;/P&gt;</description>
      <pubDate>Thu, 06 May 2021 13:48:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/A-problem-with-oversampling-and-SMOTE-in-SAS-Enterprise-Miner/m-p/739482#M8662</guid>
      <dc:creator>WendyCzika</dc:creator>
      <dc:date>2021-05-06T13:48:36Z</dc:date>
    </item>
  </channel>
</rss>

