Alirezax
Calcite | Level 5

Hi, I have a heavily imbalanced dataset: the target is binary and the rare level makes up about 1% of the data. My training set has 20000 observations, 200 of which are rare events. I need a sample of ~40000 observations in which 50% are the rare event. I tried the Sample node in Enterprise Miner with the standard oversampling settings (see screenshot), as described here: https://support.sas.com/kb/24/205.html

But all I get is a sample of 400 observations containing the original 200 rare events, so it is basically doing undersampling rather than oversampling...

[Screenshot: SAS EM.png]

I would also like to use SMOTE rather than simple duplication, but I do not see that option in Enterprise Miner. I checked all the other posts on SMOTE, including all the links here: https://communities.sas.com/t5/Statistical-Procedures/Assistance-with-SAS-code-for-SMOTE-and-adaptiv... but the sample SAS code is difficult to understand and apply.

Can anybody help me with these two issues?

PS. My dataset contains both numeric and character input (predictor) variables.

1 REPLY
WendyCzika
SAS Employee

Oversampling is a misnomer here. It is actually undersampling, as you've experienced.
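For reference, the kind of oversampling the question asks for (duplicating rare events until the two levels are balanced) can be sketched outside Enterprise Miner. The snippet below is only an illustration in plain Python, not SAS code and not a Sample node feature. The function name `oversample_with_replacement` and the toy rows are made up for the example; the counts (20000 observations, 200 rare events) come from the question.

```python
import random


def oversample_with_replacement(rows, target_key, rare_value, seed=0):
    """Duplicate rare-level rows at random until both levels are the same size.

    Hypothetical helper for illustration only; it keeps every original row
    and adds random copies of rare rows (sampling with replacement).
    """
    rng = random.Random(seed)
    rare = [r for r in rows if r[target_key] == rare_value]
    common = [r for r in rows if r[target_key] != rare_value]
    if not rare or len(rare) >= len(common):
        return list(rows)  # nothing to balance
    # Draw the missing number of rare rows with replacement.
    extra = rng.choices(rare, k=len(common) - len(rare))
    balanced = common + rare + extra
    rng.shuffle(balanced)
    return balanced


# Toy data with the counts from the question: 20000 rows, 200 rare events.
rows = [{"id": i, "target": 1 if i < 200 else 0} for i in range(20000)]
balanced = oversample_with_replacement(rows, "target", 1)
n_rare = sum(r["target"] for r in balanced)
print(len(balanced), n_rare)  # 39600 19800
```

With 19800 common observations, balancing gives 39600 rows in total, which is close to the ~40000 requested, and each rare event appears about 99 times on average. Because the added rows are exact copies, it is generally safer to duplicate only after the data has been partitioned, so that copies of one observation do not end up in both training and validation.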
