Alirezax

Hi, I have a heavily imbalanced training set: the rare level of my binary target occurs in about 1% of cases, so only 200 of my 20,000 observations are rare events. I need a sample of about 40,000 observations in which 50% are rare events. I tried the standard oversampling approach with the Sample node in Enterprise Miner (see screenshot), as described here: https://support.sas.com/kb/24/205.html

But all I get is a sample of 400 observations containing the original 200 rare events, so it is effectively undersampling rather than oversampling.

[Screenshot: SAS EM.png]
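The 400-row result matches what you get if every rare event is kept and an equal number of common rows is drawn without replacement. A minimal Python sketch of that undersampling, assuming rows are dicts and using a hypothetical `target` column on synthetic data that only mirrors the counts above (it illustrates the arithmetic, not Enterprise Miner's internal code):

```python
import random

def undersample_to_balance(rows, label_key, rare_value, seed=0):
    """Keep every rare row and draw the same number of common rows without replacement."""
    rng = random.Random(seed)
    rare = [r for r in rows if r[label_key] == rare_value]
    common = [r for r in rows if r[label_key] != rare_value]
    kept_common = rng.sample(common, len(rare))  # no duplicates; most common rows are dropped
    sample = rare + kept_common
    rng.shuffle(sample)
    return sample

# Synthetic stand-in for the training set: 20,000 rows, the first 200 are rare events.
# "target" is a hypothetical column name.
rows = [{"id": i, "target": 1 if i < 200 else 0} for i in range(20000)]
balanced = undersample_to_balance(rows, "target", 1)
print(len(balanced), sum(r["target"] for r in balanced))  # 400 200
```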

I would also like to use SMOTE rather than simple duplication, but I do not see that option in Enterprise Miner. I checked the other posts on SMOTE, including all the links here https://communities.sas.com/t5/Statistical-Procedures/Assistance-with-SAS-code-for-SMOTE-and-adaptiv... but the sample SAS code there is difficult to understand and apply.
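For numeric inputs, SMOTE (Synthetic Minority Over-sampling Technique) creates each new rare-event row by picking a rare row, picking one of its k nearest rare-event neighbours, and interpolating a random fraction of the way between the two. A minimal pure-Python sketch, assuming the rare rows are tuples of numeric inputs already on comparable scales (for example, standardized) and that `k=5` is a reasonable neighbour count; `smote` is a hypothetical helper, not a SAS procedure:

```python
import math
import random

def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic new points interpolated between minority points and
    their k nearest minority neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority rows")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # k nearest minority neighbours of each minority point (excluding the point itself)
    neighbours = []
    for i, x in enumerate(minority):
        by_distance = sorted((math.dist(x, y), j) for j, y in enumerate(minority) if j != i)
        neighbours.append([j for _, j in by_distance[:k]])
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbours[i])
        gap = rng.random()  # position along the segment from point i to neighbour j, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(minority[i], minority[j])))
    return synthetic

# Toy example: four rare rows at the corners of the unit square.
rare_rows = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_rows = smote(rare_rows, 50, k=2)
```

With the numbers in the question, 200 rare rows would need `n_synthetic=19800` to reach 20,000 rare events.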

Can anybody help me with these two issues?

PS. My dataset contains both numeric and character input (predictor) variables.
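Plain SMOTE only interpolates numeric values. One published extension for mixed inputs is SMOTE-NC: every mismatched character value adds the median of the numeric columns' standard deviations (squared) inside the Euclidean distance, and a synthetic row takes, for each character input, the most frequent value among its k nearest neighbours. A minimal sketch, assuming each rare row is split into a numeric tuple and a character tuple and that there is at least one numeric input; `smote_nc` is a hypothetical helper, and ties between equally frequent categories are broken arbitrarily:

```python
import math
import random
import statistics
from collections import Counter

def smote_nc(numeric, nominal, n_synthetic, k=5, seed=0):
    """SMOTE-NC sketch. numeric[i] is a tuple of floats and nominal[i] a tuple of strings
    for the same minority row i; returns a list of (numeric, nominal) synthetic rows."""
    n = len(numeric)
    if n < 2 or n != len(nominal):
        raise ValueError("need at least two minority rows with matching numeric/nominal parts")
    rng = random.Random(seed)
    k = min(k, n - 1)
    # Penalty per mismatched nominal value: median of the numeric columns' standard deviations.
    med = statistics.median(statistics.stdev(col) for col in zip(*numeric))

    def distance(i, j):
        sq = sum((a - b) ** 2 for a, b in zip(numeric[i], numeric[j]))
        sq += med ** 2 * sum(a != b for a, b in zip(nominal[i], nominal[j]))
        return math.sqrt(sq)

    neighbours = [
        [j for _, j in sorted((distance(i, j), j) for j in range(n) if j != i)[:k]]
        for i in range(n)
    ]
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(n)
        j = rng.choice(neighbours[i])
        gap = rng.random()
        # Numeric part: ordinary SMOTE interpolation between row i and neighbour j.
        num = tuple(a + gap * (b - a) for a, b in zip(numeric[i], numeric[j]))
        # Nominal part: most frequent value among row i's k nearest neighbours.
        nom = tuple(
            Counter(nominal[m][c] for m in neighbours[i]).most_common(1)[0][0]
            for c in range(len(nominal[i]))
        )
        synthetic.append((num, nom))
    return synthetic

# Toy example with one numeric and one character input per rare row.
num_part = [(0.0,), (1.0,), (2.0,), (10.0,)]
chr_part = [("a",), ("a",), ("b",), ("a",)]
mixed_rows = smote_nc(num_part, chr_part, 20, k=2)
```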

WendyCzika
SAS Employee

Oversampling is a misnomer here. It is actually undersampling, as you've experienced.
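Reaching about 40,000 rows with 50% rare events therefore means drawing rare rows with replacement, so each of the 200 rare events appears about 100 times on average. A minimal Python sketch outside Enterprise Miner, assuming rows are dicts and using a hypothetical `target` column on synthetic data; because 20,000 common rows are requested but only 19,800 exist, it keeps every common row and tops up with random duplicates (one design choice among several):

```python
import random

def oversample_to_balance(rows, label_key, rare_value, total_size, rare_fraction=0.5, seed=0):
    """Return total_size rows, rare_fraction of them rare, duplicating rows where needed."""
    rng = random.Random(seed)
    rare = [r for r in rows if r[label_key] == rare_value]
    common = [r for r in rows if r[label_key] != rare_value]
    n_rare = round(total_size * rare_fraction)
    n_common = total_size - n_rare
    # Rare rows are drawn with replacement, i.e. duplicated.
    rare_part = rng.choices(rare, k=n_rare)
    if n_common <= len(common):
        common_part = rng.sample(common, n_common)  # enough common rows; no duplicates needed
    else:
        # Keep every common row, then top up with random duplicates.
        common_part = common + rng.choices(common, k=n_common - len(common))
    sample = rare_part + common_part
    rng.shuffle(sample)
    return sample

# Synthetic stand-in: 20,000 rows, the first 200 are rare events ("target" is hypothetical).
rows = [{"id": i, "target": 1 if i < 200 else 0} for i in range(20000)]
big = oversample_to_balance(rows, "target", 1, total_size=40000)
print(len(big), sum(r["target"] for r in big))  # 40000 20000
```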
