beibeiwhy
Calcite | Level 5

I used Enterprise Miner (EM) to do oversampling. The original target variable is binary, with the following proportions:

 

                     
Variable   Value   Count   Percent
Target     0         252   32.7273
Target     1         518   67.2727

 

After Oversampling, I got:

 

Data=SAMPLE

Variable   Value   Count   Percent
Target     0         252        50
Target     1         252        50

 

The sample size for Target='1' was reduced from 518 to 252, which is not the result I want.

Instead, I want to increase the Target='0' sample size from 252 to 518.

 

Does anyone know how to solve this problem?

Any suggestion is appreciated!
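The distinction the question turns on can be sketched outside EM. The result above reduced the majority class (Target='1' cut from 518 to 252), which is undersampling; what is wanted is random oversampling of the minority class, i.e. drawing extra Target='0' rows with replacement until both classes have 518. A minimal Python sketch, assuming each observation is a dict with a `Target` key; `oversample_minority` is a hypothetical helper, not an EM feature:

```python
import random
from collections import Counter

def oversample_minority(rows, target, seed=0):
    """Random oversampling: keep every original row, then duplicate rows
    drawn with replacement from each smaller class until every class is
    as large as the largest one. The largest class is left unchanged."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[target], []).append(row)
    n_max = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)                               # all original rows
        balanced.extend(rng.choices(group, k=n_max - len(group)))  # extra copies
    return balanced

# Toy data with the counts from the question: 252 zeros and 518 ones.
data = [{"id": i, "Target": 0} for i in range(252)] + \
       [{"id": 252 + i, "Target": 1} for i in range(518)]

balanced = oversample_minority(data, "Target")
print(Counter(row["Target"] for row in balanced))  # 518 rows for each class
```

The duplicated rows add no new information, so a common precaution is to set validation data aside before oversampling; otherwise copies of the same observation can land in both training and validation.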

 

 

 

beibeiwhy
Calcite | Level 5
I'm new to Enterprise Miner and really need help with this.
Thanks a lot!
JasonXin
SAS Employee
Hi,

In EM, see the attached picture. Once you load the data into EM, the YES group in the picture corresponds to your 1 group and the NO group to your 0 group: Count=999 plays the role of your 518, and Count=967 plays the role of your 252. In the column to the right, enter 1 instead of 0.5081, and enter 2.055555556 (= 518/252) instead of 0.4919. In plain English, this tells EM to treat the 518 observations in the 1 group as they are, and to treat the 252 observations in the 0 group as if there were 2.055555556*252, or about 518, of them, logically speaking.

Hope this helps?
Jason Xin
[Attached image: priorsem.jpg]
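The arithmetic behind the numbers in the reply can be checked with a small sketch. It computes, for each class, the factor that scales that class's count up to the size of the largest class, which reproduces the 1 and 518/252 values above. This is a generic Python illustration (the `balancing_weights` name is hypothetical); it does not show how EM uses those values internally.

```python
from collections import Counter

def balancing_weights(labels):
    """Return a factor per class such that factor * count equals the
    count of the largest class. The largest class gets a factor of 1."""
    counts = Counter(labels)
    n_max = max(counts.values())
    return {label: n_max / n for label, n in counts.items()}

labels = [0] * 252 + [1] * 518          # class counts from the question
weights = balancing_weights(labels)
print(weights)                          # class 0 -> 518/252 (about 2.0556), class 1 -> 1.0

counts = Counter(labels)
for label in sorted(counts):
    # Weighted size of each class; both come out to 518.0
    print(label, round(counts[label] * weights[label], 6))
```

Unlike duplicating rows, this approach leaves the data untouched and only changes how much each class counts.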
beibeiwhy
Calcite | Level 5
Hi Jason, your reply is very helpful. So if I use these decision priors, I no longer need to use the oversampling node?
JasonXin
SAS Employee
Hi,

First of all, there is no oversampling node in EM; I figure you meant the Sample node. The Sample node offers random, systematic, first N, stratified, and other methods, but none of them lets you change the ratio between 1 and 0 on the target. The purpose of sampling is to take a subset, in one way or another, that represents the master source. The goal is to represent, not to alter. Oversampling, on the other hand, recomposes a sample and therefore, logically, alters it.

The Sample node is often used in situations like this: the qualified modeling universe has 20 million observations, and I need to take a 5% sample to make it work in EM. In this sense, sampling really is not an analytical or technical decision, whereas oversampling is analytics through and through. In other words, the reasons you run sampling should not overlap with those driving oversampling, even though the act of oversampling is, per se, a form of sampling.

Hope this helps? Thank you for using SAS.
Best,
Jason Xin
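To make the "represent, not alter" point concrete: a proportional stratified sample draws the same fraction from each target class without replacement, so the class ratio of the source survives. Below is a minimal Python sketch of a 5% sample, assuming a hypothetical 77,000-row source with the same 252:518 class ratio as the question; `stratified_sample` is a hypothetical helper illustrating the idea, not the Sample node's implementation.

```python
import random
from collections import Counter

def stratified_sample(rows, target, fraction, seed=0):
    """Proportional stratified sample: draw the same fraction from every
    class without replacement, so the class ratio of the source is kept."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[target], []).append(row)
    sample = []
    for group in by_class.values():
        k = round(len(group) * fraction)
        sample.extend(rng.sample(group, k))   # without replacement
    return sample

# Hypothetical source: 25,200 zeros and 51,800 ones (same 252:518 ratio).
source = [{"id": i, "Target": 0 if i < 25200 else 1} for i in range(77000)]

sample = stratified_sample(source, "Target", 0.05)
print(Counter(r["Target"] for r in sample))  # 1260 zeros, 2590 ones: still about 32.7% / 67.3%
```

The subset is 20 times smaller but keeps the source's class mix, which is exactly what oversampling deliberately gives up.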
