Building models with SAS Enterprise Miner, SAS Factory Miner, SAS Visual Data Mining and Machine Learning or just with programming

SAS EMiner Oversampling reduced the target sample size

New Contributor
Posts: 3

SAS EMiner Oversampling reduced the target sample size

I used EMiner to do oversampling. The original target variable is binary, with the following proportions:

Variable    Value    Count    Percent
Target      0        252      32.7273
Target      1        518      67.2727


After Oversampling, I got:

Data=SAMPLE

Variable    Value    Count    Percent
Target      0        252      50
Target      1        252      50


The sample size for Target='1' was reduced from 518 to 252. This is not the result I want.

I want to increase the Target='0' sample size from 252 to 518.

 

Does anyone know how to solve this problem?

Any suggestion is appreciated!

New Contributor
Posts: 3

Re: SAS EMiner Oversampling reduced the target sample size

I'm new to EMiner. Really need help with this.
Thanks a lot
SAS Employee
Posts: 122

Re: SAS EMiner Oversampling reduced the target sample size

Hi, see the attached picture from EM. Once you load the data into EM, the YES group in the picture corresponds to your 1 group and the NO group corresponds to your 0 group. Count=999 corresponds to your 518 and Count=967 corresponds to your 252. To the right, in place of 0.5081, enter 1. In place of 0.4919, enter 2.055555556 (=518/252). In plain English, by doing so you are telling EM to treat the 518 observations in the 1 group as they are, and to treat the 252 observations in the 0 group as if there were 2.055555556*252 ≈ 518 of them. Hope this helps? Jason Xin
priorsem.jpg
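
For readers who want to check the arithmetic behind the two values above, here is a minimal Python sketch. It is only an illustration of the ratio calculation, not EM code or a description of what EM does internally; the function name `balancing_weights` and the dictionary layout are hypothetical choices made for this example.

```python
from fractions import Fraction


def balancing_weights(counts):
    """Return one weight per class so that each class's weighted count
    equals the count of the largest class (hypothetical helper)."""
    largest = max(counts.values())
    return {level: Fraction(largest, n) for level, n in counts.items()}


# Class counts taken from the original post.
counts = {"0": 252, "1": 518}

weights = balancing_weights(counts)

# Weighted count per class: count multiplied by its weight.
weighted = {level: counts[level] * weights[level] for level in counts}

for level in sorted(counts):
    print(level, counts[level], float(weights[level]), int(weighted[level]))
```

The 1 group keeps a weight of 1, and the 0 group gets 518/252, so both groups carry a weighted count of 518 without any rows being dropped.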
New Contributor
Posts: 3

Re: SAS EMiner Oversampling reduced the target sample size

Hi Jason, your reply is very helpful. So using this prior decision, I don't need to use the oversampling node any more?
SAS Employee
Posts: 122

Re: SAS EMiner Oversampling reduced the target sample size

Hi, first of all, there is no over-sampling node in EM. I figure you meant the Sample Node. The Sample Node has random, systematic, First N, stratify... None of them allows you to change the ratio between 1 and 0 on the target. The purpose of sampling is to take a subset, in one way or another, to represent the master source. The goal is to represent, not to alter. On the other hand, the point of oversampling is to recompose a sample, and therefore to alter it. The Sample Node is often used in a situation like this: the qualified model universe has 20 million observations, and I need to take a 5% sample to make it work in EM. In this sense, sampling really is not analytical/technical. But oversampling is every bit analytical. In other words, the reason you run sampling should not overlap with the reason driving oversampling, although the act of oversampling per se is sampling. Hope this helps? Thank you for using SAS. Best. Jason Xin
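
To make the distinction between the two ideas concrete, here is a small Python sketch, written outside of EM and under stated assumptions. `stratified_sample` takes the same fraction from every class, so the 0/1 ratio is preserved, while `oversample_minority` resamples the smaller class with replacement until it matches the larger one, so the ratio is altered. Both function names, the `Target` key, and the row layout are hypothetical; this is not how any EM node is implemented.

```python
import random


def group_by_class(rows, target_key):
    """Collect rows into lists keyed by their target value."""
    groups = {}
    for row in rows:
        groups.setdefault(row[target_key], []).append(row)
    return groups


def stratified_sample(rows, target_key, fraction, seed=0):
    """Sampling: keep the same fraction of every class (ratio preserved)."""
    rng = random.Random(seed)
    out = []
    for group in group_by_class(rows, target_key).values():
        out.extend(rng.sample(group, round(len(group) * fraction)))
    return out


def oversample_minority(rows, target_key, seed=0):
    """Oversampling: top up each smaller class with rows drawn with
    replacement until it matches the largest class (ratio altered)."""
    rng = random.Random(seed)
    groups = group_by_class(rows, target_key)
    largest = max(len(group) for group in groups.values())
    out = []
    for group in groups.values():
        out.extend(group)
        out.extend(rng.choices(group, k=largest - len(group)))
    return out


# Toy data with the class counts from the original post.
rows = [{"Target": 0}] * 252 + [{"Target": 1}] * 518

subset = stratified_sample(rows, "Target", 0.5)
balanced = oversample_minority(rows, "Target")


def count(data, value):
    return sum(1 for row in data if row["Target"] == value)


print("stratified 50%:", count(subset, 0), count(subset, 1))
print("oversampled:   ", count(balanced, 0), count(balanced, 1))
```

Duplicating rows this way and weighting the 0 group by 518/252 both aim at the same 518-to-518 balance; the weighting route avoids creating a physically larger data set.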