50%/50% Undersampling Based on Bad Rate Using Enterprise Guide and Enterprise Miner

Contributor
Posts: 51

Hi all,

 

I know there is plenty of material on this subject online. However, when I try to draw a sample with a 50%/50% bad rate, either with SAS code or with the Sample node in Enterprise Miner, I cannot get the result I want. I am trying to draw a stratified sample based on the Target and Date variables. I found some code online and also tried the Sample node, but I could not get a 50/50 sample. I do not know exactly which values to select in the Properties panel of the Enterprise Miner Sample node.

 

I used the following properties when I tried to draw the sample in Enterprise Miner:

 

[Image: Desired.png]

 

Alternatively, if there is code that undersamples the data based on the bad rate, I would like to learn how to draw the sample in Enterprise Guide.

 

My sample data set is below. I want a sample of 12 rows with Target=1 and 12 rows with Target=0, stratified by the Target and Date variables. If someone can help me, I will be glad to learn these methods.

 

Data Have;
Length ID 8 Date $ 20 Variable1 8 Variable2 8 Variable3 8 Target 8;
Infile Datalines Missover ;
Input ID Date Variable1 Variable2 Variable3 Target;
Datalines;
1 20150101 100 200 300 0
1 20150201 100 200 300 1
1 20150301 100 200 300 0
2 20150101 100 200 300 1
2 20150201 100 200 300 0
2 20150301 100 200 300 0
3 20150101 100 200 300 0
3 20150201 100 200 300 0
3 20150301 100 200 300 1
4 20150101 100 200 300 0
4 20150201 100 200 300 0
4 20150301 100 200 300 1
5 20150101 100 200 300 1
5 20150201 100 200 300 0
5 20150301 100 200 300 0
6 20150101 100 200 300 0
6 20150201 100 200 300 1
6 20150301 100 200 300 0
7 20150101 100 200 300 1
7 20150201 100 200 300 0
7 20150301 100 200 300 0
8 20150101 100 200 300 0
8 20150201 100 200 300 1
8 20150301 100 200 300 0
9 20150101 100 200 300 0
9 20150201 100 200 300 0
9 20150301 100 200 300 1
10 20150101 100 200 300 0
10 20150201 100 200 300 0
10 20150301 100 200 300 1
11 20150101 100 200 300 1
11 20150201 100 200 300 0
11 20150301 100 200 300 0
12 20150101 100 200 300 0
12 20150201 100 200 300 1
12 20150301 100 200 300 0
;
Run;

Thank you,

Respected Advisor
Posts: 4,641

Use PROC SURVEYSELECT:

 

proc sort data=have; by date target; run;

proc surveyselect data=have out=samples sampsize=12;
strata date target;
id id;
run;

If you want to oversample, i.e. draw more observations from a stratum than it contains, then sample with replacement:

 

proc sort data=have; by date target; run;

proc surveyselect data=have out=samples sampsize=12 method=urs outhits;
strata date target;
id id;
run;

 

PG
Contributor
Posts: 51

Thank you,

 

Your first code gives the following errors:

 

ERROR: The sample size, 12, is greater than the number of sampling units, 8.
ERROR: The sample size, 12, is greater than the number of sampling units, 4.
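
These two errors are consistent with SAMPSIZE= being applied to each Date by Target stratum separately rather than to the sample as a whole: no stratum in the posted data has 12 rows. One quick way to confirm this is to tabulate the stratum sizes. This is a minimal sketch that assumes the Have data set exactly as created in the first post.

```sas
/* Count the rows in every Date x Target stratum of the posted data. */
proc freq data=have;
    tables date*target / norow nocol nopercent;
run;
```

For the posted data, every Date value has 8 rows with Target=0 and 4 rows with Target=1, which matches the 8 and 4 reported in the error messages.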

 

I also don't fully understand what your second code produces, or what METHOD=URS and OUTHITS do. Could you give more detail, please? I want code that outputs 24 rows, 12 bad and 12 good, stratified by Target and Date.
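
As background on that question: METHOD=URS requests unrestricted random sampling, that is, equal-probability sampling with replacement, so the same row can be selected more than once. Without OUTHITS, the output holds one row per distinct selected unit and a NumberHits variable records how often it was drawn. With OUTHITS, the output holds one row per selection. The sketch below runs both variants side by side. The data set names have_sorted, urs_units and urs_hits and the SEED= value are illustrative assumptions, not taken from the reply.

```sas
/* STRATA requires the input to be sorted by the strata variables. */
proc sort data=have out=have_sorted;
    by date target;
run;

/* With replacement, no OUTHITS: one row per distinct selected unit,
   with NumberHits giving the number of times it was drawn. */
proc surveyselect data=have_sorted out=urs_units
                  method=urs sampsize=12 seed=2015;
    strata date target;
run;

/* With replacement plus OUTHITS: one output row per selection,
   so each stratum contributes exactly 12 rows. */
proc surveyselect data=have_sorted out=urs_hits
                  method=urs sampsize=12 seed=2015 outhits;
    strata date target;
run;
```

Because SAMPSIZE=12 applies to each of the 6 Date by Target strata, the OUTHITS version returns 72 rows with repeats. It oversamples every stratum rather than undersampling the good rows, so it does not produce the 24-row sample described above.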

 

And in Enterprise Miner, what should I do to get the following results?

 

[Image: MinerOut.png]

Contributor
Posts: 51

@PGStats, any ideas on this?

Contributor
Posts: 51

Any suggestions on this?
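
One possible way to get the 24-row sample requested in this thread, sketched under the assumption that the Have data set is exactly as posted: when a STRATA statement is present, SAMPSIZE= is a per-stratum count, so asking for 4 rows from each of the 6 Date by Target strata yields 12 bad and 12 good rows. The data set names have_sorted and balanced and the SEED= value are illustrative choices, not part of the original posts.

```sas
/* STRATA requires the input to be sorted by the strata variables. */
proc sort data=have out=have_sorted;
    by date target;
run;

/* Simple random sampling without replacement, 4 rows per stratum:
   3 dates x 2 target values x 4 rows = 24 rows.
   No ID statement is used, so all input variables are kept. */
proc surveyselect data=have_sorted out=balanced
                  method=srs sampsize=4 seed=12345;
    strata date target;
run;

/* Check the balance: each Date x Target cell should contain 4 rows. */
proc freq data=balanced;
    tables date*target / norow nocol nopercent;
run;
```

In the posted data each date has exactly 4 bad rows, so every bad row is kept and the good rows are undersampled from 8 to 4 per date. If the number of bad rows differed by date, a single SAMPSIZE= value would no longer fit every stratum. SAMPSIZE= also accepts a list of per-stratum values or a secondary data set, and the exact layout for those should be checked against the PROC SURVEYSELECT documentation.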
