Hi all,
I know there are several context about this subject in internet, however, when I try to get %50/%50 bad rate sample by using SAS Code or by using Sample node on Enterprise Miner, I could not reach my aim. I try to get stratified sample based on Target and Date variables. I found some code on internet and I also tried Sample node but I could not get 50/50 sample, I don't exactly know what values should I select when I use the Enterprise Miner Sample Node on Properties panel.
I used following propeties when I try to get the sample on Enterprise Miner;
On the other hand, if there is code to get under sample of data based on bad rate, I would like to learn the method to get sample by using Enterprise Guide.
I have a sample data set as below, I want to get 12(1)/12(0) sample based on Target and Date variables, if someone can help me, I will be glad to learn these methods.
Data Have;
Length ID 8 Date $ 20 Variable1 8 Variable2 8 Variable3 8 Target 8;
Infile Datalines Missover ;
Input ID Date Variable1 Variable2 Variable3 Target;
Datalines;
1 20150101 100 200 300 0
1 20150201 100 200 300 1
1 20150301 100 200 300 0
2 20150101 100 200 300 1
2 20150201 100 200 300 0
2 20150301 100 200 300 0
3 20150101 100 200 300 0
3 20150201 100 200 300 0
3 20150301 100 200 300 1
4 20150101 100 200 300 0
4 20150201 100 200 300 0
4 20150301 100 200 300 1
5 20150101 100 200 300 1
5 20150201 100 200 300 0
5 20150301 100 200 300 0
6 20150101 100 200 300 0
6 20150201 100 200 300 1
6 20150301 100 200 300 0
7 20150101 100 200 300 1
7 20150201 100 200 300 0
7 20150301 100 200 300 0
8 20150101 100 200 300 0
8 20150201 100 200 300 1
8 20150301 100 200 300 0
9 20150101 100 200 300 0
9 20150201 100 200 300 0
9 20150301 100 200 300 1
10 20150101 100 200 300 0
10 20150201 100 200 300 0
10 20150301 100 200 300 1
11 20150101 100 200 300 1
11 20150201 100 200 300 0
11 20150301 100 200 300 0
12 20150101 100 200 300 0
12 20150201 100 200 300 1
12 20150301 100 200 300 0
;
Run;
Thank you,
Use proc surveyselect
proc sort data=have; by date target; run;
proc surveyselect data=have out=samples sampsize=12;
strata date target;
id id;
run;
If you want to oversample, i.e. get a sample size greater than the population, then do:
proc sort data=have; by date target; run;
proc surveyselect data=have out=samples sampsize=12 method=urs outhits;
strata date target;
id id;
run;
Thank you,
Your first code gives following error;
ERROR: The sample size, 12, is greater than the number of sampling units, 8.
ERROR: The sample size, 12, is greater than the number of sampling units, 4.
And I don't exactly understand what your second code gives us and what Method=URS&Outhits do? Could you give more detail, please? I want to get a code which export 24 rows being 12 bad and 12 good based on Target&Date.
And on Enterprise Miner, what should I do, to get following results?
@PGStats, Any idea about this subject?
Any suggestion about the subject?
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.