Hi all,
I know there are many posts about this subject on the internet. However, when I try to get a sample with a 50%/50% bad rate, either with SAS code or with the Sample node in Enterprise Miner, I cannot get the result I want. I am trying to draw a stratified sample based on the Target and Date variables. I found some code online and I also tried the Sample node, but neither gave me a 50/50 sample. I don't know exactly which values I should select in the Properties panel of the Enterprise Miner Sample node.
I used the following properties when I tried to get the sample in Enterprise Miner:
On the other hand, if there is code to undersample the data based on bad rate, I would like to learn how to do that in Enterprise Guide.
I have a sample data set below. I want to get a sample of 12 rows with Target=1 and 12 rows with Target=0, stratified by Target and Date. If someone can help me, I will be glad to learn these methods.
Data Have;
Length ID 8 Date $ 20 Variable1 8 Variable2 8 Variable3 8 Target 8;
Infile Datalines Missover ;
Input ID Date Variable1 Variable2 Variable3 Target;
Datalines;
1 20150101 100 200 300 0
1 20150201 100 200 300 1
1 20150301 100 200 300 0
2 20150101 100 200 300 1
2 20150201 100 200 300 0
2 20150301 100 200 300 0
3 20150101 100 200 300 0
3 20150201 100 200 300 0
3 20150301 100 200 300 1
4 20150101 100 200 300 0
4 20150201 100 200 300 0
4 20150301 100 200 300 1
5 20150101 100 200 300 1
5 20150201 100 200 300 0
5 20150301 100 200 300 0
6 20150101 100 200 300 0
6 20150201 100 200 300 1
6 20150301 100 200 300 0
7 20150101 100 200 300 1
7 20150201 100 200 300 0
7 20150301 100 200 300 0
8 20150101 100 200 300 0
8 20150201 100 200 300 1
8 20150301 100 200 300 0
9 20150101 100 200 300 0
9 20150201 100 200 300 0
9 20150301 100 200 300 1
10 20150101 100 200 300 0
10 20150201 100 200 300 0
10 20150301 100 200 300 1
11 20150101 100 200 300 1
11 20150201 100 200 300 0
11 20150301 100 200 300 0
12 20150101 100 200 300 0
12 20150201 100 200 300 1
12 20150301 100 200 300 0
;
Run;
Thank you,
Use PROC SURVEYSELECT:
proc sort data=have; by date target; run;
proc surveyselect data=have out=samples sampsize=12;
strata date target;
id id;
run;
If you want to oversample, i.e. draw more rows from a stratum than it contains, then do:
proc sort data=have; by date target; run;
proc surveyselect data=have out=samples sampsize=12 method=urs outhits;
strata date target;
id id;
run;
Thank you,
Your first code gives the following errors:
ERROR: The sample size, 12, is greater than the number of sampling units, 8.
ERROR: The sample size, 12, is greater than the number of sampling units, 4.
I also don't fully understand what your second code produces, or what METHOD=URS and OUTHITS do. Could you give more detail, please? I want code that outputs 24 rows, 12 bad and 12 good, stratified by Target and Date.
And in Enterprise Miner, what should I do to get the following results?
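The two errors come from SAMPSIZE=12 being applied to every Date*Target stratum separately. In the posted data each date has only 8 rows with Target=0 and 4 rows with Target=1, so no stratum can supply 12 distinct rows. Below is a minimal sketch of one way around it, assuming the goal is 4 good and 4 bad rows per date (3 dates, so 12 and 12 in total). The data set names have_sorted and samples_5050 and the seed value are only illustrative.

```sas
/* STRATA requires the input to be sorted by the strata variables */
proc sort data=have out=have_sorted;
  by date target;
run;

/* SAMPSIZE= is the size per stratum, not the overall size.        */
/* 6 strata (3 dates x 2 target values) x 4 rows = 24 rows.        */
/* SEED= is an arbitrary value, used only to make the draw         */
/* repeatable.                                                     */
proc surveyselect data=have_sorted out=samples_5050
                  method=srs sampsize=4 seed=12345;
  strata date target;
  /* no ID statement, so all input variables are kept in the output */
run;

/* Check the balance: each Date x Target cell should hold 4 rows */
proc freq data=samples_5050;
  tables date*target / norow nocol nopercent;
run;
```

As for the second program: METHOD=URS is unrestricted random sampling, i.e. sampling with replacement, so the same row can be drawn more than once and the sample size may exceed the stratum size. OUTHITS writes one output row per draw instead of one row per distinct unit with a NumberHits count. With SAMPSIZE=12 that would give 12 rows in each of the 6 strata (72 rows, many of them duplicates), which is not the 24-row sample asked for.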
@PGStats, Any idea about this subject?
Any suggestion about the subject?
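For a more general undersample by bad rate, where the number of bad rows may differ from date to date, a single SAMPSIZE= value is not enough. One possible sketch, assuming you want to keep every bad row and draw the same number of good rows within each date, is to pass PROC SURVEYSELECT a secondary data set of per-stratum sizes in a variable named _NSIZE_. The names strata_sizes, have_sorted, and samples_balanced and the seed value are illustrative, not something from the original posts.

```sas
/* One row per Date x Target stratum; _NSIZE_ = number of bad rows */
/* (Target=1) on that date, used for both the good and bad stratum */
proc sql;
  create table strata_sizes as
  select a.date, a.target, b.n_bad as _nsize_
  from (select distinct date, target from have) as a
       inner join
       (select date, sum(target = 1) as n_bad
          from have
         group by date) as b
       on a.date = b.date
  order by a.date, a.target;
quit;

/* Input must be sorted in the same order as the size data set */
proc sort data=have out=have_sorted;
  by date target;
run;

/* SAMPSIZE=data-set reads the per-stratum sizes from _NSIZE_ */
proc surveyselect data=have_sorted out=samples_balanced
                  method=srs sampsize=strata_sizes seed=12345;
  strata date target;
run;
```

On the posted data every date has 4 bad rows, so this should return 4 good and 4 bad rows per date, 24 rows in total. If some date has fewer good rows than bad rows, the step will stop with the same "sample size is greater than the number of sampling units" error; the SELECTALL option on the PROC SURVEYSELECT statement may be worth checking in the documentation for that case.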