Building models with SAS Enterprise Miner, SAS Factory Miner, SAS Visual Data Mining and Machine Learning or just with programming

Stratified random sampling

New Contributor
Posts: 2

Hey guys! I've been trying to use proc surveyselect to perform stratified random sampling and to calculate the total and average of each sample. I'm just starting out with SAS and Enterprise Miner; I've only been at it for a week! The sampling is from a Claims data set, and the code I have come up with so far is below. For some reason it does not generate the 10000 sample iterations I need. In fact, it generates only one! I am not getting any error messages.

 

%let nSim=10000;
%let nTot=225;
%let n1=100;
%let n2=75;
%let n3=50;
%let Seed=12345;

%MACRO sample;
  proc surveyselect data=&EM_IMPORT_DATA
      out=Lib.sampler
      sampsize=(&n1 &n2 &n3)
      seed=&Seed
      rep=1;
    strata Strata;
%MEND sample;

data Lib.Claims;
  keep sim Claimstot Claimsavg;
  array Claimsamt[&nTot];
  do sim = 1 to &nSim;
    Claimstot = 0;
    &sample;
    do i = 1 to &nTot;
      set Lib.sampler;
      Claimsamt[i] = Claim_Dollars;
    end;
    do i = 1 to &nTot;
      Claimstot = Claimstot + Claimsamt[i];
    end;
    Claimsavg = Claimstot/&nTot;
    output;
  end;

proc print data=Lib.Claims;
run;


Accepted Solutions
Solution
01-30-2016 02:04 AM
Respected Advisor
Posts: 4,649

Re: Stratified random sampling

It is more efficient to use replicate sampling than to do macro looping. In your loops, if you give the same seed to surveyselect for each iteration, you will get exactly the same sample every time.

Here is an example of replicated stratified sampling using BY processing:

 

/* Prepare sorted example data, keep only required variables */
proc sort data=sashelp.heart out=myData(keep=chol_status weight);
  where chol_status is not missing;
  by chol_status;
run;

/* Generate replicate stratified samples (chol_status has three values) */
proc surveyselect data=myData out=mySamples
    sampsize=(100 75 50)
    seed=12345
    rep=1000;
  strata chol_status;
run;

/* Calculate statistics for each replicate sample */
proc sort data=mySamples;
  by replicate chol_status;
run;

proc summary data=mySamples;
  by replicate;
  var weight;
  output out=myStats mean= sum= / autoname;
run;

/* Look at the distribution of the statistics */
proc sgplot data=myStats;
  histogram weight_mean;
  density weight_mean;
run;
PG
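
Editor's sketch: carried over to the Claims data in the question, the replicate approach might look roughly like the following. It assumes the imported table has the variables Strata and Claim_Dollars used in the question and that Strata has exactly three levels, so that the three sample sizes line up with the sorted strata. The work.* data set names are placeholders chosen for illustration, and &EM_IMPORT_DATA only resolves inside an Enterprise Miner SAS Code node.

```sas
/* Sort by the stratification variable; the STRATA statement expects sorted input */
proc sort data=&EM_IMPORT_DATA
          out=work.claims_sorted(keep=Strata Claim_Dollars);
  by Strata;
run;

/* Draw 10000 independent stratified samples in one call.              */
/* The OUT= data set gets a Replicate variable numbering each sample.  */
proc surveyselect data=work.claims_sorted out=work.claim_samples
    method=srs
    sampsize=(100 75 50)   /* assumed to match the three Strata levels */
    seed=12345
    reps=10000
    noprint;
  strata Strata;
run;

/* One row per replicate: total and average of Claim_Dollars */
proc summary data=work.claim_samples;
  by Replicate;
  var Claim_Dollars;
  output out=work.claim_stats(drop=_type_ _freq_)
         sum=Claimstot mean=Claimsavg;
run;
```

Because every replicate comes from a single PROC SURVEYSELECT call with one seed, the samples differ from one another, which is not the case when the same seed is passed to a separate call in each loop iteration.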



All Replies
Super User
Posts: 17,831

Re: Stratified random sampling

Your output dataset doesn't have a unique name for each iteration, so whatever you do will likely get overwritten.

 

I highly recommend looking into the paper "Don't Be Loopy: Re-Sampling and Simulation the SAS Way" by David Cassell on simulations in SAS.

 

It also looks like you're trying to execute the macro using &sample instead of %sample, so I'm not sure your code is doing what you expect.

 

PS. Please post code using the Code {i} or the running man button in the editor and format it for readability. This also helps with debugging. 
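
Editor's sketch: the difference between the & and % triggers mentioned above can be seen in a small example. The names greet and who are hypothetical and are used only for illustration.

```sas
/* Hypothetical names (greet, who), for illustration only */
%let who = world;              /* defines a macro VARIABLE named who */

%macro greet;                  /* defines a MACRO named greet */
  %put Hello from the macro, &who;
%mend greet;

%greet      /* % calls the macro: its body runs and writes to the log */
%put &who;  /* & resolves the macro variable: writes "world" to the log */
```

Writing &greet here would make SAS look for a macro variable called greet, not the macro. Note also that a macro only generates program text, so even a correct %sample call placed inside a DATA step loop would not run PROC SURVEYSELECT once per iteration; a PROC statement marks a step boundary.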

☑ This topic is SOLVED.
