I'm trying to generate bootstrap samples and would like to confirm that my approach is sound. I want 250 samples of 169,301 rows each, drawn from a dataset of about 15 million rows. Selection should be weighted by a variable that represents the rate of an event occurring in the real world (death_rate in the code below, which ranges from 0.0003 to 0.15): the higher the value, the more likely a row is to be selected. Within each sample, no row should be selected more than once. However, I'm unsure whether I've interpreted the documentation correctly. Currently my code looks like this:
proc surveyselect data=merged_data out=BootSamples noprint seed=123 sampsize=169301 out=OUTHITS
method=PPS
reps=250;
Size death_rate;
run;
Typically, bootstrap samples are obtained by sampling with replacement, so perhaps you want to use METHOD=PPS_WR instead of METHOD=PPS? See https://blogs.sas.com/content/iml/2016/02/10/sample-with-replacement-and-unequal-probability-in-sas....
Also, you are specifying two OUT= data sets. I suspect you meant to use the OUTHITS option in the second case.
I created some small data so you can inspect the results. I hope the following answers your question or at least points you in the correct direction:
data merged_data;
input x death_rate;
datalines;
1 .1
2 .2
3 .3
4 .4
5 .5
6 .6
8 .8
;
%let SampSize=100; /* 169301 */
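proc surveyselect data=merged_data out=BootSamples noprint seed=123
      sampsize=&SampSize OUTHITS   /* one output row per selection (hit) */
      method=PPS_WR                /* PPS sampling with replacement */
      reps=250;                    /* number of bootstrap samples */
   size death_rate;                /* selection probability proportional to death_rate */
run;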
/* are the relative frequencies correct? */
proc freq data=BootSamples;
tables x;
run;
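To judge the PROC FREQ output, it helps to know what the percentages should be. With PPS_WR, each draw selects a row with probability equal to its size divided by the sum of all sizes. In the toy data the sizes sum to 2.9, so x=8 should appear in roughly 0.8/2.9 ≈ 27.6% of the selected rows and x=1 in roughly 0.1/2.9 ≈ 3.4%. The following is a minimal sketch, not part of the original answer, that computes these expected proportions for comparison; the column name expected_prob is just an illustrative choice.

```sas
/* Expected per-draw selection probability under PPS_WR:        */
/* size of the row divided by the sum of sizes over all rows.   */
/* PROC SQL remerges the SUM() back onto each row (with a NOTE). */
proc sql;
   select x,
          death_rate,
          death_rate / sum(death_rate) as expected_prob format=percent8.2
   from merged_data;
quit;
```

The PROC FREQ percentages pool all 250 replicates, so they should be close to these values but will not match them exactly because of sampling variability.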