sas1990
Calcite | Level 5

I'm trying to generate bootstrap samples and would like to confirm that my approach is sound. The goal is 250 samples, each consisting of 169,301 randomly selected rows drawn from a dataset of about 15 million rows. Selection is based on a variable that represents the rate at which an event occurs in the real world (death_rate in the code below, which ranges from 0.0003 to 0.15): the higher the value, the more likely a row is to be selected. Within a sample, no row should be selected more than once. However, I'm unsure whether I've interpreted the documentation correctly. Currently my code looks like this:

 

proc surveyselect data=merged_data out=BootSamples noprint seed=123 sampsize=169301 out=OUTHITS
method=PPS
reps=250;
Size death_rate;
run;

Accepted Solutions
Rick_SAS
SAS Super FREQ

Typically, bootstrap samples are obtained by sampling with replacement, so perhaps you want to use METHOD=PPS_WR instead of METHOD=PPS? See https://blogs.sas.com/content/iml/2016/02/10/sample-with-replacement-and-unequal-probability-in-sas.... 

 

Also, you are specifying OUT= twice (OUT=BootSamples and OUT=OUTHITS). I suspect you meant to use the OUTHITS option in the second case.

I created some small data so you can inspect the results. I hope the following answers your question or at least points you in the correct direction:

 

data merged_data;
input x death_rate;
datalines;
1 .1
2 .2
3 .3
4 .4
5 .5
6 .6
8 .8
;

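%let SampSize=100;  /* use 169301 for the real data */
proc surveyselect data=merged_data out=BootSamples noprint seed=123
     sampsize=&SampSize
     outhits           /* one output row per hit, so repeated selections appear as repeated rows */
     method=PPS_WR     /* probability proportional to size, with replacement */
     reps=250;         /* 250 independent replicates, identified by the Replicate variable */
   size death_rate;    /* size measure that drives the selection probability */
run;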

/* are the relative frequencies correct? */
proc freq data=BootSamples;
tables x;
run;
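
One way to answer the question in the comment above is to compare the observed frequencies with the proportions you would expect. Under PPS sampling with replacement, each row's expected share of the hits is its size measure divided by the sum of the size measures. The following is only a sketch for the small example data; the data set names Expected and Observed and the column names ExpectedProp and ObservedProp are made up for illustration.

```sas
/* Expected share of hits for each row: death_rate / sum of all death_rate values. */
/* PROC SQL remerges the SUM() back onto every row (the log shows a NOTE about it). */
proc sql;
   create table Expected as
   select x,
          death_rate,
          death_rate / sum(death_rate) as ExpectedProp format=6.4
   from merged_data;
quit;

/* Observed share of hits, pooled over all replicates.          */
/* BootSamples has one row per hit because OUTHITS was used.    */
proc freq data=BootSamples noprint;
   tables x / out=Observed;   /* OUT= data set holds COUNT and PERCENT (0-100 scale) */
run;

/* Side-by-side comparison of expected and observed proportions */
proc sql;
   select e.x,
          e.ExpectedProp,
          o.PERCENT / 100 as ObservedProp format=6.4
   from Expected as e
        left join Observed as o
        on e.x = o.x
   order by e.x;
quit;
```

For the example data the size measures sum to 2.9, so the row with x=8 should receive roughly 0.8/2.9, or about 27.6%, of the hits, and the row with x=1 about 3.4%. The observed proportions should be close to these values but will not match exactly because of sampling variation.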

sas1990
Calcite | Level 5
Thanks, Rick! It seems to work as intended.
