How to output dataset for every resampling when using "rep"

Occasional Contributor
Posts: 7


I have a dataset called "boston" which has 100 observations. I'd like to create 2 datasets of 100 observations each by sampling with replacement from my original dataset "boston". I am using this code:


%let rep = 2;

proc surveyselect data = boston out = resample
  seed = 1347 method = urs
  samprate = 1 outhits rep = &rep;
run;

ods listing close;


This creates a dataset called "resample" with a variable called "replicate" (= 1 or 2) that identifies the 100 observations belonging to each of my 2 samples. However, I would like to output 2 datasets, each with its own sample of 100 observations, such as resample1 and resample2. How can I do that?


Thanks very much!

Super User
Posts: 13,358

Re: How to output dataset for every resampling when using "rep"

This might get you started:

Don't reuse the same fixed seed for each separate PROC SURVEYSELECT call if you want different samples; identical seeds give identical selections.

SAMPSIZE may be more reliable than SAMPRATE if you want a specific number of resultant selections.


Here's one way, with an example call that uses a data set you should already have (sashelp.class), so you can check that it works correctly.

The values shown for reps and size are defaults that are used when the call does not supply them. The indataset must exist, reps cannot be set to less than 1, and size should be an integer > 0.

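%macro resample (indataset=, outdata=, reps=2, size=100);
   %local i;
   %do i=1 %to &reps;
      /* one with-replacement sample per iteration; the loop index is appended to the output name */
      proc surveyselect data = &indataset. out = &outdata.&i. noprint
        method = urs
        sampsize = &size outhits ;
      run;
   %end;
%mend resample;

%resample (indataset=sashelp.class, outdata=work.resample, reps=2,size=5);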

If you need this to be more flexible, for example to vary the method, you can add parameters following this pattern, but too many will likely complicate the code as you try to get the interactions straight.


Did you examine an output data set created with rep=2 to make sure it looked correct? The values of rep would likely not be what you want, and the number of records is another issue.
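A quick way to examine that kind of output is to count the rows per replicate. The sketch below is only an illustration: it uses sashelp.class in place of "boston", the names work.resample, work.resample1 and work.resample2 are chosen for the example, and it assumes the replicate variable is named Replicate, as PROC SURVEYSELECT names it. The second step shows one way to split the single output into separate data sets if they are still wanted.

```sas
/* Draw 2 with-replacement samples into one data set (sashelp.class stands in for boston) */
proc surveyselect data=sashelp.class out=work.resample noprint
     seed=1347 method=urs samprate=1 outhits reps=2;
run;

/* Check: one line per replicate, with the number of rows drawn in each */
proc freq data=work.resample;
   tables Replicate / nocum nopercent;
run;

/* Optional: split the combined output into one data set per replicate */
data work.resample1 work.resample2;
   set work.resample;
   if Replicate = 1 then output work.resample1;
   else if Replicate = 2 then output work.resample2;
run;
```

With OUTHITS, every selection is written as its own row, so the count for each replicate should equal the sample size.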

Super User
Posts: 23,346

Re: How to output dataset for every resampling when using "rep"

Although you can do this, it's generally not a good idea. To process the samples further, you would then need a macro that runs over each data set, instead of simply using a BY statement in your procedure.
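As a rough sketch of the BY-statement approach, assuming the replicates stay in one data set: sashelp.class and its height variable stand in for the real data, and work.resample and work.boot_means are names chosen for the example.

```sas
/* All replicates in a single data set; Replicate identifies each sample */
proc surveyselect data=sashelp.class out=work.resample noprint
     seed=1347 method=urs samprate=1 outhits reps=2;
run;

/* One procedure call analyzes every replicate; no macro loop is needed */
proc means data=work.resample noprint;
   by Replicate;
   var height;
   output out=work.boot_means mean=mean_height;
run;

/* work.boot_means has one row per replicate */
proc print data=work.boot_means;
run;
```

The same pattern scales to thousands of replicates by changing only the reps= value, which is where separate data sets become awkward.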


If you're doing bootstrap or simulation, this may be worth reading. It goes over how to simulate data in SAS and why you don't want to do it this way, though it covers both approaches.
