How to draw n random samples from the original dataset
Frequent Contributor
Posts: 131

I took a few very tedious steps: I created a few random variables, sorted the data by them, and then kept the first 200 observations. Because I need 100 random samples to test a model, I would have to repeat this process 100 times, which is a poor approach and a lot of manual work.

I am curious whether there is code that automatically creates 100 random samples from the original dataset. Something like:

%let i=100;

data work.data01 work.data02 work.data03 ... work.data&i;
  set work.fulldata; /*fulldata has 1000 observations*/
  do j=1 to 100;
  ...output to different data sets with 200 observations;
  end;
run;

Respected Advisor
Posts: 3,799

Use PROC SURVEYSELECT.  
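
As a minimal sketch of what that could look like for the question above: the REPS= option asks PROC SURVEYSELECT for several independent samples in one call, and the output data set gets a Replicate variable identifying each one. The output name work.samples and the seed value are placeholders chosen for illustration; work.fulldata and the sizes (100 samples of 200) come from the question.

```sas
/* Draw 100 independent simple random samples of 200 rows each. */
proc surveyselect data=work.fulldata
                  out=work.samples      /* hypothetical output name */
                  method=srs            /* simple random sampling without replacement */
                  n=200                 /* observations per sample */
                  reps=100              /* number of independent samples */
                  seed=20240501         /* arbitrary value, fixed so the draw can be repeated */
                  noprint;
run;

/* Check the result: each value of Replicate should have 200 rows. */
proc freq data=work.samples;
  tables replicate / nocum nopercent;
run;
```

All 100 samples end up stacked in one data set, so a later step can process them with BY replicate rather than looping over 100 separate data sets.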

Super User
Posts: 10,048

Just as _null_ said.

How about:

%macro draw_samples;   /* %do loops must run inside a macro */
  %do i=1 %to 4;
    proc surveyselect data=list_stock method=srs n=4 out=sample&i noprint;
    run;
  %end;
%mend draw_samples;
%draw_samples

Ksharp

Frequent Contributor
Posts: 131

Thanks for the reply.

Can PROC SURVEYSELECT handle the case where one part of the original data must always be selected in full, while the other part is used for drawing random samples? To put it simply, suppose there is a variable called ind. If ind=1, all observations should be selected. If ind=0, only 100 observations (out of the original 500) should be selected. How can I add this to the SURVEYSELECT procedure?

Respected Advisor
Posts: 3,799

You want a stratified sample where IND is the stratum variable. The RATE= and N= options accept a list of rates or sample sizes, one per stratum, matched to the strata in sort order. In the example, IND=0 selects 4 obs and IND=1 selects all obs. Or you can use RATE=. Be sure to use a different, well-chosen seed.

The REP= option creates, in this example, 4 independent samples. Use REPLICATE in a BY statement instead of those %DOs. Everything will be much faster and neatly contained in one data set.

data test;
   set sashelp.class(in=in1) sashelp.class;
   ind = in1;
   run;

proc sort data=test;
   by ind;
   run;

proc surveyselect rep=4 n=(4,19) /*rate=(.4,1)*/ data=test out=sample seed=443754790;
   strata ind;
   run;

proc sort data=sample;
   by replicate;
   run;

proc print data=sample;
   by replicate;
   id replicate;
   run;
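
Applied to the numbers in the question (100 samples, 100 of the ind=0 rows, all of the ind=1 rows), the same idea might look like the sketch below. It assumes the data set is work.fulldata with a 0/1 variable ind. The names work.fulldata_sorted, work.samples, work.rep_means, the macro variable n_ind1, the analysis variable x, and the seed are all hypothetical. The ind=1 stratum size is counted first so it can be passed as that stratum's sample size.

```sas
/* Count the ind=1 rows so that stratum can be requested in full. */
proc sql noprint;
  select count(*) into :n_ind1 trimmed
  from work.fulldata
  where ind = 1;
quit;

/* STRATA needs the input sorted by the stratum variable. */
proc sort data=work.fulldata out=work.fulldata_sorted;
  by ind;
run;

/* Sizes are listed in stratum sort order: ind=0 first, then ind=1. */
proc surveyselect data=work.fulldata_sorted out=work.samples
                  method=srs reps=100 seed=20240501 noprint
                  n=(100, &n_ind1);
  strata ind;
run;

proc sort data=work.samples;
  by replicate;
run;

/* Hypothetical downstream step: one set of statistics per sample. */
proc means data=work.samples noprint;
  by replicate;
  var x;                                /* x is a placeholder variable */
  output out=work.rep_means mean=mean_x;
run;
```

The PROC MEANS step only illustrates BY-group processing; any procedure that supports a BY statement could take its place when testing the model on each sample.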

Super User
Posts: 10,048

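You can also split the original dataset into two datasets, one containing ind=1 and the other containing ind=0,

then

%macro draw_samples;   /* %do loops must run inside a macro */
  %do i=1 %to 4;
    /* list_stock is taken here to hold only the ind=0 rows */
    proc surveyselect data=list_stock method=srs n=4 out=sample&i noprint;
    run;
    /* append all ind=1 rows (ind_1) to each sample */
    data sample&i;
      set sample&i ind_1;
    run;
  %end;
%mend draw_samples;
%draw_samples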
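
The split step mentioned above is not shown in the post. A minimal sketch, assuming the source data set is work.fulldata with a 0/1 variable ind (the output name ind_0 is hypothetical, ind_1 matches the name used above):

```sas
/* Write ind=1 rows to ind_1 and all other rows to ind_0 in one pass. */
data ind_0 ind_1;
  set work.fulldata;
  if ind = 1 then output ind_1;
  else output ind_0;
run;
```

With this split, ind_0 would take the place of list_stock as the DATA= input to PROC SURVEYSELECT.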

Ksharp