07-11-2011 03:31 PM
I took a few very tedious steps: first I created a few random variables, then sorted the data by them, and finally selected the first 200 observations. Since I need 100 random samples to test a model, I would have to repeat this process 100 times, which is a poor approach and a lot of manual work.
I am curious whether there is some code that automatically creates 100 random samples from the original dataset. Something like:
data work.data01 work.data02 work.data03 ... work.data&i;
set work.fulldata; /*fulldata has 1000 observations*/
do j=1 to 100;
...output to different data sets with 200 observations;
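A loop-free way to get this is sketched below, assuming PROC SURVEYSELECT (SAS/STAT) is available. The output name work.samples and the seed value are placeholders, not from the original post. All 100 samples land in one data set, identified by the automatic variable Replicate.

```sas
/* Draw 100 independent simple random samples of 200 observations each. */
/* work.samples and the seed value are illustrative assumptions.        */
proc surveyselect data=work.fulldata out=work.samples
                  method=srs sampsize=200 reps=100 seed=12345;
run;
```

Later steps can then process each sample with BY Replicate rather than reading 100 separate data sets.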
07-12-2011 05:04 AM
Thanks for the reply.
Can proc surveyselect accommodate the requirement that one part of the original data must always be selected, while the other part is used for drawing random samples? To put it simply, suppose there is a variable called ind. If ind=1, then all observations should be selected. If ind=0, then just select 100 observations (out of the original 500). How can I add this to the surveyselect procedure?
07-12-2011 07:37 AM
You want to do a stratified sample where IND is the stratum variable. The RATE= and N= options accept a list of rates or sample sizes, one per stratum. In the example, IND=0 selects 4 obs and IND=1 selects all obs. Or you can use RATE=. Be sure to use a different, well-chosen seed.
The REP= option creates independent samples (4 in this example). Use REPLICATE in a BY statement instead of those %DOs. Everything will be much faster and neatly contained in one data set.
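data test;
   set sashelp.class(in=in1) sashelp.class;
   ind = in1;   /* first copy gets ind=1, second copy gets ind=0 */
run;
proc sort data=test; by ind; run;
proc surveyselect rep=4 n=(4,19) /*rate=(.4,1)*/ data=test out=sample seed=443754790;
   strata ind;   /* n=(4,19): 4 obs from ind=0, all 19 from ind=1 */
run;
proc sort data=sample; by replicate ind; run;
proc print data=sample; run;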
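To illustrate the BY-statement idea, here is a minimal sketch that fits one model per replicate in a single step. It assumes the sample data set from the example above. The regression of weight on height and the output name est are only placeholders; the thread does not say which model is being tested.

```sas
/* One model fit per replicate, no macro loop needed.            */
/* The model (weight = height) and data set est are assumptions. */
proc reg data=sample outest=est noprint;
   by replicate;
   model weight = height;
run;
quit;
```

The est data set then holds one row of parameter estimates per replicate.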
07-14-2011 11:12 PM
You can also split the original dataset into two datasets, one containing ind=1 and the other containing ind=0, then sample from the ind=0 dataset in a loop and append the ind=1 dataset to each sample:
%macro draw;
%do i=1 %to 4;
proc surveyselect data=list_stock method=srs n=4 out=sample&i noprint; run;
data sample&i; set sample&i ind_1; run;
%end;
%mend draw; %draw
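The split step itself is not shown in the post. A minimal sketch follows, assuming the source data set is work.fulldata (the name from the original question) and that list_stock is meant to hold the ind=0 rows. It would need to run before the loop above.

```sas
/* Split the source into always-keep rows and rows to sample from. */
/* work.fulldata as the input name is an assumption.               */
data ind_1 list_stock;
   set work.fulldata;
   if ind = 1 then output ind_1;   /* always selected  */
   else output list_stock;         /* sampled each run */
run;
```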