I went through several tedious steps: first I created a few random variables, then sorted the data by them, and finally kept the first 200 observations. Since I need 100 random samples to test a model, I would have to repeat this process 100 times, which is a poor and labor-intensive approach.
I am curious whether there is code that automatically creates 100 random samples from the original dataset. Something like:
%let i=100;
data work.data01 work.data02 work.data03 ... work.data&i;
set work.fulldata; /*fulldata has 1000 observations*/
do j=1 to 100;
...output to different data sets with 200 observations;
end;
run;
Use PROC SURVEYSELECT.
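As a rough sketch of what that could look like for the question's numbers (100 samples of 200 observations each), assuming work.fulldata is the 1000-observation data set from the question; the output name work.samples and the seed value are arbitrary choices for illustration:

```sas
/* Assumption: work.fulldata is the 1000-observation data set from the question. */
proc surveyselect data=work.fulldata
                  out=work.samples   /* hypothetical output data set name */
                  method=srs         /* simple random sampling without replacement */
                  n=200              /* observations per sample */
                  reps=100           /* number of independent samples */
                  seed=12345;        /* arbitrary seed, fixed so the draw is reproducible */
run;
```

All samples land in one data set, identified by the variable Replicate, so later steps can process each sample with a BY Replicate statement rather than 100 separate data sets.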
Just as _null_ said.
How about:
%macro draw; /* %do needs a macro */
%do i=1 %to 4;
proc surveyselect data=list_stock method=srs n=4 out=sample&i noprint;
run;
%end;
%mend draw;
%draw
Ksharp
Thanks for the reply.
Can PROC SURVEYSELECT accommodate the requirement that one part of the original data must always be selected, while the other part is used for drawing random samples? To put it simply, suppose there is a variable called ind. If ind=1, all observations should be selected. If ind=0, just select 100 observations (out of the original 500). How can I add this to the SURVEYSELECT procedure?
You want a stratified sample where IND is the stratum variable. The RATE= and N= options accept a list of rates or sample sizes, one per stratum, in sorted stratum order. In the example below, IND=0 selects 4 obs and IND=1 selects all 19 obs. Or you can use RATE=. Be sure to use a different, well-chosen seed.
The REP= option creates independent samples (4 in this example). Use REPLICATE in a BY statement instead of those %DO loops. Everything will be much faster and neatly contained in one data set.
data test;
set sashelp.class(in=in1) sashelp.class;
ind = in1;
run;
proc sort data=test;
by ind;
run;
proc surveyselect rep=4 n=(4,19) /*rate=(.4,1)*/ data=test out=sample seed=443754790;
strata ind;
run;
proc sort data=sample;
by replicate;
run;
proc print data=sample;
by replicate;
id replicate;
run;
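The same stratified approach might be applied to the data described in the question roughly as follows. This is only a sketch under stated assumptions: work.fulldata holds a 0/1 variable ind, at least 100 rows have ind=0, and the names work.sorted, work.samples, and the macro variable n_ind1 are made up for illustration.

```sas
/* Count the ind=1 rows so that stratum can be requested in full. */
/* Assumption: work.fulldata contains a 0/1 variable named ind.   */
proc sql noprint;
  select count(*) into :n_ind1 trimmed
  from work.fulldata
  where ind = 1;
quit;

/* STRATA requires the input sorted by the stratum variable. */
proc sort data=work.fulldata out=work.sorted;
  by ind;
run;

/* Sizes are listed in sorted stratum order: ind=0 first, then ind=1. */
proc surveyselect data=work.sorted out=work.samples
                  method=srs
                  n=(100 &n_ind1)   /* 100 from ind=0, every row from ind=1 */
                  reps=100          /* 100 independent samples */
                  seed=20250101;    /* arbitrary seed chosen for this sketch */
  strata ind;
run;
```

Counting the ind=1 rows first avoids hard-coding that stratum's size; RATE=(value 1) could serve the same purpose if a sampling rate is preferred over a fixed count.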
You can also split the original dataset into two datasets, one containing ind=1 and the other containing ind=0, then:
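%macro draw; /* %do must run inside a macro */
%do i=1 %to 4;
/* list_stock should be the ind=0 subset */
proc surveyselect data=list_stock method=srs n=4 out=sample&i noprint;
run;
/* append all ind=1 observations (ind_1) */
data sample&i;
set sample&i ind_1;
run;
%end;
%mend draw;
%draw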
Ksharp