Hi there,
Below is example data. It contains 4 groups, each defined by a shared combination of values of variables B and C.
For example, ids 001 (50 20), 005 (50 20), and 007 (50 20) belong to the same group because they have the same values of B and C, and so on.
I want to randomly sample 2 observations from each group,
so in this example I will get 8 samples from the 15 observations.
My actual data set has thousands of observations.
I wonder if there is code that runs the sampling for all groups at once.
Thank you in advance.
data have;
input ID $ B C;
datalines;
001 50 20
002 50 30
003 40 20
004 70 30
005 50 20
006 40 20
007 50 20
008 40 20
009 50 30
010 70 30
011 40 20
012 70 30
013 40 20
014 50 30
015 70 30
;
run;
For your case, @ballardw's proposed method would amount to
data have;
input id $3. B C;
datalines;
001 50 20
002 50 30
003 40 20
004 70 30
005 50 20
006 40 20
007 50 20
008 40 20
009 50 30
010 70 30
011 40 20
012 70 30
013 40 20
014 50 30
015 70 30
;
proc sort data=have; by b c; run;
proc surveyselect data=have out=want sampsize=2;
strata b c;
run;
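If you want the same sample on every run, or if some groups in the real data might hold fewer than 2 observations, a variation along these lines may help. This is only a sketch: the seed value is arbitrary, and I am assuming the data set is still called have with group variables b and c. SEED= fixes the random number stream, and SELECTALL asks the procedure to take every observation from a stratum that is smaller than the requested sample size instead of stopping with an error.

```sas
/* Sketch only: assumes data set HAVE with variables b and c exists. */
proc sort data=have;
    by b c;                      /* STRATA requires sorted input */
run;

proc surveyselect data=have out=want
        method=srs               /* simple random sampling (the default) */
        sampsize=2               /* 2 observations per b/c group */
        seed=12345               /* arbitrary seed, for a repeatable sample */
        selectall;               /* take all obs when a group has fewer than 2 */
    strata b c;
run;

/* Check how many observations were drawn from each b/c combination */
proc freq data=want;
    tables b*c / list;
run;
```

The PROC FREQ step is just a quick check that each combination of b and c contributes the expected number of rows.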
I am not following your description, as it is hard to read when pasted into a code box.
I would suggest adding a variable that holds a group identifier. Then you can use PROC SURVEYSELECT with a STRATA variable to control sampling by "group".
Here is a brief example that uses the SEX variable to define the "groups" and randomly selects 3 females and 5 males from the SASHELP.CLASS data set.
proc sort data=sashelp.class out=work.class;
   by sex;
run;
proc surveyselect data=work.class out=work.sample sampsize=(3 5);
   strata sex;
run;
The STRATA statement in PROC SURVEYSELECT requires the data to be sorted by the strata variable. SAMPSIZE= tells SAS how many records to select. SAMPSIZE=(3 5) says to take 3 from the first level of the strata variable and 5 from the second level. A single number requests the same number from each stratum. Similarly, you can use SAMPRATE= to request a percentage of each stratum.
There are additional options for using methods other than simple random sampling (the default) and for including additional information and variables in the output data set.
The output data set, as above, includes the variables SelectionProb (selection probability) and SamplingWeight for use in procedures that accept a weight.
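As a rough illustration of the SAMPRATE= alternative, the same SASHELP.CLASS example could be written as below. The rate of 0.5 and the output data set name work.sample_rate are my own choices for the sketch, not anything required by the procedure.

```sas
/* Sketch only: sample about half of each SEX group from SASHELP.CLASS. */
proc sort data=sashelp.class out=work.class;
    by sex;                      /* STRATA requires sorted input */
run;

proc surveyselect data=work.class out=work.sample_rate
        samprate=0.5;            /* proportion of each stratum to select */
    strata sex;
run;
```

A single SAMPRATE= value applies the same rate to every stratum, just as a single SAMPSIZE= value applies the same count.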
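The group-identifier suggestion could be sketched as follows for data like yours, where a group is a combination of two variables. I am assuming a data set named have with variables B and C; the names have_sorted, have_grouped, want_grouped, and group, and the seed value, are placeholders for illustration.

```sas
/* Sketch only: assumes data set HAVE with variables b and c exists. */
proc sort data=have out=have_sorted;
    by b c;
run;

/* Number each distinct b/c combination 1, 2, 3, ... */
data have_grouped;
    set have_sorted;
    by b c;
    if first.c then group + 1;   /* sum statement: group is retained across rows */
run;

/* HAVE_GROUPED is already in GROUP order, so STRATA can use it directly */
proc surveyselect data=have_grouped out=want_grouped
        sampsize=2
        seed=12345;              /* arbitrary seed, for a repeatable sample */
    strata group;
run;

/* Look at the selected rows with their probability and weight */
proc print data=want_grouped;
    var group b c SelectionProb SamplingWeight;
run;
```

Listing both variables directly on the STRATA statement should give the same grouping, so the explicit identifier is mainly useful when you want to refer to the groups later.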
Sorry, I know it was hard to read.
The data set and the question are the same as in my first post above: I want to randomly sample 2 observations from each group of observations that share the same values of B and C.
Thank you so much, PG.
It works!