Solved: Random sampling in different groups

ursula · Posted 10-03-2018 01:02 PM

Hi There,

Below is the example data. It contains 4 groups, each defined by a shared pair of values of variables B and C.

For example, IDs 001 (50 20), 005 (50 20), and 007 (50 20) are categorized as the same group because they have the same values of B and C, and so on.

I want to randomly sample 2 observations from each group.

So in this example data, I will get 8 samples from 15 observations.

My actual data set has thousands of observations.

I wonder if there is code to run the sampling for all groups at once.

Thank you in advance.

data have;
input ID $ B C;
datalines;
001 50 20
002 50 30
003 40 20
004 70 30
005 50 20
006 40 20
007 50 20
008 40 20
009 50 30
010 70 30
011 40 20
012 70 30
013 40 20
014 50 30
015 70 30
;
run;

PGStats · Posted 10-03-2018 01:57 PM

For your case, @ballardw's proposed method would amount to:

data have;
input id $3. B C;
datalines;
001 50 20
002 50 30
003 40 20
004 70 30
005 50 20
006 40 20 
007 50 20
008 40 20 
009 50 30
010 70 30
011 40 20
012 70 30
013 40 20
014 50 30
015 70 30
;

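/* Sort by the strata variables, as the STRATA statement requires */
proc sort data=have; by b c; run;

/* Draw a simple random sample of 2 observations from each B*C group */
proc surveyselect data=have out=want sampsize=2;
  strata b c;
run;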
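
For a real data set with thousands of observations, two additions may be worth considering. The following is only a sketch and was not part of the original answer. SEED= makes the draw reproducible (the value 20181003 is arbitrary), and SELECTALL keeps every observation from any group that has fewer than 2, instead of stopping with an error. The data set name want_repro is made up for this illustration.

```sas
/* Assumes HAVE is already sorted by B and C, as above. */
proc surveyselect data=have out=want_repro
  method=srs      /* simple random sampling without replacement (the default) */
  sampsize=2      /* 2 observations per B*C group */
  selectall       /* take the whole group if it has fewer than 2 observations */
  seed=20181003;  /* arbitrary seed so reruns give the same sample */
  strata b c;
run;

/* Check the result: each B*C combination should appear at most 2 times. */
proc freq data=want_repro;
  tables b*c / list nocum nopercent;
run;
```

The PROC FREQ step is just a quick check that no group contributed more than the requested number of observations.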

PG


ballardw · Posted 10-03-2018 01:28 PM

I'm not following your description, as it is hard to read when pasted into a code box.

I would suggest adding a variable that holds a group identifier. Then you can use proc surveyselect with a strata variable to control sampling by "group".

Here is a brief example that uses the SEX variable to define the "groups" and randomly selects 3 females and 5 males from the SASHELP.CLASS data set.

proc sort data=sashelp.class out=work.class;
  by sex;
run;

proc surveyselect data=work.class out=work.sample
  sampsize=(3 5);
  strata sex;
run;

The STRATA statement in PROC SURVEYSELECT requires the data to be sorted by the strata variables. The SAMPSIZE= option tells SAS how many records to select. SAMPSIZE=(3 5) means take 3 from the first level of the strata variable and 5 from the second level, in sorted order (here F, then M). A single number selects the same number of records from each stratum. Similarly, you can use SAMPRATE= to request a percentage of each stratum.
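
As a rough illustration of the SAMPRATE= alternative, the sketch below asks for half of each stratum rather than a fixed count. The output name work.sample_rate and the seed value 12345 are arbitrary choices for this example, not something from the thread.

```sas
/* Sort a copy of SASHELP.CLASS by the strata variable. */
proc sort data=sashelp.class out=work.class;
  by sex;
run;

/* Select 50% of each SEX stratum by simple random sampling.
   SEED= is optional; it only makes the sample repeatable. */
proc surveyselect data=work.class out=work.sample_rate
  method=srs samprate=0.5 seed=12345;
  strata sex;
run;
```

A rate scales with the size of each group, which may suit data where the groups differ a lot in size; a fixed SAMPSIZE= gives every group the same count regardless.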

There are additional options as well, for example to use a selection method other than simple random sampling (the default) and to include additional information and variables in the output data set.

The output data set created above includes the variables SelectionProb (the selection probability) and SamplingWeight, for use in procedures that accept a weight.
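
To see those two variables and one possible use of the weight, something along these lines should work. It assumes work.sample was created by the SASHELP.CLASS example above; analyzing HEIGHT is an arbitrary choice for illustration.

```sas
/* List the selected records with their selection probability and weight. */
proc print data=work.sample noobs;
  var name sex SelectionProb SamplingWeight;
run;

/* Use the sampling weight in a survey analysis procedure,
   declaring the same strata that were used for selection. */
proc surveymeans data=work.sample;
  strata sex;
  weight SamplingWeight;
  var height;
run;
```

Since SASHELP.CLASS has 9 females and 10 males, selecting 3 and 5 of them should give SelectionProb values of about 0.333 and 0.5, with SamplingWeight values of 3 and 2.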

ursula · Posted 10-03-2018 01:52 PM

Sorry, I know it was hard to read.

Here is the data set again.

Below is the example data. It contains 4 groups, each defined by a shared pair of values of variables B and C.

For example, IDs 001 (50 20), 005 (50 20), and 007 (50 20) are categorized as the same group because they have the same values of B and C, and so on.

I want to randomly sample 2 observations from each group.

So in this example data, I will get 8 samples from 15 observations.

My actual data set has thousands of observations.

I wonder if there is code to run the sampling for all groups at once.

Thank you in advance.

data have;
input ID $ B C;
datalines;
001 50 20
002 50 30
003 40 20
004 70 30
005 50 20
006 40 20
007 50 20
008 40 20
009 50 30
010 70 30
011 40 20
012 70 30
013 40 20
014 50 30
015 70 30
;
run;

ursula · Posted 10-03-2018 02:09 PM

Thank you so much, PG.

It works!

