
Stratified Sampling Macro

Occasional Contributor
Posts: 16

Hi,

 

Could you please help me with the following programming task? I have an individual-level dataset (one row per person) that contains a Group ID variable. Depending on the value of the Group ID, I would like to split the group using stratified random sampling (stratification variable: Score_Band). The groups that need splitting, and how each one is to be split, are given in a 'control' dataset, as follows:

 

StartGroup  EndGroup  Percent
1           1         25
1           2         25
1           3         25
1           4         25
5           5         50
5           6         25
5           7         25

 

In the example table above, the individuals belonging to group 1 are to be split across 4 groups (labelled 1-4), each containing 25% of the volume. Individuals belonging to group 5 are to be split into three groups (5-7), containing 50%, 25% and 25% of the volume, respectively.
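One way to read these percentages, and the reading the PROC RANK and GROUPS= replies below rely on, is cumulatively within each StartGroup. Writing p(g,k) for the Percent on the k-th control row of StartGroup g and b in {1, ..., 100} for an individual's randomly drawn percentile bin (notation introduced here for illustration only):

$$
C_{g,0} = 0, \qquad C_{g,k} = \sum_{j=1}^{k} p_{g,j}, \qquad
\text{new group} = \text{EndGroup}_{g,k} \quad \text{if } C_{g,k-1} < b \le C_{g,k}.
$$

For StartGroup 5, C = 50, 75, 100, so bins 1-50 go to group 5, bins 51-75 to group 6 and bins 76-100 to group 7. This only covers every bin if each StartGroup's percentages add up to 100.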

 

I am trying to write code that takes an input dataset (one row per individual) containing an initial GroupID variable and maps it to a new GroupID variable, as specified in the 'control' table. Each split is to be based on stratified random sampling (stratification variable: Score_Band).

 

Any help would be greatly appreciated,

 

Hoa


Super User
Posts: 9,691

Re: Stratified Sampling Macro

Is this what you're looking for?

 

data have;
infile cards expandtabs truncover;
input StartGroup	EndGroup	Percent;
cards;
1	1	.25
1	2	.25
1	3	.25
1	4	.25
5	5	.50
5	6	.25
5	7	.25
;
run;

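/* Within each StartGroup, draw 100 hits with replacement, each control */
/* row being chosen with probability proportional to Percent; OUTHITS   */
/* writes one output row per hit.                                       */
proc surveyselect data=have out=want sampsize=100 outhits method=pps_wr;
strata startgroup;
size percent;
run;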

proc print ;run;
PROC Star
Posts: 7,366

Re: Stratified Sampling Macro

I agree with the approach that @Ksharp suggested but, based on your requirements, I think you might want to use a different method. I'd suggest:

 

proc surveyselect data=have out=want sampsize=100 outhits method=pps_sys;
  strata startgroup;
  size percent;
run;

Art, CEO, AnalystFinder.com
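To see concretely what the two PROC SURVEYSELECT suggestions produce, the sketch below reruns the systematic version on the control table and tabulates the hits per EndGroup. SEED=2017 is an arbitrary value added here for reproducibility; the rest follows the replies. Swapping METHOD=PPS_SYS for METHOD=PPS_WR shows the difference: with replacement the counts vary from run to run, while the systematic method should keep each count close to 100 times Percent. Either way, the output is 100 rows ("slots") per StartGroup drawn from the control table, not an assignment of individuals.

```sas
/* Control table with proportions, as in the first reply. */
data control;
  input StartGroup EndGroup Percent;
  cards;
1 1 .25
1 2 .25
1 3 .25
1 4 .25
5 5 .50
5 6 .25
5 7 .25
;
run;

/* Systematic PPS: 100 hits per StartGroup, spread over the EndGroup */
/* rows in proportion to Percent; OUTHITS writes one row per hit.    */
proc surveyselect data=control out=want sampsize=100 outhits
                  method=pps_sys seed=2017;
  strata startgroup;
  size percent;
run;

/* Count the hits each EndGroup received within its StartGroup. */
proc freq data=want;
  tables startgroup*endgroup / norow nocol nopercent;
run;
```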

 

Occasional Contributor
Posts: 16

Re: Stratified Sampling Macro

Thank you both for your quick and helpful replies - I really appreciate it!

 

Apologies if I'm misunderstanding, but I'm wondering how I can apply the sampling logic you've provided to my input dataset.  In addition to the 'control' table below...

 

Group  EndGroup  Percent
1      1         0.25
1      2         0.25
1      3         0.25
1      4         0.25
5      5         0.5
5      6         0.25
5      7         0.25

 

I also have an input dataset (please see the attached file).  The input dataset contains one row per individual.  Based on the control table above, I would like to split the existing groups into sub-groups, using the stratified sampling procedure.

 

There are 42 individuals belonging to group 1 in the input table.  I would like these to be split into 4 groups:

 

1 - containing roughly 25% of the original group 1

2 - containing roughly 25% of the original group 1

3 - containing roughly 25% of the original group 1

4 - containing roughly 25% of the original group 1

 

Similarly, there are 234 individuals belonging to group 5 in the input table.  I would like these split into 3 groups:

 

5 - containing roughly 50% of the original group 5

6 - containing roughly 25% of the original group 5

7 - containing roughly 25% of the original group 5
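For these counts the target sizes are not whole numbers, which is why "roughly" is the right word. The arithmetic:

$$
42 \times 0.25 = 10.5, \qquad 234 \times 0.50 = 117, \qquad 234 \times 0.25 = 58.5
$$

So groups 1 to 4 can each hold about 10 or 11 individuals, and groups 6 and 7 about 58 or 59. When the split is also stratified by Score_Band, rounding happens within each band, so the realised sizes can differ slightly from these targets.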

 

On the input dataset, I have a variable called Score_Band (with possible values 'A', 'B' and 'C'). When I split the dataset, I would like the sampling to be stratified by Score_Band.

 

I'm aiming to have an output dataset that is the same as the input dataset, but with a new group variable added - based on the new splits.

 

Thanks again for your help - I really appreciate it!

Super User
Posts: 9,691

Re: Stratified Sampling Macro

OK. How about this one?

 

 

data control;
infile cards expandtabs truncover;
input StartGroup	EndGroup	Percent;
cards;
1	1	25
1	2	25
1	3	25
1	4	25
5	5	50
5	6	25
5	7	25
;
run;

/* Running total of Percent within each StartGroup (e.g. 25, 50, 75, 100). */
data control;
 set control;
 by startgroup;
 if first.startgroup then cum=0;
 cum+percent;
run;
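/* Expand each control row into the percentile bins it owns:  */
/* bins (previous cum + 1) to cum map to that row's EndGroup. */
data key;
 set control;
  by startgroup;
  lag=lag(cum);              /* cum of the previous control row */
  if first.startgroup then start=1;
   else start=lag+1;

  do i=start to cum;         /* i = percentile bin, 1 to 100 */
   output;
  end;
  drop lag start cum percent;
run;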



/* Toy input dataset: 200 individuals in each of groups 1 and 5 */
data have;
 do group=1,5;
  do x=1 to 200;
   output;
  end;
 end;
run;
data have;
 set have;
 if _n_=1 then call streaminit(123456789); /* set the seed once */
 random=rand('uniform');                    /* random sort key */
run;


/* Split each group's random values into 100 percentile bins, r = 0 to 99 */
proc rank data=have out=temp groups=100;
by group;
var random;
ranks r;
run;
proc sort data=temp;
by group r;
run;
data temp;
 set temp;
 i=r+1;          /* shift bins to 1-100 to match KEY */
 drop random r;
run;

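data want;
 merge temp(in=inperson) key(rename=(startgroup=group));
 by group i;
 /* Groups with fewer than 100 people (e.g. 42) leave some bins empty; */
 /* keep only rows that come from the individual-level data.           */
 if inperson;
run;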

proc print;run;
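Neither reply stratifies by Score_Band yet. One way to add it, sketched under assumptions: rank within each combination of group and Score_Band instead of within group, so every band is split in the control-table proportions and each new group ends up with a similar Score_Band mix. The OP's attached file is not available, so the input below is hypothetical: 42 individuals in group 1 and 234 in group 5 (the counts from the OP's message), with Score_Band drawn at random from A, B and C and an arbitrary seed of 2017. The KEY step folds the running total into a single DATA step; the bins are the same as in the post.

```sas
/* Control table: Percent values must sum to 100 within a StartGroup. */
data control;
  input StartGroup EndGroup Percent;
  cards;
1 1 25
1 2 25
1 3 25
1 4 25
5 5 50
5 6 25
5 7 25
;
run;

/* KEY: one row per StartGroup and percentile bin i = 1 to 100, */
/* carrying the EndGroup that owns that bin.                    */
data key;
  set control;
  by startgroup;
  if first.startgroup then cum=0;
  start=cum+1;
  cum+percent;
  do i=start to cum;
    output;
  end;
  keep startgroup endgroup i;
run;

/* HYPOTHETICAL input: 42 people in group 1 and 234 in group 5,   */
/* Score_Band drawn at random; RANDOM is the sort key for ranking. */
data have;
  length score_band $1;
  call streaminit(2017);
  do group=1,5;
    if group=1 then n=42;
    else n=234;
    do id=1 to n;
      score_band=substr('ABC', ceil(3*rand('uniform')), 1);
      random=rand('uniform');
      output;
    end;
  end;
  drop n;
run;

proc sort data=have;
  by group score_band;
run;

/* Percentile bins r = 0 to 99 within each group x Score_Band stratum. */
proc rank data=have out=temp groups=100;
  by group score_band;
  var random;
  ranks r;
run;

data temp;
  set temp;
  i=r+1;
  drop random r;
run;

proc sort data=temp;
  by group i;
run;

/* Attach the new group (EndGroup); IN= keeps individuals only. */
data want;
  merge temp(in=inperson) key(rename=(startgroup=group));
  by group i;
  if inperson;
run;

/* Row percents: each EndGroup should show a similar Score_Band mix. */
proc freq data=want;
  tables endgroup*score_band / nocol nopercent;
run;
```

To use it on real data, replace the hypothetical HAVE step with your own dataset (it needs group, Score_Band and a uniform random number called random); EndGroup in WANT is the new group. The same idea carries over to the PROC SURVEYSELECT version by sorting by group and score_band and listing both on the STRATA statement, though how GROUPS=100 behaves when a stratum has fewer than 100 individuals is worth checking in the log first.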

Solution
03-07-2017 09:14 AM
Super User
Posts: 9,691

Re: Stratified Sampling Macro

Or it could be done more simply:

 

 

data control;
infile cards expandtabs truncover;
input StartGroup	EndGroup	Percent;
cards;
1	1	25
1	2	25
1	3	25
1	4	25
5	5	50
5	6	25
5	7	25
;
run;

data control;
 set control;
 by startgroup;
 if first.startgroup then cum=0;
 cum+percent;
run;
data key;
 set control;
  by startgroup;
  lag=lag(cum);
  if first.startgroup then start=1;
   else start=lag+1;
   
  do i=start to cum;
   output;
  end;
  drop lag start cum percent;
run;



/*Input dataset*/
data have;
 do group=1,5;
  do x=1 to 200;
   output;
  end;
 end;
run;

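/* Randomly assign the individuals in each group to 100 groups of equal */
/* (or nearly equal) size; the assignment is stored in GroupID.         */
proc surveyselect data=have out=temp groups=100 seed=123456789;
strata group;
run;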
proc sort data=temp;
by group groupid;
run;

data want;
 merge temp(in=inperson) key(rename=(startgroup=group i=groupid));
 by group groupid;
 if inperson;   /* keep individuals only, not KEY rows without a match */
run;

proc print;run;
Occasional Contributor
Posts: 16

Re: Stratified Sampling Macro

Thank you very much for the quick response!  I'll give it a try and will report back!

Occasional Contributor
Posts: 16

Re: Stratified Sampling Macro

Thank you very much - this is just what I needed!