## Stratified Sampling Macro

Solved
Occasional Contributor
Posts: 16

# Stratified Sampling Macro

Hi,

Please can you help me with the following programming task.  I have an individual-level (1 row per person) dataset that contains a Group ID variable.  Depending on the value of the Group ID, I would like to split the group using stratified random sampling (stratification variable: Score_Band).  The groups that require splitting and the manner of the split are provided in a 'control' dataset, as follows:

 StartGroup EndGroup Percent 1 1 25 1 2 25 1 3 25 1 4 25 5 5 50 5 6 25 5 7 25

In the above example table, the subset of individuals belonging to group 1 are to be split across 4 groups (labelled 1 - 4), each containing 25% of the volume.  Individuals belonging to group 5 are to be split into three groups (5 - 7), containing 50%, 25% and 25% of the volume, respectively.

I am trying to write some code that will take an input dataset (1 row per individual) that contains an initial GroupID variable, and maps it to a new GroupID variable, based on what is specified in the 'control' table.  Each split is to be based on stratified random sampling (variable Score_Band).

Any help would be greatly appreciated,

Hoa

Accepted Solutions
Solution
‎03-07-2017 09:14 AM
Super User
Posts: 10,850

## Re: Stratified Sampling Macro

OR could be done more simpler.

``````data control;
infile cards expandtabs truncover;
input StartGroup	EndGroup	Percent;
cards;
1	1	25
1	2	25
1	3	25
1	4	25
5	5	50
5	6	25
5	7	25
;
run;

data control;
set control;
by startgroup;
if first.startgroup then cum=0;
cum+percent;
run;
data key;
set control;
by startgroup;
lag=lag(cum);
if first.startgroup then start=1;
else start=lag+1;

do i=start to cum;
output;
end;
drop lag start cum percent;
run;

/*Input dataset*/
data have;
do group=1,5;
do x=1 to 200;
output;
end;
end;
run;

proc surveyselect data=have out=temp groups=100;
strata group;
run;
proc sort data=temp;
by group groupid;
run;

data want;
merge temp key(rename=(startgroup=group i=groupid));
by group groupid;
run;

proc print;run;``````

All Replies
Super User
Posts: 10,850

## Re: Stratified Sampling Macro

Is it what you looking for.

``````data have;
infile cards expandtabs truncover;
input StartGroup	EndGroup	Percent;
cards;
1	1	.25
1	2	.25
1	3	.25
1	4	.25
5	5	.50
5	6	.25
5	7	.25
;
run;

proc surveyselect data=have out=want sampsize=100 outhits method=pps_wr;
strata startgroup;
size percent;
run;

proc print ;run;
``````
Super User
Posts: 8,220

## Re: Stratified Sampling Macro

I agree with the approach that @Ksharp suggested but, based on your requirements, think you might want to use a different method. I'd suggest:

```proc surveyselect data=have out=want sampsize=100 outhits method=pps_sys;
strata startgroup;
size percent;
run;
```

Art, CEO, AnalystFinder.com

Occasional Contributor
Posts: 16

## Re: Stratified Sampling Macro

Thank you both for your quick and helpful replies - I really appreciate it!

Apologies if I'm misunderstanding, but I'm wondering how I can apply the sampling logic you've provided to my input dataset.  In addition to the 'control' table below...

 Group EndGroup Percent 1 1 0.25 1 2 0.25 1 3 0.25 1 4 0.25 5 5 0.5 5 6 0.25 5 7 0.25

I also have an input dataset (please see the attached file).  The input dataset contains one row per individual.  Based on the control table above, I would like to split the existing groups into sub-groups, using the stratified sampling procedure.

There are 42 individuals belonging to group 1 in the input table.  I would like these to be split into 4 groups:

1 - containing roughly 25% of the original group 1

2 - containing roughly 25% of the original group 1

3 - containing roughly 25% of the original group 1

4 - containing roughly 25% of the original group 1

Similarly, there are 234 individuals belonging to group 5 in the input table.  I would like these split into 3 groups:

5 - containing roughly 50% of the original group 5

6 - containing roughly 25% of the original group 5

7 - containing roughly 25% of the original group 5

On the input dataset, I have a variable called Score_Band (with possible values 'A', 'B' and 'C').  When I'm splitting the dataset, I like the sampling to be stratified using Score_Band.

I'm aiming to have an output dataset that is the same as the input dataset, but with a new group variable added - based on the new splits.

Thanks again for your help - I really appreciate it!

Super User
Posts: 10,850

## Re: Stratified Sampling Macro

``````data control;
infile cards expandtabs truncover;
input StartGroup	EndGroup	Percent;
cards;
1	1	25
1	2	25
1	3	25
1	4	25
5	5	50
5	6	25
5	7	25
;
run;

data control;
set control;
by startgroup;
if first.startgroup then cum=0;
cum+percent;
run;
data key;
set control;
by startgroup;
lag=lag(cum);
if first.startgroup then start=1;
else start=lag+1;

do i=start to cum;
output;
end;
drop lag start cum percent;
run;

/*Input dataset*/
data have;
do group=1,5;
do x=1 to 200;
output;
end;
end;
run;
data have;
set have;
call streaminit(123456789);
random=rand('uniform');
run;

proc rank data=have out=temp groups=100 ;
by group;
var random;
ranks r;
run;
proc sort data=temp ;
by group r;
run;
data temp;
set temp;
i=r+1;
drop random r;
run;

data want;
merge temp key(rename=(startgroup=group));
by group i;
run;

proc print;run;

``````
Solution
‎03-07-2017 09:14 AM
Super User
Posts: 10,850

## Re: Stratified Sampling Macro

OR could be done more simpler.

``````data control;
infile cards expandtabs truncover;
input StartGroup	EndGroup	Percent;
cards;
1	1	25
1	2	25
1	3	25
1	4	25
5	5	50
5	6	25
5	7	25
;
run;

data control;
set control;
by startgroup;
if first.startgroup then cum=0;
cum+percent;
run;
data key;
set control;
by startgroup;
lag=lag(cum);
if first.startgroup then start=1;
else start=lag+1;

do i=start to cum;
output;
end;
drop lag start cum percent;
run;

/*Input dataset*/
data have;
do group=1,5;
do x=1 to 200;
output;
end;
end;
run;

proc surveyselect data=have out=temp groups=100;
strata group;
run;
proc sort data=temp;
by group groupid;
run;

data want;
merge temp key(rename=(startgroup=group i=groupid));
by group groupid;
run;

proc print;run;``````
Occasional Contributor
Posts: 16

## Re: Stratified Sampling Macro

Thank you very much for the quick response!  I'll give it a try and will report back!

Occasional Contributor
Posts: 16

## Re: Stratified Sampling Macro

Thank you very much - this is just what I needed!
☑ This topic is solved.