turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- Base SAS Programming
- /
- Stratified Sampling Macro

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-06-2017 01:17 PM

Hi,

Please can you help me with the following programming task. I have an individual-level (1 row per person) dataset that contains a Group ID variable. Depending on the value of the Group ID, I would like to split the group using stratified random sampling (stratification variable: Score_Band). The groups that require splitting and the manner of the split are provided in a 'control' dataset, as follows:

StartGroup | EndGroup | Percent |

1 | 1 | 25 |

1 | 2 | 25 |

1 | 3 | 25 |

1 | 4 | 25 |

5 | 5 | 50 |

5 | 6 | 25 |

5 | 7 | 25 |

In the above example table, the subset of individuals belonging to group 1 are to be split across 4 groups (labelled 1 - 4), each containing 25% of the volume. Individuals belonging to group 5 are to be split into three groups (5 - 7), containing 50%, 25% and 25% of the volume, respectively.

I am trying to write some code that will take an input dataset (1 row per individual) that contains an initial GroupID variable, and maps it to a new GroupID variable, based on what is specified in the 'control' table. Each split is to be based on stratified random sampling (variable Score_Band).

Any help would be greatly appreciated,

Hoa

Accepted Solutions

Solution

03-07-2017
09:14 AM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-07-2017 06:18 AM

OR could be done more simpler.

```
data control;
infile cards expandtabs truncover;
input StartGroup EndGroup Percent;
cards;
1 1 25
1 2 25
1 3 25
1 4 25
5 5 50
5 6 25
5 7 25
;
run;
data control;
set control;
by startgroup;
if first.startgroup then cum=0;
cum+percent;
run;
data key;
set control;
by startgroup;
lag=lag(cum);
if first.startgroup then start=1;
else start=lag+1;
do i=start to cum;
output;
end;
drop lag start cum percent;
run;
/*Input dataset*/
data have;
do group=1,5;
do x=1 to 200;
output;
end;
end;
run;
proc surveyselect data=have out=temp groups=100;
strata group;
run;
proc sort data=temp;
by group groupid;
run;
data want;
merge temp key(rename=(startgroup=group i=groupid));
by group groupid;
run;
proc print;run;
```

All Replies

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-06-2017 10:52 PM

Is it what you looking for.

```
data have;
infile cards expandtabs truncover;
input StartGroup EndGroup Percent;
cards;
1 1 .25
1 2 .25
1 3 .25
1 4 .25
5 5 .50
5 6 .25
5 7 .25
;
run;
proc surveyselect data=have out=want sampsize=100 outhits method=pps_wr;
strata startgroup;
size percent;
run;
proc print ;run;
```

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-06-2017 11:15 PM

I agree with the approach that @Ksharp suggested but, based on your requirements, think you might want to use a different method. I'd suggest:

proc surveyselect data=have out=want sampsize=100 outhits method=pps_sys; strata startgroup; size percent; run;

Art, CEO, AnalystFinder.com

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-07-2017 04:15 AM

Thank you both for your quick and helpful replies - I really appreciate it!

Apologies if I'm misunderstanding, but I'm wondering how I can apply the sampling logic you've provided to my input dataset. In addition to the 'control' table below...

Group | EndGroup | Percent |

1 | 1 | 0.25 |

1 | 2 | 0.25 |

1 | 3 | 0.25 |

1 | 4 | 0.25 |

5 | 5 | 0.5 |

5 | 6 | 0.25 |

5 | 7 | 0.25 |

I also have an input dataset (please see the attached file). The input dataset contains one row per individual. Based on the control table above, I would like to split the existing groups into sub-groups, using the stratified sampling procedure.

There are 42 individuals belonging to group 1 in the input table. I would like these to be split into 4 groups:

1 - containing roughly 25% of the original group 1

2 - containing roughly 25% of the original group 1

3 - containing roughly 25% of the original group 1

4 - containing roughly 25% of the original group 1

Similarly, there are 234 individuals belonging to group 5 in the input table. I would like these split into 3 groups:

5 - containing roughly 50% of the original group 5

6 - containing roughly 25% of the original group 5

7 - containing roughly 25% of the original group 5

On the input dataset, I have a variable called Score_Band (with possible values 'A', 'B' and 'C'). When I'm splitting the dataset, I like the sampling to be stratified using Score_Band.

I'm aiming to have an output dataset that is the same as the input dataset, but with a new group variable added - based on the new splits.

Thanks again for your help - I really appreciate it!

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-07-2017 05:50 AM

OK. How about this one ?

```
data control;
infile cards expandtabs truncover;
input StartGroup EndGroup Percent;
cards;
1 1 25
1 2 25
1 3 25
1 4 25
5 5 50
5 6 25
5 7 25
;
run;
data control;
set control;
by startgroup;
if first.startgroup then cum=0;
cum+percent;
run;
data key;
set control;
by startgroup;
lag=lag(cum);
if first.startgroup then start=1;
else start=lag+1;
do i=start to cum;
output;
end;
drop lag start cum percent;
run;
/*Input dataset*/
data have;
do group=1,5;
do x=1 to 200;
output;
end;
end;
run;
data have;
set have;
call streaminit(123456789);
random=rand('uniform');
run;
proc rank data=have out=temp groups=100 ;
by group;
var random;
ranks r;
run;
proc sort data=temp ;
by group r;
run;
data temp;
set temp;
i=r+1;
drop random r;
run;
data want;
merge temp key(rename=(startgroup=group));
by group i;
run;
proc print;run;
```

Solution

03-07-2017
09:14 AM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-07-2017 06:18 AM

OR could be done more simpler.

```
data control;
infile cards expandtabs truncover;
input StartGroup EndGroup Percent;
cards;
1 1 25
1 2 25
1 3 25
1 4 25
5 5 50
5 6 25
5 7 25
;
run;
data control;
set control;
by startgroup;
if first.startgroup then cum=0;
cum+percent;
run;
data key;
set control;
by startgroup;
lag=lag(cum);
if first.startgroup then start=1;
else start=lag+1;
do i=start to cum;
output;
end;
drop lag start cum percent;
run;
/*Input dataset*/
data have;
do group=1,5;
do x=1 to 200;
output;
end;
end;
run;
proc surveyselect data=have out=temp groups=100;
strata group;
run;
proc sort data=temp;
by group groupid;
run;
data want;
merge temp key(rename=(startgroup=group i=groupid));
by group groupid;
run;
proc print;run;
```

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-07-2017 06:31 AM

Thank you very much for the quick response! I'll give it a try and will report back!

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-07-2017 09:15 AM

Thank you very much - this is just what I needed!