BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
HoaTruong
Obsidian | Level 7

Hi,

 

Please can you help me with the following programming task.  I have an individual-level (1 row per person) dataset that contains a Group ID variable.  Depending on the value of the Group ID, I would like to split the group using stratified random sampling (stratification variable: Score_Band).  The groups that require splitting and the manner of the split are provided in a 'control' dataset, as follows:

 

StartGroupEndGroupPercent
1125
1225
1325
1425
5550
5625
5725

 

In the above example table, the subset of individuals belonging to group 1 are to be split across 4 groups (labelled 1 - 4), each containing 25% of the volume.  Individuals belonging to group 5 are to be split into three groups (5 - 7), containing 50%, 25% and 25% of the volume, respectively.

 

I am trying to write some code that will take an input dataset (1 row per individual) that contains an initial GroupID variable, and maps it to a new GroupID variable, based on what is specified in the 'control' table.  Each split is to be based on stratified random sampling (variable Score_Band).

 

Any help would be greatly appreciated,

 

Hoa

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

OR could be done more simpler.

 

 

data control;
infile cards expandtabs truncover;
input StartGroup	EndGroup	Percent;
cards;
1	1	25
1	2	25
1	3	25
1	4	25
5	5	50
5	6	25
5	7	25
;
run;

data control;
 set control;
 by startgroup;
 if first.startgroup then cum=0;
 cum+percent;
run;
data key;
 set control;
  by startgroup;
  lag=lag(cum);
  if first.startgroup then start=1;
   else start=lag+1;
   
  do i=start to cum;
   output;
  end;
  drop lag start cum percent;
run;



/*Input dataset*/
data have;
 do group=1,5;
  do x=1 to 200;
   output;
  end;
 end;
run;

proc surveyselect data=have out=temp groups=100;
strata group;
run;
proc sort data=temp;
by group groupid;
run;

data want;
 merge temp key(rename=(startgroup=group i=groupid));
 by group groupid;
run;

proc print;run;

View solution in original post

7 REPLIES 7
Ksharp
Super User

Is it what you looking for.

 

data have;
infile cards expandtabs truncover;
input StartGroup	EndGroup	Percent;
cards;
1	1	.25
1	2	.25
1	3	.25
1	4	.25
5	5	.50
5	6	.25
5	7	.25
;
run;

proc surveyselect data=have out=want sampsize=100 outhits method=pps_wr;
strata startgroup;
size percent;
run;

proc print ;run;
art297
Opal | Level 21

I agree with the approach that @Ksharp suggested but, based on your requirements, think you might want to use a different method. I'd suggest:

 

proc surveyselect data=have out=want sampsize=100 outhits method=pps_sys;
  strata startgroup;
  size percent;
run;

Art, CEO, AnalystFinder.com

 

HoaTruong
Obsidian | Level 7

Thank you both for your quick and helpful replies - I really appreciate it!

 

Apologies if I'm misunderstanding, but I'm wondering how I can apply the sampling logic you've provided to my input dataset.  In addition to the 'control' table below...

 

GroupEndGroupPercent
110.25
120.25
130.25
140.25
550.5
560.25
570.25

 

I also have an input dataset (please see the attached file).  The input dataset contains one row per individual.  Based on the control table above, I would like to split the existing groups into sub-groups, using the stratified sampling procedure.

 

There are 42 individuals belonging to group 1 in the input table.  I would like these to be split into 4 groups:

 

1 - containing roughly 25% of the original group 1

2 - containing roughly 25% of the original group 1

3 - containing roughly 25% of the original group 1

4 - containing roughly 25% of the original group 1

 

Similarly, there are 234 individuals belonging to group 5 in the input table.  I would like these split into 3 groups:

 

5 - containing roughly 50% of the original group 5

6 - containing roughly 25% of the original group 5

7 - containing roughly 25% of the original group 5

 

On the input dataset, I have a variable called Score_Band (with possible values 'A', 'B' and 'C').  When I'm splitting the dataset, I like the sampling to be stratified using Score_Band.

 

I'm aiming to have an output dataset that is the same as the input dataset, but with a new group variable added - based on the new splits.

 

Thanks again for your help - I really appreciate it!

Ksharp
Super User

OK. How about this one ?

 

 

data control;
infile cards expandtabs truncover;
input StartGroup	EndGroup	Percent;
cards;
1	1	25
1	2	25
1	3	25
1	4	25
5	5	50
5	6	25
5	7	25
;
run;

data control;
 set control;
 by startgroup;
 if first.startgroup then cum=0;
 cum+percent;
run;
data key;
 set control;
  by startgroup;
  lag=lag(cum);
  if first.startgroup then start=1;
   else start=lag+1;
   
  do i=start to cum;
   output;
  end;
  drop lag start cum percent;
run;



/*Input dataset*/
data have;
 do group=1,5;
  do x=1 to 200;
   output;
  end;
 end;
run;
data have;
 set have;
 call streaminit(123456789);
 random=rand('uniform');
run;


proc rank data=have out=temp groups=100 ;
by group;
var random;
ranks r;
run;
proc sort data=temp ;
by group r;
run;
data temp;
 set temp;
 i=r+1;
 drop random r;
run;

data want;
 merge temp key(rename=(startgroup=group));
 by group i;
run;

proc print;run;

Ksharp
Super User

OR could be done more simpler.

 

 

data control;
infile cards expandtabs truncover;
input StartGroup	EndGroup	Percent;
cards;
1	1	25
1	2	25
1	3	25
1	4	25
5	5	50
5	6	25
5	7	25
;
run;

data control;
 set control;
 by startgroup;
 if first.startgroup then cum=0;
 cum+percent;
run;
data key;
 set control;
  by startgroup;
  lag=lag(cum);
  if first.startgroup then start=1;
   else start=lag+1;
   
  do i=start to cum;
   output;
  end;
  drop lag start cum percent;
run;



/*Input dataset*/
data have;
 do group=1,5;
  do x=1 to 200;
   output;
  end;
 end;
run;

proc surveyselect data=have out=temp groups=100;
strata group;
run;
proc sort data=temp;
by group groupid;
run;

data want;
 merge temp key(rename=(startgroup=group i=groupid));
 by group groupid;
run;

proc print;run;
HoaTruong
Obsidian | Level 7

Thank you very much for the quick response!  I'll give it a try and will report back! Smiley Happy

HoaTruong
Obsidian | Level 7
Thank you very much - this is just what I needed!

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 3631 views
  • 3 likes
  • 3 in conversation