Contributor
Posts: 21

# create groups

Hi all,

i want to create grp and subgrp variables

have:

 id date day1 day2 day3 100 1/15/2015 1 100 1/16/2015 2 100 1/17/2015 3 100 1/18/2015 4 100 1/19/2015 5 100 1/20/2015 6 100 1/21/2015 7 100 1/22/2015 8 100 1/23/2015 9 100 1/24/2015 10 100 1/25/2015 11 100 1/26/2015 12 100 1/27/2015 13 100 1/28/2015 14

want:

 id date day1 day2 day3 grp subgrp 100 1/15/2015 1 1 1 100 1/16/2015 2 1 2 100 1/17/2015 3 1 3 100 1/18/2015 4 1 4 100 1/19/2015 5 1 5 100 1/20/2015 6 2 1 100 1/21/2015 7 2 2 100 1/22/2015 8 2 3 100 1/23/2015 9 2 4 100 1/24/2015 10 2 5 100 1/25/2015 11 2 6 100 1/26/2015 12 3 1 100 1/27/2015 13 3 2 100 1/28/2015 14 3 3

proc srot data=have;

by id date;

run;

data want;

set have;

by id;

retain grp subgrp;

if first.id then do;

grp=0;subgrp=1;

end;

else do;

grp+1;

subgrp+1;

end;

run;

Thanks

Catthy

Posts: 1,245

## Re: create groups

[ Edited ]

Hi @cathy_sas,

It's not 100% clear from your example and code what's the general logic behind GRP and SUBGRP.

Here's a suggestion that creates the WANT dataset with your sample data.

proc sort data=have;
by id date;
run;

data want;
set have;
by id date;
if lag(4*n(day1)+2*n(day2)+n(day3)) ne 4*n(day1)+2*n(day2)+n(day3) then do;
grp+1;
subgrp=1;
end;
else subgrp+1;
if first.id then do;
grp=1;
subgrp=1;
end;
run;

Edit: Added the PROC SORT step.

Explanations:

The expression 4*n(day1)+2*n(day2)+n(day3) captures the missing-value pattern of (day1, day2, day3): The n() functions return either 0 or 1 depending on whether the (numeric) argument is missing or not. The combination of the three 0/1 results, e.g. "1 0 0" if day1 is non-missing while day2 and day3 are missing, can be interpreted as a number in the binary system. The decimal value of this number, 4 in the example (a value from {0, 1, ..., 7} in general), is calculated by the above expression. Alternatively, one could create a character expression by concatenating the "binary digits":

if lag(cats(n(day1),n(day2),n(day3))) ne cats(n(day1),n(day2),n(day3)) then do;

The general rule I've implemented is: Whenever the missing-value pattern of (day1, day2, day3) changes from the previous to the current observation, GRP is incremented by 1 and SUBGRP is reset to 1. Otherwise SUBGRP is incremented by 1 and GRP is left unchanged. (Please note the implicit RETAIN for both GRP and SUBGRP due to the sum statements grp+1 and subgrp+1, respectively.)

The exception to the general rule is: For the first observation of an ID BY-group GRP and SUBGRP are set to 1. In particular, this happens when the very first observation of dataset HAVE is read.

Please let me know if this is compatible with your intention of what GRP and SUBGRP should be.

Posts: 3,852

## Re: create groups

[ Edited ]

A view, 2 functions, a by statement and a sum statement.

data date;
infile cards firstobs=2 dsd;
input id:\$3. date:mmddyy. day1-day3;
format date mmddyyd.;
cards;
id,date,day1,day2,day3
100,1/15/2015,1, ,
100,1/16/2015,2, ,
100,1/17/2015,3, ,
100,1/18/2015,4, ,
100,1/19/2015,5, ,
100,1/20/2015, ,6,
100,1/21/2015, ,7,
100,1/22/2015, ,8,
100,1/23/2015, ,9,
100,1/24/2015, ,10,
100,1/25/2015, ,11,
100,1/26/2015, , ,12
100,1/27/2015, , ,13
100,1/28/2015, , ,14
;;;;
run;
proc print;
run;
data groupedv / view=groupedv;
set date;
group = whichn(coalesce(of day:),of day:);
run;
data group;
set groupedv;
by group;
if first.group then subgroup=0;
subgroup + 1;
run;
proc print;
run;

Posts: 1,245

## Re: create groups

@data_null__: Very nice. I think I should use views more often.

Minor suggestion: by id group; or by group notsorted; in the second data step (for the general case with more than one value of ID in the data).

Super Contributor
Posts: 271

## Re: create groups

data test;

set date;

array y[3] day1-day3;

if _n_=1 then do;

old_n=1;

group=0;

subgroup=0;

end;

do i = 1 to 3;

if y[i] ne . then

do;

group=i;

if group=old_n then subgroup=subgroup+1;

else do;

old_n=i;

subgroup=1;

end;

output;

end;

end;

retain old_n subgroup;

drop old_n i;

run;

Discussion stats
• 4 replies
• 282 views
• 1 like
• 4 in conversation