Hi all,
i want to create grp and subgrp variables
have:
id | date | day1 | day2 | day3 |
100 | 1/15/2015 | 1 | ||
100 | 1/16/2015 | 2 | ||
100 | 1/17/2015 | 3 | ||
100 | 1/18/2015 | 4 | ||
100 | 1/19/2015 | 5 | ||
100 | 1/20/2015 | 6 | ||
100 | 1/21/2015 | 7 | ||
100 | 1/22/2015 | 8 | ||
100 | 1/23/2015 | 9 | ||
100 | 1/24/2015 | 10 | ||
100 | 1/25/2015 | 11 | ||
100 | 1/26/2015 | 12 | ||
100 | 1/27/2015 | 13 | ||
100 | 1/28/2015 | 14 |
want:
id | date | day1 | day2 | day3 | grp | subgrp |
100 | 1/15/2015 | 1 | 1 | 1 | ||
100 | 1/16/2015 | 2 | 1 | 2 | ||
100 | 1/17/2015 | 3 | 1 | 3 | ||
100 | 1/18/2015 | 4 | 1 | 4 | ||
100 | 1/19/2015 | 5 | 1 | 5 | ||
100 | 1/20/2015 | 6 | 2 | 1 | ||
100 | 1/21/2015 | 7 | 2 | 2 | ||
100 | 1/22/2015 | 8 | 2 | 3 | ||
100 | 1/23/2015 | 9 | 2 | 4 | ||
100 | 1/24/2015 | 10 | 2 | 5 | ||
100 | 1/25/2015 | 11 | 2 | 6 | ||
100 | 1/26/2015 | 12 | 3 | 1 | ||
100 | 1/27/2015 | 13 | 3 | 2 | ||
100 | 1/28/2015 | 14 | 3 | 3 |
proc srot data=have;
by id date;
run;
data want;
set have;
by id;
retain grp subgrp;
if first.id then do;
grp=0;subgrp=1;
end;
else do;
grp+1;
subgrp+1;
end;
run;
Thanks
Catthy
Hi @cathy_sas,
It's not 100% clear from your example and code what's the general logic behind GRP and SUBGRP.
Here's a suggestion that creates the WANT dataset with your sample data.
proc sort data=have;
by id date;
run;
data want;
set have;
by id date;
if lag(4*n(day1)+2*n(day2)+n(day3)) ne 4*n(day1)+2*n(day2)+n(day3) then do;
grp+1;
subgrp=1;
end;
else subgrp+1;
if first.id then do;
grp=1;
subgrp=1;
end;
run;
Edit: Added the PROC SORT step.
Explanations:
The expression 4*n(day1)+2*n(day2)+n(day3) captures the missing-value pattern of (day1, day2, day3): The n() functions return either 0 or 1 depending on whether the (numeric) argument is missing or not. The combination of the three 0/1 results, e.g. "1 0 0" if day1 is non-missing while day2 and day3 are missing, can be interpreted as a number in the binary system. The decimal value of this number, 4 in the example (a value from {0, 1, ..., 7} in general), is calculated by the above expression. Alternatively, one could create a character expression by concatenating the "binary digits":
if lag(cats(n(day1),n(day2),n(day3))) ne cats(n(day1),n(day2),n(day3)) then do;
The general rule I've implemented is: Whenever the missing-value pattern of (day1, day2, day3) changes from the previous to the current observation, GRP is incremented by 1 and SUBGRP is reset to 1. Otherwise SUBGRP is incremented by 1 and GRP is left unchanged. (Please note the implicit RETAIN for both GRP and SUBGRP due to the sum statements grp+1 and subgrp+1, respectively.)
The exception to the general rule is: For the first observation of an ID BY-group GRP and SUBGRP are set to 1. In particular, this happens when the very first observation of dataset HAVE is read.
Please let me know if this is compatible with your intention of what GRP and SUBGRP should be.
A view, 2 functions, a by statement and a sum statement.
data date;
infile cards firstobs=2 dsd;
input id:$3. date:mmddyy. day1-day3;
format date mmddyyd.;
cards;
id,date,day1,day2,day3
100,1/15/2015,1, ,
100,1/16/2015,2, ,
100,1/17/2015,3, ,
100,1/18/2015,4, ,
100,1/19/2015,5, ,
100,1/20/2015, ,6,
100,1/21/2015, ,7,
100,1/22/2015, ,8,
100,1/23/2015, ,9,
100,1/24/2015, ,10,
100,1/25/2015, ,11,
100,1/26/2015, , ,12
100,1/27/2015, , ,13
100,1/28/2015, , ,14
;;;;
run;
proc print;
run;
data groupedv / view=groupedv;
set date;
group = whichn(coalesce(of day:),of day:);
run;
data group;
set groupedv;
by group;
if first.group then subgroup=0;
subgroup + 1;
run;
proc print;
run;
@data_null__: Very nice. I think I should use views more often.
Minor suggestion: by id group; or by group notsorted; in the second data step (for the general case with more than one value of ID in the data).
data test;
set date;
array y[3] day1-day3;
if _n_=1 then do;
old_n=1;
group=0;
subgroup=0;
end;
do i = 1 to 3;
if y[i] ne . then
do;
group=i;
if group=old_n then subgroup=subgroup+1;
else do;
old_n=i;
subgroup=1;
end;
output;
end;
end;
retain old_n subgroup;
drop old_n i;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.