DATA Step, Macro, Functions and more

create groups

Reply
Contributor
Posts: 21

create groups

 

Hi all,

 

i want to create grp and subgrp variables

 

 

 

have:

id date day1 day2 day3
100 1/15/2015 1    
100 1/16/2015 2    
100 1/17/2015 3    
100 1/18/2015 4    
100 1/19/2015 5    
100 1/20/2015   6  
100 1/21/2015   7  
100 1/22/2015   8  
100 1/23/2015   9  
100 1/24/2015   10  
100 1/25/2015   11  
100 1/26/2015     12
100 1/27/2015     13
100 1/28/2015     14

 

want:

 

id date day1 day2 day3 grp subgrp
100 1/15/2015 1     1 1
100 1/16/2015 2     1 2
100 1/17/2015 3     1 3
100 1/18/2015 4     1 4
100 1/19/2015 5     1 5
100 1/20/2015   6   2 1
100 1/21/2015   7   2 2
100 1/22/2015   8   2 3
100 1/23/2015   9   2 4
100 1/24/2015   10   2 5
100 1/25/2015   11   2 6
100 1/26/2015     12 3 1
100 1/27/2015     13 3 2
100 1/28/2015     14 3 3

 

proc srot data=have;

  by id date;

run;

data want;

  set have;

 by id;

retain grp subgrp;

if first.id then do;

  grp=0;subgrp=1;

end;

else do;

 grp+1;

subgrp+1;

end;

run;

 

 

Thanks

Catthy

Trusted Advisor
Posts: 1,117

Re: create groups

[ Edited ]
Posted in reply to cathy_sas

Hi @cathy_sas,

 

It's not 100% clear from your example and code what's the general logic behind GRP and SUBGRP.

 

Here's a suggestion that creates the WANT dataset with your sample data.

 

proc sort data=have;
  by id date;
run;

data want;
set have;
by id date;
if lag(4*n(day1)+2*n(day2)+n(day3)) ne 4*n(day1)+2*n(day2)+n(day3) then do;
  grp+1;
  subgrp=1;
end;
else subgrp+1;
if first.id then do;
  grp=1;
  subgrp=1;
end;
run;

 

 

Edit: Added the PROC SORT step.

 

Explanations:

The expression 4*n(day1)+2*n(day2)+n(day3) captures the missing-value pattern of (day1, day2, day3): The n() functions return either 0 or 1 depending on whether the (numeric) argument is missing or not. The combination of the three 0/1 results, e.g. "1 0 0" if day1 is non-missing while day2 and day3 are missing, can be interpreted as a number in the binary system. The decimal value of this number, 4 in the example (a value from {0, 1, ..., 7} in general), is calculated by the above expression. Alternatively, one could create a character expression by concatenating the "binary digits":

if lag(cats(n(day1),n(day2),n(day3))) ne cats(n(day1),n(day2),n(day3)) then do;

 

The general rule I've implemented is: Whenever the missing-value pattern of (day1, day2, day3) changes from the previous to the current observation, GRP is incremented by 1 and SUBGRP is reset to 1. Otherwise SUBGRP is incremented by 1 and GRP is left unchanged. (Please note the implicit RETAIN for both GRP and SUBGRP due to the sum statements grp+1 and subgrp+1, respectively.)

 

The exception to the general rule is: For the first observation of an ID BY-group GRP and SUBGRP are set to 1. In particular, this happens when the very first observation of dataset HAVE is read.

 

Please let me know if this is compatible with your intention of what GRP and SUBGRP should be.

Respected Advisor
Posts: 3,799

Re: create groups

[ Edited ]
Posted in reply to cathy_sas

A view, 2 functions, a by statement and a sum statement.

 

data date;
   infile cards firstobs=2 dsd;
   input id:$3. date:mmddyy. day1-day3;
   format date mmddyyd.;
   cards;
id,date,day1,day2,day3
100,1/15/2015,1, , 
100,1/16/2015,2, , 
100,1/17/2015,3, , 
100,1/18/2015,4, , 
100,1/19/2015,5, , 
100,1/20/2015, ,6, 
100,1/21/2015, ,7, 
100,1/22/2015, ,8, 
100,1/23/2015, ,9, 
100,1/24/2015, ,10, 
100,1/25/2015, ,11, 
100,1/26/2015, , ,12
100,1/27/2015, , ,13
100,1/28/2015, , ,14
;;;;
   run;
proc print;
   run;
data groupedv / view=groupedv;
   set date;
   group = whichn(coalesce(of day:),of day:);
   run;
data group;
   set groupedv;
   by group;
   if first.group then subgroup=0;
   subgroup + 1;
   run;
proc print;
   run;

Capture.PNG

Trusted Advisor
Posts: 1,117

Re: create groups

Posted in reply to data_null__

@data_null__: Very nice. I think I should use views more often.

Minor suggestion: by id group; or by group notsorted; in the second data step (for the general case with more than one value of ID in the data).

Super Contributor
Posts: 266

Re: create groups

Posted in reply to cathy_sas

data test;

set date;

array y[3] day1-day3;

if _n_=1 then do;

                        old_n=1;

                        group=0;

                        subgroup=0;

                     end;

    do i = 1 to 3;

               if y[i] ne . then

                        do;

                              group=i;

                              if group=old_n then subgroup=subgroup+1;

                              else do;

                                    old_n=i;

                                    subgroup=1;

                                    end;

                            output;

                     end;

    end;

   retain old_n subgroup;

   drop old_n i;

  run;

Ask a Question
Discussion stats
  • 4 replies
  • 255 views
  • 1 like
  • 4 in conversation