BookmarkSubscribeRSS Feed
cathy_sas
Calcite | Level 5

 

Hi all,

 

i want to create grp and subgrp variables

 

 

 

have:

id date day1 day2 day3
100 1/15/2015 1    
100 1/16/2015 2    
100 1/17/2015 3    
100 1/18/2015 4    
100 1/19/2015 5    
100 1/20/2015   6  
100 1/21/2015   7  
100 1/22/2015   8  
100 1/23/2015   9  
100 1/24/2015   10  
100 1/25/2015   11  
100 1/26/2015     12
100 1/27/2015     13
100 1/28/2015     14

 

want:

 

id date day1 day2 day3 grp subgrp
100 1/15/2015 1     1 1
100 1/16/2015 2     1 2
100 1/17/2015 3     1 3
100 1/18/2015 4     1 4
100 1/19/2015 5     1 5
100 1/20/2015   6   2 1
100 1/21/2015   7   2 2
100 1/22/2015   8   2 3
100 1/23/2015   9   2 4
100 1/24/2015   10   2 5
100 1/25/2015   11   2 6
100 1/26/2015     12 3 1
100 1/27/2015     13 3 2
100 1/28/2015     14 3 3

 

proc srot data=have;

  by id date;

run;

data want;

  set have;

 by id;

retain grp subgrp;

if first.id then do;

  grp=0;subgrp=1;

end;

else do;

 grp+1;

subgrp+1;

end;

run;

 

 

Thanks

Catthy

4 REPLIES 4
FreelanceReinh
Jade | Level 19

Hi @cathy_sas,

 

It's not 100% clear from your example and code what's the general logic behind GRP and SUBGRP.

 

Here's a suggestion that creates the WANT dataset with your sample data.

 

proc sort data=have;
  by id date;
run;

data want;
set have;
by id date;
if lag(4*n(day1)+2*n(day2)+n(day3)) ne 4*n(day1)+2*n(day2)+n(day3) then do;
  grp+1;
  subgrp=1;
end;
else subgrp+1;
if first.id then do;
  grp=1;
  subgrp=1;
end;
run;

 

 

Edit: Added the PROC SORT step.

 

Explanations:

The expression 4*n(day1)+2*n(day2)+n(day3) captures the missing-value pattern of (day1, day2, day3): The n() functions return either 0 or 1 depending on whether the (numeric) argument is missing or not. The combination of the three 0/1 results, e.g. "1 0 0" if day1 is non-missing while day2 and day3 are missing, can be interpreted as a number in the binary system. The decimal value of this number, 4 in the example (a value from {0, 1, ..., 7} in general), is calculated by the above expression. Alternatively, one could create a character expression by concatenating the "binary digits":

if lag(cats(n(day1),n(day2),n(day3))) ne cats(n(day1),n(day2),n(day3)) then do;

 

The general rule I've implemented is: Whenever the missing-value pattern of (day1, day2, day3) changes from the previous to the current observation, GRP is incremented by 1 and SUBGRP is reset to 1. Otherwise SUBGRP is incremented by 1 and GRP is left unchanged. (Please note the implicit RETAIN for both GRP and SUBGRP due to the sum statements grp+1 and subgrp+1, respectively.)

 

The exception to the general rule is: For the first observation of an ID BY-group GRP and SUBGRP are set to 1. In particular, this happens when the very first observation of dataset HAVE is read.

 

Please let me know if this is compatible with your intention of what GRP and SUBGRP should be.

data_null__
Jade | Level 19

A view, 2 functions, a by statement and a sum statement.

 

data date;
   infile cards firstobs=2 dsd;
   input id:$3. date:mmddyy. day1-day3;
   format date mmddyyd.;
   cards;
id,date,day1,day2,day3
100,1/15/2015,1, , 
100,1/16/2015,2, , 
100,1/17/2015,3, , 
100,1/18/2015,4, , 
100,1/19/2015,5, , 
100,1/20/2015, ,6, 
100,1/21/2015, ,7, 
100,1/22/2015, ,8, 
100,1/23/2015, ,9, 
100,1/24/2015, ,10, 
100,1/25/2015, ,11, 
100,1/26/2015, , ,12
100,1/27/2015, , ,13
100,1/28/2015, , ,14
;;;;
   run;
proc print;
   run;
data groupedv / view=groupedv;
   set date;
   group = whichn(coalesce(of day:),of day:);
   run;
data group;
   set groupedv;
   by group;
   if first.group then subgroup=0;
   subgroup + 1;
   run;
proc print;
   run;

Capture.PNG

FreelanceReinh
Jade | Level 19

@data_null__: Very nice. I think I should use views more often.

Minor suggestion: by id group; or by group notsorted; in the second data step (for the general case with more than one value of ID in the data).

Aman4SAS
Obsidian | Level 7

data test;

set date;

array y[3] day1-day3;

if _n_=1 then do;

                        old_n=1;

                        group=0;

                        subgroup=0;

                     end;

    do i = 1 to 3;

               if y[i] ne . then

                        do;

                              group=i;

                              if group=old_n then subgroup=subgroup+1;

                              else do;

                                    old_n=i;

                                    subgroup=1;

                                    end;

                            output;

                     end;

    end;

   retain old_n subgroup;

   drop old_n i;

  run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 979 views
  • 1 like
  • 4 in conversation