Solved: Re: Retain by two variables

awardell · Posted 08-18-2022 05:13 PM

Hello,
I have a data set that looks like the following:

ID	DATE	VAR1	VAR2	VAR3
1	8-22-2022	a
1	8-22-2022		b
1	7-23-2021			c
2	11-12-2019	a
2	10-14-2019			c


data have;
input ID DATE :mmddyy10. VAR1 $ VAR2 $ VAR3 $ ;
format DATE mmddyy10.;
datalines;
1 08-22-2022 a . .
1 08-22-2022 . b .
1 07-23-2021 . . c
2 11-12-2019 a . .
2 10-14-2019 . . c
;
run;

I want something that looks like the following:

ID	DATE	VAR1	VAR2	VAR3
1	8-22-2022	a	b
1	8-22-2022	a	b
1	7-23-2021			c
2	11-12-2019	a
2	10-14-2019			c

data want;
input ID DATE :mmddyy10. VAR1 $ VAR2 $ VAR3 $ ;
format DATE mmddyy10.;
datalines;
1 08-22-2022 a b .
1 08-22-2022 a b .
1 07-23-2021 . . c
2 11-12-2019 a . .
2 10-14-2019 . . c
;
run;

Essentially, I need a data set where the values for var1 - var3 are retained if the id AND the date are the same. I tried something like this; however, it did not work (also other similar iterations of the code below (with first.id and first.date).


data test;
set have;
by id date;
retain met_ihc2ln_;
run;

Thank you in advance for any help you can provide!

SASKiwi · Posted 08-18-2022 10:17 PM

There are probably more elegant ways of doing this but SQL works for the data you've supplied:

proc sql;
  create table want as

  select A.ID
        ,A.DATE
        ,coalesce(A.VAR1, B.Max_VAR1) as VAR1 
        ,coalesce(A.VAR2, B.Max_VAR2) as VAR2
        ,coalesce(A.VAR3, B.Max_VAR3) as VAR3
  from have as A

  left join
  (select ID
          ,DATE
          ,max(VAR1) as Max_VAR1
          ,max(VAR2) as Max_VAR2
          ,max(VAR3) as Max_VAR3
   from have
   group by ID
           ,DATE
  ) as B
  on A.ID = B.ID
  and A.DATE = B.DATE
  order by A.ID
          ,A.DATE 
  ;
quit;

View solution in original post

ballardw · Posted 08-18-2022 05:41 PM

Since your date is missing on the first example record how does that meet the requirement "the id AND the date are the same"? So you need to describe what is going on when a date obviously doesn't match that it should be processed this way.

Retain won't work directly because it can keep a value from a PREVIOUS record, not look ahead to the NEXT record as you are doing in a couple of places.

awardell · Posted 08-18-2022 06:41 PM

I see what you are saying. I had made a mistake with the date being missing in my first chart, no dates should be missing. Do you know of a way that I could look to the NEXT record across date and id? I am still stuck with how to get this.

SASKiwi · Posted 08-18-2022 10:17 PM

There are probably more elegant ways of doing this but SQL works for the data you've supplied:

proc sql;
  create table want as

  select A.ID
        ,A.DATE
        ,coalesce(A.VAR1, B.Max_VAR1) as VAR1 
        ,coalesce(A.VAR2, B.Max_VAR2) as VAR2
        ,coalesce(A.VAR3, B.Max_VAR3) as VAR3
  from have as A

  left join
  (select ID
          ,DATE
          ,max(VAR1) as Max_VAR1
          ,max(VAR2) as Max_VAR2
          ,max(VAR3) as Max_VAR3
   from have
   group by ID
           ,DATE
  ) as B
  on A.ID = B.ID
  and A.DATE = B.DATE
  order by A.ID
          ,A.DATE 
  ;
quit;

awardell · Posted 08-19-2022 02:17 PM

Thank you so much! This worked great! I appreciate all of your help!

andreas_lds · Posted 08-19-2022 01:10 AM

Just another solution:

data want;
   length v1-v3 $ 1;
   
   do _n_ = 1 by 1 until(last.date);
      set have;
      by id date notsorted;
      v1 = coalescec(v1, var1);
      v2 = coalescec(v2, var2);
      v3 = coalescec(v3, var3);
   end;
   
   do _n_ = 1 by 1 until(last.date);
      set have;
      by id date notsorted;
      var1 = coalescec(v1, var1);
      var2 = coalescec(v2, var2);
      var3 = coalescec(v3, var3);
   end;
   
   drop v1-v3;
run;

awardell · Posted 08-19-2022 02:18 PM

Thank you so much for your help! I have never seen the coalesce function used like this, and I appreciate it!

Ksharp · Posted 08-19-2022 08:41 AM

data have;
input ID DATE :mmddyy10. VAR1 $ VAR2 $ VAR3 $ ;
format DATE mmddyy10.;
datalines;
1 08-22-2022 a . .
1 08-22-2022 . b .
1 07-23-2021 . . c
2 11-12-2019 a . .
2 10-14-2019 . . c
;
run;


proc sql;
create table want(drop=dummy) as
select ID,DATE,var1 as dummy,max(var1) as var1,max(var2) as var2,max(var3) as var3
 from have 
  group by ID,DATE ;
quit;

awardell · Posted 08-19-2022 02:21 PM

Thank you so much for this solution! I appreciate your help!

Tom · Posted 08-19-2022 09:33 AM

Your example data is not complex enough to actually test all possible scenarios. In particular, what do you want to do when there is more than one non-missing value for a variable within a group?

Here is method using the UPDATE statement. First collapse to final values per group by using the existing data as transactions against an empty dataset. Then reapply the transactions to get the detailed observations written out.

data have ;
  input ID DATE :mmddyy. (VAR1-VAR3) (:$8.);
  format date yymmdd10.;
cards;
1 8-22-2022 a . .    
1 8-22-2022 . b . 
1 7-23-2021 . . c
2 11-12-2019 a . .  
2 10-14-2019 . . c
;

data step1;
  update have(obs=0) have ;
  by id descending date;
run;

data want ;
  update step1 have;
  by id descending date;
  output;
run;

Result:

Obs    ID          DATE    VAR1    VAR2    VAR3

 1      1    2022-08-22     a       b
 2      1    2022-08-22     a       b
 3      1    2021-07-23                     c
 4      2    2019-11-12     a
 5      2    2019-10-14                     c

Registration is open

SAS Training: Just a Click Away