Hello,
I have a data set that looks like the following:
ID | DATE | VAR1 | VAR2 | VAR3 |
1 | 8-22-2022 | a | ||
1 | 8-22-2022 | b | ||
1 | 7-23-2021 | c | ||
2 | 11-12-2019 | a | ||
2 | 10-14-2019 | c |
data have;
input ID DATE :mmddyy10. VAR1 $ VAR2 $ VAR3 $ ;
format DATE mmddyy10.;
datalines;
1 08-22-2022 a . .
1 08-22-2022 . b .
1 07-23-2021 . . c
2 11-12-2019 a . .
2 10-14-2019 . . c
;
run;
I want something that looks like the following:
ID | DATE | VAR1 | VAR2 | VAR3 |
1 | 8-22-2022 | a | b | |
1 | 8-22-2022 | a | b | |
1 | 7-23-2021 | c | ||
2 | 11-12-2019 | a | ||
2 | 10-14-2019 | c |
data want;
input ID DATE :mmddyy10. VAR1 $ VAR2 $ VAR3 $ ;
format DATE mmddyy10.;
datalines;
1 08-22-2022 a b .
1 08-22-2022 a b .
1 07-23-2021 . . c
2 11-12-2019 a . .
2 10-14-2019 . . c
;
run;
Essentially, I need a data set where the values for var1 - var3 are retained if the id AND the date are the same. I tried something like this; however, it did not work (also other similar iterations of the code below (with first.id and first.date).
data test;
set have;
by id date;
retain met_ihc2ln_;
run;
Thank you in advance for any help you can provide!
There are probably more elegant ways of doing this but SQL works for the data you've supplied:
proc sql;
create table want as
select A.ID
,A.DATE
,coalesce(A.VAR1, B.Max_VAR1) as VAR1
,coalesce(A.VAR2, B.Max_VAR2) as VAR2
,coalesce(A.VAR3, B.Max_VAR3) as VAR3
from have as A
left join
(select ID
,DATE
,max(VAR1) as Max_VAR1
,max(VAR2) as Max_VAR2
,max(VAR3) as Max_VAR3
from have
group by ID
,DATE
) as B
on A.ID = B.ID
and A.DATE = B.DATE
order by A.ID
,A.DATE
;
quit;
Since your date is missing on the first example record how does that meet the requirement "the id AND the date are the same"? So you need to describe what is going on when a date obviously doesn't match that it should be processed this way.
Retain won't work directly because it can keep a value from a PREVIOUS record, not look ahead to the NEXT record as you are doing in a couple of places.
I see what you are saying. I had made a mistake with the date being missing in my first chart, no dates should be missing. Do you know of a way that I could look to the NEXT record across date and id? I am still stuck with how to get this.
There are probably more elegant ways of doing this but SQL works for the data you've supplied:
proc sql;
create table want as
select A.ID
,A.DATE
,coalesce(A.VAR1, B.Max_VAR1) as VAR1
,coalesce(A.VAR2, B.Max_VAR2) as VAR2
,coalesce(A.VAR3, B.Max_VAR3) as VAR3
from have as A
left join
(select ID
,DATE
,max(VAR1) as Max_VAR1
,max(VAR2) as Max_VAR2
,max(VAR3) as Max_VAR3
from have
group by ID
,DATE
) as B
on A.ID = B.ID
and A.DATE = B.DATE
order by A.ID
,A.DATE
;
quit;
Thank you so much! This worked great! I appreciate all of your help!
Just another solution:
data want;
length v1-v3 $ 1;
do _n_ = 1 by 1 until(last.date);
set have;
by id date notsorted;
v1 = coalescec(v1, var1);
v2 = coalescec(v2, var2);
v3 = coalescec(v3, var3);
end;
do _n_ = 1 by 1 until(last.date);
set have;
by id date notsorted;
var1 = coalescec(v1, var1);
var2 = coalescec(v2, var2);
var3 = coalescec(v3, var3);
end;
drop v1-v3;
run;
Thank you so much for your help! I have never seen the coalesce function used like this, and I appreciate it!
data have;
input ID DATE :mmddyy10. VAR1 $ VAR2 $ VAR3 $ ;
format DATE mmddyy10.;
datalines;
1 08-22-2022 a . .
1 08-22-2022 . b .
1 07-23-2021 . . c
2 11-12-2019 a . .
2 10-14-2019 . . c
;
run;
proc sql;
create table want(drop=dummy) as
select ID,DATE,var1 as dummy,max(var1) as var1,max(var2) as var2,max(var3) as var3
from have
group by ID,DATE ;
quit;
Thank you so much for this solution! I appreciate your help!
Your example data is not complex enough to actually test all possible scenarios. In particular, what do you want to do when there is more than one non-missing value for a variable within a group?
Here is method using the UPDATE statement. First collapse to final values per group by using the existing data as transactions against an empty dataset. Then reapply the transactions to get the detailed observations written out.
data have ;
input ID DATE :mmddyy. (VAR1-VAR3) (:$8.);
format date yymmdd10.;
cards;
1 8-22-2022 a . .
1 8-22-2022 . b .
1 7-23-2021 . . c
2 11-12-2019 a . .
2 10-14-2019 . . c
;
data step1;
update have(obs=0) have ;
by id descending date;
run;
data want ;
update step1 have;
by id descending date;
output;
run;
Result:
Obs ID DATE VAR1 VAR2 VAR3 1 1 2022-08-22 a b 2 1 2022-08-22 a b 3 1 2021-07-23 c 4 2 2019-11-12 a 5 2 2019-10-14 c
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.