☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Obsidian | Level 7

Retain by two variables

Hello,
I have a data set that looks like the following:

 ID DATE VAR1 VAR2 VAR3 1 8-22-2022 a 1 8-22-2022 b 1 7-23-2021 c 2 11-12-2019 a 2 10-14-2019 c

``````
data have;
input ID DATE :mmddyy10. VAR1 \$ VAR2 \$ VAR3 \$ ;
format DATE mmddyy10.;
datalines;
1 08-22-2022 a . .
1 08-22-2022 . b .
1 07-23-2021 . . c
2 11-12-2019 a . .
2 10-14-2019 . . c
;
run;``````

I want something that looks like the following:

 ID DATE VAR1 VAR2 VAR3 1 8-22-2022 a b 1 8-22-2022 a b 1 7-23-2021 c 2 11-12-2019 a 2 10-14-2019 c

``````data want;
input ID DATE :mmddyy10. VAR1 \$ VAR2 \$ VAR3 \$ ;
format DATE mmddyy10.;
datalines;
1 08-22-2022 a b .
1 08-22-2022 a b .
1 07-23-2021 . . c
2 11-12-2019 a . .
2 10-14-2019 . . c
;
run;``````

Essentially, I need a data set where the values for var1 - var3 are retained if the id AND the date are the same. I tried something like this; however, it did not work (also other similar iterations of the code below (with first.id and first.date).

``````
data test;
set have;
by id date;
retain met_ihc2ln_;
run;``````

Thank you in advance for any help you can provide!

1 ACCEPTED SOLUTION

Accepted Solutions
PROC Star

Re: Retain by two variables

There are probably more elegant ways of doing this but SQL works for the data you've supplied:

``````proc sql;
create table want as

select A.ID
,A.DATE
,coalesce(A.VAR1, B.Max_VAR1) as VAR1
,coalesce(A.VAR2, B.Max_VAR2) as VAR2
,coalesce(A.VAR3, B.Max_VAR3) as VAR3
from have as A

left join
(select ID
,DATE
,max(VAR1) as Max_VAR1
,max(VAR2) as Max_VAR2
,max(VAR3) as Max_VAR3
from have
group by ID
,DATE
) as B
on A.ID = B.ID
and A.DATE = B.DATE
order by A.ID
,A.DATE
;
quit;``````
9 REPLIES 9
Super User

Re: Retain by two variables

Since your date is missing on the first example record how does that meet the requirement "the id AND the date are the same"? So you need to describe what is going on when a date obviously doesn't match that it should be processed this way.

Retain won't work directly because it can keep a value from a PREVIOUS record, not look ahead to the NEXT record as you are doing in a couple of places.

Obsidian | Level 7

Re: Retain by two variables

I see what you are saying. I had made a mistake with the date being missing in my first chart, no dates should be missing. Do you know of a way that I could look to the NEXT record across date and id? I am still stuck with how to get this.

PROC Star

Re: Retain by two variables

There are probably more elegant ways of doing this but SQL works for the data you've supplied:

``````proc sql;
create table want as

select A.ID
,A.DATE
,coalesce(A.VAR1, B.Max_VAR1) as VAR1
,coalesce(A.VAR2, B.Max_VAR2) as VAR2
,coalesce(A.VAR3, B.Max_VAR3) as VAR3
from have as A

left join
(select ID
,DATE
,max(VAR1) as Max_VAR1
,max(VAR2) as Max_VAR2
,max(VAR3) as Max_VAR3
from have
group by ID
,DATE
) as B
on A.ID = B.ID
and A.DATE = B.DATE
order by A.ID
,A.DATE
;
quit;``````
Obsidian | Level 7

Re: Retain by two variables

Thank you so much! This worked great! I appreciate all of your help!

Jade | Level 19

Re: Retain by two variables

Just another solution:

``````data want;
length v1-v3 \$ 1;

do _n_ = 1 by 1 until(last.date);
set have;
by id date notsorted;
v1 = coalescec(v1, var1);
v2 = coalescec(v2, var2);
v3 = coalescec(v3, var3);
end;

do _n_ = 1 by 1 until(last.date);
set have;
by id date notsorted;
var1 = coalescec(v1, var1);
var2 = coalescec(v2, var2);
var3 = coalescec(v3, var3);
end;

drop v1-v3;
run;``````
Obsidian | Level 7

Re: Retain by two variables

Thank you so much for your help! I have never seen the coalesce function used like this, and I appreciate it!

Super User

Re: Retain by two variables

``````data have;
input ID DATE :mmddyy10. VAR1 \$ VAR2 \$ VAR3 \$ ;
format DATE mmddyy10.;
datalines;
1 08-22-2022 a . .
1 08-22-2022 . b .
1 07-23-2021 . . c
2 11-12-2019 a . .
2 10-14-2019 . . c
;
run;

proc sql;
create table want(drop=dummy) as
select ID,DATE,var1 as dummy,max(var1) as var1,max(var2) as var2,max(var3) as var3
from have
group by ID,DATE ;
quit;``````
Obsidian | Level 7

Re: Retain by two variables

Thank you so much for this solution! I appreciate your help!

Super User

Re: Retain by two variables

Your example data is not complex enough to actually test all possible scenarios.  In particular, what do you want to do when there is more than one non-missing value for a variable within a group?

Here is method using the UPDATE statement.  First collapse to final values per group by using the existing data as transactions against an empty dataset.  Then reapply the transactions to get the detailed observations written out.

``````data have ;
input ID DATE :mmddyy. (VAR1-VAR3) (:\$8.);
format date yymmdd10.;
cards;
1 8-22-2022 a . .
1 8-22-2022 . b .
1 7-23-2021 . . c
2 11-12-2019 a . .
2 10-14-2019 . . c
;

data step1;
update have(obs=0) have ;
by id descending date;
run;

data want ;
update step1 have;
by id descending date;
output;
run;
``````

Result:

```Obs    ID          DATE    VAR1    VAR2    VAR3

1      1    2022-08-22     a       b
2      1    2022-08-22     a       b
3      1    2021-07-23                     c
4      2    2019-11-12     a
5      2    2019-10-14                     c

```

Discussion stats
• 9 replies
• 947 views
• 2 likes
• 6 in conversation