Hi all,
I have to solve this using proc sql because it will then be converted to a sql script for automation and needs to be as close as possible (just how things work here).
My dataset has duplicate records like below with different dates. I need to join to a second table where b.date > a.date. But i only want to join to the closest match to b.date which is the max (a.date)
Table a | |
BAN | col_date |
123 | 20-Jun-20 |
123 | 21-Jul-20 |
Table b | |
status | status_date |
r | 20-Aug-20 |
to get
Table a | |||
BAN | col_date | status | status_date |
123 | 20-Jun-20 | ||
123 | 21-Jul-20 | r | 20-Aug-20 |
Would appreciate any and all guidance. thank you
@TheNovice wrote:
Apologies, yes, the join key would be ban
According to above I have added BAN to table b.
Next code is tested:
data table_a;
input ban col_date :date9.;
format col_date date9.;
cards;
123 20-jun-20
123 21-jul-20
; run;
data table_b;
input ban status $ status_date :date9.;
format status_date date9.;
cards;
123 r 20-aug-20
; run;
proc sql;
create table want(drop=dif_date min_date) as
select a.ban, a.col_date,
b.status_date - a.col_date as dif_date,
min(calculated dif_date) as min_date,
case when calculated dif_date = calculated min_date
then b.status
else ' ' end as status ,
case when calculated dif_date = calculated min_date
then b.status_date
else . end as status_date format=date9.
from table_a as a
left join table_b as b
on a.ban = b.ban
group by a.ban
;
quit;
There is no BAN in table b?
@TheNovice wrote:
Apologies, yes, the join key would be ban
Then, again: why is the variable BAN not in the second table?
@TheNovice wrote:
Apologies, yes, the join key would be ban
According to above I have added BAN to table b.
Next code is tested:
data table_a;
input ban col_date :date9.;
format col_date date9.;
cards;
123 20-jun-20
123 21-jul-20
; run;
data table_b;
input ban status $ status_date :date9.;
format status_date date9.;
cards;
123 r 20-aug-20
; run;
proc sql;
create table want(drop=dif_date min_date) as
select a.ban, a.col_date,
b.status_date - a.col_date as dif_date,
min(calculated dif_date) as min_date,
case when calculated dif_date = calculated min_date
then b.status
else ' ' end as status ,
case when calculated dif_date = calculated min_date
then b.status_date
else . end as status_date format=date9.
from table_a as a
left join table_b as b
on a.ban = b.ban
group by a.ban
;
quit;
data table_a;
input ban col_date :date9.;
format col_date date9.;
cards;
123 20-jun-20
123 21-jul-20
; run;
data table_b;
input ban status $ status_date :date9.;
format status_date date9.;
cards;
123 r 20-aug-20
; run;
proc sql;
create table want as
select a.ban, a.col_date,
case when abs(a.col_date-b.status_date)=
min(abs(a.col_date-b.status_date)) then b.status
else ' ' end as status
,
case when abs(a.col_date-b.status_date)=
min(abs(a.col_date-b.status_date)) then b.status_date
else . end as status_date format=date9.
from table_a as a
left join table_b as b
on a.ban = b.ban ;
quit;
I think @Ksharp 's answer doesn't quite answer the question. This might do better:
data a;
input ban col_date :date9.;
format col_date date9.;
cards;
123 20-jun-20
123 21-jul-20
567 20-jun-20
567 21-jul-20
;
data b;
input ban status $ status_date :date9.;
format status_date date9.;
cards;
123 r 20-aug-20
456 r 20-aug-20
;
proc sql;
select a.ban, col_date,
case when col_date = max(col_date) then status
else ' ' end as status
,
case when col_date = max(col_date) then status_date
else . end as status_date format=date9.
from a
left join b
on a.ban = b.ban and status_date > col_date
group by a.ban;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.