I have been using the following syntax to conduct a fuzzy merge on dates between two tables:
proc sql;
create table want as
select
a.*,
b.start as r_start format=date9.,
b.end as r_end format=date9.,
b.var1,
b.var2,
b.var3
from
a left join b
on a.id=b.id
group by a.id, a.start
having min((a.start-b.start)) = ((a.start-b.start))
;
quit;
As an additional step, I need to specify that the minimum value of (a.start-b.start) is the minimum value that is >= 0. Essentially what I am trying to do is identify the closest date to a.start in table b, but with the added caveat that b.start has to have occurred before or on the same day as a.start
Note that within a a.start group, the value of a.start is constant. So the condition
min((a.start-b.start)) = ((a.start-b.start))
is equivalent to
min(-b.start) = -b.start
which is the same as
max(b.start) = b.start
you could thus use the query
proc sql;
create table want as
select
a.*,
b.start as r_start format=date9.,
b.end as r_end format=date9.,
b.var1,
b.var2,
b.var3
from
a left join b
on a.id=b.id and a.start >= b.start
group by a.id, a.start
having max(b.start) = b.start
;
quit;
Couldn't you just filter out these cases during the joinl?
on a.id=b.id and (a.start-b.start) >=0
Above will also filter cases where b.start is missing because there was no matching record in table with alias B (so actually: you could also go for an inner join which would perform better). If you want to include non matching records in your final result then modify the join condition to:
on a.id=b.id and (a.start-b.start >0 or b.start is null)
EDIT: I JUST SAW THIS HAS BEEN DISCUSSED AND SOLVED IN ANOTHER THREAD. DON'T DOUBLE POST UP LIKE THAT. I'LL LEAVE THIS ANSWER IN CASE IT APPLIES TO SOMEONE ELSE'S PROBLEM.
You could do it less elegantly with a subquery.
You don't post any sample data and not real good guidance but given data like
Data aa;
input id one mydate:mmddyy10. something:$6.;
format mydate yymmdd10.;
datalines;
111 100 6/1/2014 yergle
112 200 7/2/2014 omnipt
113 500 8/4/2014 weezle
;
run;
Data bb;
input id one mydate:mmddyy10. var1:$3. var2:$3. var3:$3.;
format mydate yymmdd10.;
datalines;
111 100 7/1/2014 AAA bbb ccc
111 200 5/30/2014 bbb ccc ddd
111 500 5/15/2014 ddd eee fff
112 100 7/3/2014 AAA xxx sss
112 200 7/4/2014 AAA zzz qqq
112 500 7/5/2014 ABC vvv bbb
113 100 6/1/2014 AAA nnn mmm
113 200 6/2/2014 AAA eee ttt
113 500 6/4/2014 ABC fff ggg
;
run;
if I understand correctly you would want data from BB records with id of 111 and var values of bbb ccc ddd and 113 ABC fff ggg
A subquery could find the closest record and then you can join back.
proc sql;
select bb.*, aa.something
from (
select bb.id, max(bb.mydate) as myccdate
from bb
inner join aa on bb.id = aa.id
where bb.mydate <= aa.mydate
group by bb.id
) as cc
left join bb on bb.id = cc.id and bb.mydate = cc.myccdate
left join aa on bb.id = aa.id;
quit;
This produces
The SAS System
id one mydate var1 var2 var3 something
111 200 2014-05-30 bbb ccc ddd yergle
113 500 2014-06-04 ABC fff ggg weezle
Maybe this is completely off base. Maybe it will be useful.
Note that within a a.start group, the value of a.start is constant. So the condition
min((a.start-b.start)) = ((a.start-b.start))
is equivalent to
min(-b.start) = -b.start
which is the same as
max(b.start) = b.start
you could thus use the query
proc sql;
create table want as
select
a.*,
b.start as r_start format=date9.,
b.end as r_end format=date9.,
b.var1,
b.var2,
b.var3
from
a left join b
on a.id=b.id and a.start >= b.start
group by a.id, a.start
having max(b.start) = b.start
;
quit;
This worked perfectly. I was too focused on trying to integrate everything on the having line that I didn't think to remove all of the cases where time b > time a during the actual join.
Many thanks!
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.