## Conducting a fuzzy merge on dates while specifying a minimum difference between dates

Solved
Occasional Contributor
Posts: 7

# Conducting a fuzzy merge on dates while specifying a minimum difference between dates

I have been using the following syntax to conduct a fuzzy merge on dates between two tables:

``````proc sql;
create table want as
select
a.*,
b.start as r_start format=date9.,
b.end   as r_end format=date9.,
b.var1,
b.var2,
b.var3
from
a left join b
on a.id=b.id
group by a.id, a.start
having min((a.start-b.start)) = ((a.start-b.start))
;
quit;``````

As an additional step, I need to specify that the minimum value of (a.start-b.start) is the minimum value that is >= 0. Essentially what I am trying to do is identify the closest date to a.start in table b, but with the added caveat that b.start has to have occurred before or on the same day as a.start

Accepted Solutions
Solution
‎03-29-2018 11:02 AM
Posts: 5,541

## Re: Conducting a fuzzy merge on dates while specifying a minimum difference between dates

[ Edited ]

Note that within a a.start group, the value of a.start is constant. So the condition

min((a.start-b.start)) = ((a.start-b.start))

is equivalent to

min(-b.start) = -b.start

which is the same as

max(b.start) = b.start

you could thus use the query

``````proc sql;
create table want as
select
a.*,
b.start as r_start format=date9.,
b.end   as r_end format=date9.,
b.var1,
b.var2,
b.var3
from
a left join b
on a.id=b.id and a.start >= b.start
group by a.id, a.start
having max(b.start) = b.start
;
quit;``````

PG

All Replies
Posts: 4,737

## Re: Conducting a fuzzy merge on dates while specifying a minimum difference between dates

[ Edited ]

@djohn051

Couldn't you just filter out these cases during the joinl?

`on a.id=b.id and (a.start-b.start) >=0`

Above will also filter cases where b.start is missing because there was no matching record in table with alias B (so actually: you could also go for an inner join which would perform better). If you want to include non matching records in your final result then modify the join condition to:

`on a.id=b.id and (a.start-b.start >0 or b.start is null)`

Super Contributor
Posts: 266

## Re: Conducting a fuzzy merge on dates while specifying a minimum difference between dates

[ Edited ]

EDIT:  I JUST SAW THIS HAS BEEN DISCUSSED AND SOLVED IN ANOTHER THREAD.  DON'T DOUBLE POST UP LIKE THAT.  I'LL LEAVE THIS ANSWER IN CASE IT APPLIES TO SOMEONE ELSE'S PROBLEM.

You could do it less elegantly with a subquery.

You don't post any sample data and not real good guidance but given data like

``````Data aa;
input id one mydate:mmddyy10. something:\$6.;
format mydate yymmdd10.;
datalines;
111 100 6/1/2014  yergle
112 200 7/2/2014  omnipt
113 500 8/4/2014  weezle
;
run;

Data bb;
input id one mydate:mmddyy10. var1:\$3. var2:\$3. var3:\$3.;
format mydate yymmdd10.;
datalines;
111 100 7/1/2014  AAA bbb ccc
111 200 5/30/2014  bbb ccc ddd
111 500 5/15/2014  ddd eee fff
112 100 7/3/2014  AAA xxx sss
112 200 7/4/2014  AAA zzz qqq
112 500 7/5/2014  ABC vvv bbb
113 100 6/1/2014  AAA nnn mmm
113 200 6/2/2014  AAA eee ttt
113 500 6/4/2014  ABC fff ggg
;
run;
``````

if I understand correctly you would want data from BB records with id of 111 and var values of bbb ccc ddd and 113 ABC fff ggg

A subquery could find the closest record and then you can join back.

``````proc sql;
select bb.*, aa.something
from (
select bb.id, max(bb.mydate) as myccdate
from bb
inner join aa on bb.id = aa.id
where bb.mydate <= aa.mydate
group by bb.id
) as cc
left join bb on bb.id = cc.id and bb.mydate = cc.myccdate
left join aa on bb.id = aa.id;
quit;
``````

This produces

``````                                        The SAS System
id       one      mydate  var1  var2  var3  something

111       200  2014-05-30  bbb   ccc   ddd   yergle
113       500  2014-06-04  ABC   fff   ggg   weezle

``````

Maybe this is completely off base.  Maybe it will be useful.

Solution
‎03-29-2018 11:02 AM
Posts: 5,541

## Re: Conducting a fuzzy merge on dates while specifying a minimum difference between dates

[ Edited ]

Note that within a a.start group, the value of a.start is constant. So the condition

min((a.start-b.start)) = ((a.start-b.start))

is equivalent to

min(-b.start) = -b.start

which is the same as

max(b.start) = b.start

you could thus use the query

``````proc sql;
create table want as
select
a.*,
b.start as r_start format=date9.,
b.end   as r_end format=date9.,
b.var1,
b.var2,
b.var3
from
a left join b
on a.id=b.id and a.start >= b.start
group by a.id, a.start
having max(b.start) = b.start
;
quit;``````

PG
Occasional Contributor
Posts: 7

## Re: Conducting a fuzzy merge on dates while specifying a minimum difference between dates

This worked perfectly. I was too focused on trying to integrate everything on the having line that I didn't think to remove all of the cases where time b > time a during the actual join.

Many thanks!

☑ This topic is solved.