BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
djohn051
Fluorite | Level 6

I have been using the following syntax to conduct a fuzzy merge on dates between two tables:

 

proc sql;
  create table want as
    select 
      a.*,
      b.start as r_start format=date9.,
      b.end   as r_end format=date9.,
      b.var1,
      b.var2,
      b.var3
    from 
      a left join b
        on a.id=b.id
      group by a.id, a.start
      having min((a.start-b.start)) = ((a.start-b.start))
    ;
quit;

As an additional step, I need to specify that the minimum value of (a.start-b.start) is the minimum value that is >= 0. Essentially what I am trying to do is identify the closest date to a.start in table b, but with the added caveat that b.start has to have occurred before or on the same day as a.start

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

Note that within a a.start group, the value of a.start is constant. So the condition

 

min((a.start-b.start)) = ((a.start-b.start))

 

is equivalent to

 

min(-b.start) = -b.start

 

which is the same as

 

max(b.start) = b.start

 

you could thus use the query

 

proc sql;
  create table want as
    select 
      a.*,
      b.start as r_start format=date9.,
      b.end   as r_end format=date9.,
      b.var1,
      b.var2,
      b.var3
    from 
      a left join b
        on a.id=b.id and a.start >= b.start
      group by a.id, a.start
      having max(b.start) = b.start
    ;
quit;

 

PG

View solution in original post

4 REPLIES 4
Patrick
Opal | Level 21

@djohn051

Couldn't you just filter out these cases during the joinl?

on a.id=b.id and (a.start-b.start) >=0

Above will also filter cases where b.start is missing because there was no matching record in table with alias B (so actually: you could also go for an inner join which would perform better). If you want to include non matching records in your final result then modify the join condition to:

on a.id=b.id and (a.start-b.start >0 or b.start is null)

 

 

HB
Barite | Level 11 HB
Barite | Level 11

EDIT:  I JUST SAW THIS HAS BEEN DISCUSSED AND SOLVED IN ANOTHER THREAD.  DON'T DOUBLE POST UP LIKE THAT.  I'LL LEAVE THIS ANSWER IN CASE IT APPLIES TO SOMEONE ELSE'S PROBLEM.

 

 

 

You could do it less elegantly with a subquery.

 

You don't post any sample data and not real good guidance but given data like

 

 

Data aa;
input id one mydate:mmddyy10. something:$6.;
format mydate yymmdd10.;
datalines;
111 100 6/1/2014  yergle
112 200 7/2/2014  omnipt
113 500 8/4/2014  weezle
;
run;


Data bb;
input id one mydate:mmddyy10. var1:$3. var2:$3. var3:$3.;
format mydate yymmdd10.;
datalines;
111 100 7/1/2014  AAA bbb ccc
111 200 5/30/2014  bbb ccc ddd
111 500 5/15/2014  ddd eee fff
112 100 7/3/2014  AAA xxx sss
112 200 7/4/2014  AAA zzz qqq
112 500 7/5/2014  ABC vvv bbb
113 100 6/1/2014  AAA nnn mmm
113 200 6/2/2014  AAA eee ttt
113 500 6/4/2014  ABC fff ggg
;
run;

if I understand correctly you would want data from BB records with id of 111 and var values of bbb ccc ddd and 113 ABC fff ggg

 

A subquery could find the closest record and then you can join back.

 

proc sql;
	select bb.*, aa.something
	from (
		select bb.id, max(bb.mydate) as myccdate
		from bb 
		inner join aa on bb.id = aa.id
		where bb.mydate <= aa.mydate
		group by bb.id
		) as cc
	left join bb on bb.id = cc.id and bb.mydate = cc.myccdate
	left join aa on bb.id = aa.id;
quit;

  This produces

 

 

                                        The SAS System     
                         id       one      mydate  var1  var2  var3  something
                   
                        111       200  2014-05-30  bbb   ccc   ddd   yergle
                        113       500  2014-06-04  ABC   fff   ggg   weezle


 

 

 

Maybe this is completely off base.  Maybe it will be useful.   

PGStats
Opal | Level 21

Note that within a a.start group, the value of a.start is constant. So the condition

 

min((a.start-b.start)) = ((a.start-b.start))

 

is equivalent to

 

min(-b.start) = -b.start

 

which is the same as

 

max(b.start) = b.start

 

you could thus use the query

 

proc sql;
  create table want as
    select 
      a.*,
      b.start as r_start format=date9.,
      b.end   as r_end format=date9.,
      b.var1,
      b.var2,
      b.var3
    from 
      a left join b
        on a.id=b.id and a.start >= b.start
      group by a.id, a.start
      having max(b.start) = b.start
    ;
quit;

 

PG
djohn051
Fluorite | Level 6

This worked perfectly. I was too focused on trying to integrate everything on the having line that I didn't think to remove all of the cases where time b > time a during the actual join. 

 

Many thanks!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1961 views
  • 0 likes
  • 4 in conversation