I have one table that shows company announcements. These could be any date at random.
I have another table that shows many different financials about the company from specific dates.
Is there a way to add a clause to the join where it will join the first table to the second by joining on the next closest date in the 2nd table?
For example, lets assume my announcement date was 5/5/2020.
Lets assume that in the next table I had dates of 3/3/2020, 4/27/2020, 5/27/2020, and 6/4/2020.
I would like to join these tables on the company name (which is both tables), but I want the join to keep the data from the 5/5/2020 announcement and all the data from the 5/27/2020 record (since its the next date).
Thank you!
With SQL only you would first join the two tables over company and then filter the intermediary result to pick the row where the difference between the two dates is minimal.
First join
select t1.cust_id, t1.date, t2.date
join table1 t1 left join table2 t2
on t1.cust_id=t2.cust_id ...may be here also something like: and t2.date between t1.date-20 and t1.date+20 to not select all matching rows from t2
Then filter
group by t1.cust_id
having abs(t1.date-t2.date)=min(abs(t1.date-t2.date))
....now you could still end-up with two rows for you customer if there are two rows in t2 where you have dates which are exactly the same amount of days before or after the t1 date. So you will have to extend the filter to pick either the t2.date before or after the t1.date for such a case.
proc sql;
create table want (drop=dif) as
  select
    a.company,
    a.somedate,
    b.otherdate,
    abs(a.somedate - b.otherdate) as dif
  from have1 a, have2 b
  where a.company = b.company
  group by a.company, a.somedate
  having dif = min(dif)
;
quit;
(posted from my tablet, so untested)
Kurt,
This gets me very very close to where I need to be.
The issue I am running into is that in the second file there appears to be duplicate dates, so each of the items in my original table ends up matching to the date in the 2nd table 4-5 times each.
Do you suggest that I just take the resulting table from your code and then just remove duplicates on the resulting file?
If you only need one observation per company/somedate/otherdate, add the DISTINCT keyword to the SELECT.
Is there any rule involved for if two of the "other" dates have the same difference from the base date?
Or if 3 or more records? ( possible if non-unique values of company identification or matching characteristic in the second set for some reason)
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
