Hi SAS community,
I have somewhat complex database where I'm trying to link Database A to B where A is my primary database.
I'm trying to find the last date from B [B_date] that is associated with date from A. [A_date]
For example,
Patient A has consecutive visits (Jan 1st, Jan 2nd, Jan 3rd and so on) and I want to merge with Database B and show the last B_date. (i.e. minimum # of difference between A_date and B_date). Since there are multiple visits in B, it's showing different B_date for each of the A_date. I'm getting results like below,
Patient A | A_date1 | Last_B_Date1
Patient A | A_date2 | Last_B_Date2
Patient A | A_date3 | Last_B_Date3
Patient A | A_date4 | Last_B_Date4
I want the results to show like
Patient A | A_date1 | Last_B_Date1
Patient A | A_date2 | Last_B_Date1
Patient A | A_date3 | Last_B_Date1
Patient A | A_date4 | Last_B_Date1
Hope this makes sense.
How are you doing your merge?
Is there an order to the date, or something to identify the first record beyond being first, ie is it the smallest, largest?
If via SQL you could consider using a function such as Min() to obtain the value.
If via data step a BY processing group could identify the first record and retain the date.
Are you working for some CRO company and do some SDTM job ?
Post some dummy data and the output you need , that would be better to let others know what you want.
Well, on your logic alone something like the below should work. However it is a good idea to post test data - in the form of a datastep - and required output for a more complete answer:
proc sql; create table WANT as select MAIN_DATASET.*, (select max(DATE) from OTHER_DATASET where MAIN_DATASET.<idvariables> = <idvariables) and DATE <= MAIN_DATASET.DATE) as OTHER_DATE from MAIN_DATASET MAIN_DATASET; quit;
If you have lots of records, then its probably better to join the select max() on rather than sub-querying like this.
Here's a version that is easy when the date is all you are concerned about:
data want;
set B (in=inB keep=patient date) A (in=inA) ;
by patient date;
if first.patient then last_B_date=.;
if inB then last_B_date = date;
retain last_B_date;
if inA;
run;
If there are more variables coming in from the B data set, the approach still works but it becomes more cumbersome. There are more variables to copy over and retain.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.