I am trying to use a PROC SQL left join to merge two datasets.
The first dataset (master) is my actual data: each row is a time period during which measurements were taken for a subject, and I want to keep ALL of those rows. I want to add data from the second dataset (additional) wherever "additional" has values that fall within the time period of a row in "master". If no measurements in “additional” fall within a row's time period, I would like that row to be retained with missing values instead of dropped. For example, for Subject #001, I want to merge data from “additional” into “master” to produce the “want” dataset below:
Master:
StudyID | StartDate_C | EndDate1_C | Date_12mopost |
001 | 01/04/2012 | 07/10/2012 | 01/04/2013 |
001 | 03/07/2013 | 06/18/2013 | 03/07/2014 |
001 | 06/25/2013 | 01/01/2015 | 06/24/2014 |
Want (desired combined dataset):
StudyID | StartDate_C | EndDate1_C | Date_12mopost | full_lab_Date |
001 | 01/04/2012 | 07/10/2012 | 01/04/2013 | . |
001 | 03/07/2013 | 06/18/2013 | 03/07/2014 | . |
001 | 06/25/2013 | 01/01/2015 | 06/24/2014 | 07/31/2013 |
001 | 06/25/2013 | 01/01/2015 | 06/24/2014 | 11/22/2013 |
I used a left join (code below), but rows without matches in “additional” are still being dropped:
proc sql;
  create table want as
  select c.*, v.*
  from WORK.master AS c left join WORK.additional AS v
    on c.StudyID=v.StudyID_copy
  where c.StartDate_C <= v.full_lab_date <= c.DATE_12MOPOST;
quit;
Actual result (using the code above, which differs from what I want):
StudyID | StartDate_C | EndDate1_C | Date_12mopost | full_lab_DATE |
001 | 06/25/2013 | 01/01/2015 | 06/24/2014 | 07/31/2013 |
001 | 06/25/2013 | 01/01/2015 | 06/24/2014 | 11/22/2013 |
Is there a modification I can make to my code in order to get the desired outcome?
Thank you in advance.
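Can you change
where c.StartDate_C <= v.full_lab_date <= c.DATE_12MOPOST
to
and c.StartDate_C <= v.full_lab_date <= c.DATE_12MOPOST
The WHERE clause is applied after the join, so it filters out the unmatched rows (their full_lab_date is missing and fails the comparison). As part of the ON clause, the date condition only decides which rows of "additional" match, and every row of "master" is kept.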
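To see the effect in isolation, here is a minimal sketch of the suggested change. The two toy tables contain made-up rows (they are not the poster's data), and only the columns needed for the join are included; the variable names follow the original post.

```sas
/* Hypothetical toy data: one subject, two non-overlapping periods */
data master;
  input StudyID $ StartDate_C :mmddyy10. Date_12mopost :mmddyy10.;
  format StartDate_C Date_12mopost mmddyy10.;
  datalines;
001 01/01/2012 12/31/2012
001 01/01/2013 12/31/2013
;

/* Hypothetical lab date that falls only inside the second period */
data additional;
  input StudyID_copy $ full_lab_date :mmddyy10.;
  format full_lab_date mmddyy10.;
  datalines;
001 06/15/2013
;

proc sql;
  create table want as
  select c.*, v.full_lab_date
  from master as c
       left join additional as v
         /* both conditions belong to the join itself, so a row of */
         /* master with no lab date in its period is still kept    */
         on  c.StudyID = v.StudyID_copy
         and c.StartDate_C <= v.full_lab_date <= c.Date_12mopost
  order by c.StudyID, c.StartDate_C, v.full_lab_date;
quit;
```

With these toy rows, "want" should contain both periods: the 2012 period with a missing full_lab_date and the 2013 period with 06/15/2013. Moving the date condition back into a WHERE clause would leave only the 2013 row. Selecting v.full_lab_date instead of v.* is a design choice here, so the duplicate key column StudyID_copy is not carried into the result.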
Try a full join. This will keep rows that exist in either A or B.