I'm trying to join two databases. Some rows in the left database have corresponding rows in the right database, but the output database doesn't show them. (Note: I've tried to reproduce this error with a reproducible example, but can't).
Here's a few rows of the left database:
Here are some rows from the right table:
Here's the code I'm using to join them:
/*Add County to Overlap Table*/ proc sql; create table overlap_all as select distinct a.*, b.patcounty as county from overlap_long a left join opioids_loc b on a.patientid = b.patientid and a.monyear = b.monyear; quit;
And here are some rows from the output table:
Some of the values aren't showing up. For example, Row 10 in the output table (corresponding to "monyear" = 7-2021, "patientid" = 5472) should read "HUMBOLDT", but doesn't.
Does anybody have any idea of what's going on? The weird part is that some rows that should show up, but others don't.
Thanks,
David
Hard to tell but if MONYEAR is a character column and values aren't left justified then that might explain why some aren't matching. Try this to see if this could be happening:
proc sql;
create table overlap_all as
select distinct a.*,
b.patcounty as county
from overlap_long a
left join opioids_loc b
on a.patientid = b.patientid
and left(a.monyear) = left(b.monyear);
quit;
It is very hard to tell from photographs what values are in your datasets. Are those little red icons that look kind of like the letter A supposed to indicate that the MONYEAR variables are CHARACTER variables?
Leading spaces are significant in comparing character values. So '7-2011' and ' 7-2011' might look the same in a photograph but will not actually match. Also if one of them has an endash or emdash character instead of a hyphen then they will not match.
Yes, "monyear" is a character variable. The dashes are all n-dashes, but now that I look closely, only values "10-2021", "11-2021", and "12-2021" have he county values that should be there.
There's clearly a mismatch--maybe leading spaces--between monyear in the two datasets. Let my try trimming the leading values or formatting length of "monyear" in the left and right datasets so that they're the same.
Any other suggestions?
Hard to tell but if MONYEAR is a character column and values aren't left justified then that might explain why some aren't matching. Try this to see if this could be happening:
proc sql;
create table overlap_all as
select distinct a.*,
b.patcounty as county
from overlap_long a
left join opioids_loc b
on a.patientid = b.patientid
and left(a.monyear) = left(b.monyear);
quit;
Worked like a charm. Thanks, yet again!, SASKiwi, and to Tom, for both of your help.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.