I am trying to merge data from one data set to another based on first and last name. I run the following code:
proc sql; create table merged_data as select * from pull_data where strip(upcase(last_name)) in (select strip(upcase(NameLast)) from reference_data) and strip(upcase(first_name)) in (select strip(upcase(NameFirst)) from reference_data); quit;
This matches all last names without regard to first name. So in the example data below, fake_l_name_1 will match with fake_f_name_1 and fake_f_name_2, rather than with fake_f_name_1 only. Can anyone tell me why proc sql is ignoring the 'and' statement?
data pull_data; input userID $4. first_name $14. last_name $14. ; datalines; 1693 fake_f_name_1 fake_l_name_1 1693 fake_f_name_1 fake_l_name_1 1693 fake_f_name_1 fake_l_name_1 1693 fake_f_name_1 fake_l_name_1 1693 fake_f_name_1 fake_l_name_1 1693 fake_f_name_1 fake_l_name_1 1129 fake_f_name_2 fake_l_name_1 1129 fake_f_name_2 fake_l_name_1 1129 fake_f_name_2 fake_l_name_1 1129 fake_f_name_2 fake_l_name_1 1129 fake_f_name_2 fake_l_name_1 1129 fake_f_name_2 fake_l_name_1 1345 fake_f_name_3 fake_l_name_2 1345 fake_f_name_3 fake_l_name_2 1345 fake_f_name_3 fake_l_name_2 1345 fake_f_name_3 fake_l_name_2 1345 fake_f_name_3 fake_l_name_2 1345 fake_f_name_3 fake_l_name_2 1345 fake_f_name_3 fake_l_name_2 1345 fake_f_name_3 fake_l_name_2 1345 fake_f_name_3 fake_l_name_2 1345 fake_f_name_3 fake_l_name_2 1492 fake_f_name_4 fake_l_name_2 1492 fake_f_name_4 fake_l_name_2 1492 fake_f_name_4 fake_l_name_2 1492 fake_f_name_4 fake_l_name_2 ; data reference_data; input NameFirst $14. NameLast $14. ; datalines; fake_f_name_1 fake_l_name_1 fake_f_name_2 fake_l_name_2 fake_f_name_3 fake_l_name_3 fake_f_name_4 fake_l_name_4 fake_f_name_5 fake_l_name_5 fake_f_name_6 fake_l_name_6 fake_f_name_7 fake_l_name_7 fake_f_name_8 fake_l_name_8 ;
Thank you for providing sample data via working SAS code.
I assume what you're trying to achieve is an inner join. Code like below should do.
proc sql;
create table merged_data as
select a.*
from pull_data a inner join reference_data b
on strip(upcase(a.last_name))=strip(upcase(b.NameLast))
and strip(upcase(a.first_name))=strip(upcase(b.NameFirst))
;
quit;
The filter condition you defined does independent lookups in table reference_data meaning the condition gets true if the first and last name in your base table exist anywhere in the lookup table reference_data - it doesn't need to be on the same row.
Thank you for providing sample data via working SAS code.
I assume what you're trying to achieve is an inner join. Code like below should do.
proc sql;
create table merged_data as
select a.*
from pull_data a inner join reference_data b
on strip(upcase(a.last_name))=strip(upcase(b.NameLast))
and strip(upcase(a.first_name))=strip(upcase(b.NameFirst))
;
quit;
The filter condition you defined does independent lookups in table reference_data meaning the condition gets true if the first and last name in your base table exist anywhere in the lookup table reference_data - it doesn't need to be on the same row.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.