I am trying to merge data from one data set to another based on first and last name. I run the following code:
proc sql; create table merged_data as select * from pull_data where strip(upcase(last_name)) in (select strip(upcase(NameLast)) from reference_data) and strip(upcase(first_name)) in (select strip(upcase(NameFirst)) from reference_data); quit;
This matches all last names without regard to first name. So in the example data below, fake_l_name_1 will match with fake_f_name_1 and fake_f_name_2, rather than with fake_f_name_1 only. Can anyone tell me why proc sql is ignoring the 'and' statement?
data pull_data; input userID $4. first_name $14. last_name $14. ; datalines; 1693 fake_f_name_1 fake_l_name_1 1693 fake_f_name_1 fake_l_name_1 1693 fake_f_name_1 fake_l_name_1 1693 fake_f_name_1 fake_l_name_1 1693 fake_f_name_1 fake_l_name_1 1693 fake_f_name_1 fake_l_name_1 1129 fake_f_name_2 fake_l_name_1 1129 fake_f_name_2 fake_l_name_1 1129 fake_f_name_2 fake_l_name_1 1129 fake_f_name_2 fake_l_name_1 1129 fake_f_name_2 fake_l_name_1 1129 fake_f_name_2 fake_l_name_1 1345 fake_f_name_3 fake_l_name_2 1345 fake_f_name_3 fake_l_name_2 1345 fake_f_name_3 fake_l_name_2 1345 fake_f_name_3 fake_l_name_2 1345 fake_f_name_3 fake_l_name_2 1345 fake_f_name_3 fake_l_name_2 1345 fake_f_name_3 fake_l_name_2 1345 fake_f_name_3 fake_l_name_2 1345 fake_f_name_3 fake_l_name_2 1345 fake_f_name_3 fake_l_name_2 1492 fake_f_name_4 fake_l_name_2 1492 fake_f_name_4 fake_l_name_2 1492 fake_f_name_4 fake_l_name_2 1492 fake_f_name_4 fake_l_name_2 ; data reference_data; input NameFirst $14. NameLast $14. ; datalines; fake_f_name_1 fake_l_name_1 fake_f_name_2 fake_l_name_2 fake_f_name_3 fake_l_name_3 fake_f_name_4 fake_l_name_4 fake_f_name_5 fake_l_name_5 fake_f_name_6 fake_l_name_6 fake_f_name_7 fake_l_name_7 fake_f_name_8 fake_l_name_8 ;
Thank you for providing sample data via working SAS code.
I assume what you're trying to achieve is an inner join. Code like below should do.
proc sql;
create table merged_data as
select a.*
from pull_data a inner join reference_data b
on strip(upcase(a.last_name))=strip(upcase(b.NameLast))
and strip(upcase(a.first_name))=strip(upcase(b.NameFirst))
;
quit;
The filter condition you defined does independent lookups in table reference_data meaning the condition gets true if the first and last name in your base table exist anywhere in the lookup table reference_data - it doesn't need to be on the same row.
Thank you for providing sample data via working SAS code.
I assume what you're trying to achieve is an inner join. Code like below should do.
proc sql;
create table merged_data as
select a.*
from pull_data a inner join reference_data b
on strip(upcase(a.last_name))=strip(upcase(b.NameLast))
and strip(upcase(a.first_name))=strip(upcase(b.NameFirst))
;
quit;
The filter condition you defined does independent lookups in table reference_data meaning the condition gets true if the first and last name in your base table exist anywhere in the lookup table reference_data - it doesn't need to be on the same row.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.