BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
axescot78
Quartz | Level 8

I am trying to merge data from one data set to another based on first and last name. I run the following code:

 

proc sql;
	create table merged_data as
	select *
	from pull_data
	where strip(upcase(last_name)) in (select strip(upcase(NameLast)) from reference_data)
		and strip(upcase(first_name)) in (select strip(upcase(NameFirst)) from reference_data);
quit;

This matches all last names without regard to first name. So in the example data below, fake_l_name_1 will match with fake_f_name_1 and fake_f_name_2, rather than with fake_f_name_1 only. Can anyone tell me why proc sql is ignoring the 'and' statement?

data pull_data;
input userID $4. first_name $14. last_name $14. ;
datalines;
1693 fake_f_name_1 fake_l_name_1
1693 fake_f_name_1 fake_l_name_1
1693 fake_f_name_1 fake_l_name_1
1693 fake_f_name_1 fake_l_name_1
1693 fake_f_name_1 fake_l_name_1
1693 fake_f_name_1 fake_l_name_1
1129 fake_f_name_2 fake_l_name_1
1129 fake_f_name_2 fake_l_name_1
1129 fake_f_name_2 fake_l_name_1
1129 fake_f_name_2 fake_l_name_1
1129 fake_f_name_2 fake_l_name_1
1129 fake_f_name_2 fake_l_name_1
1345 fake_f_name_3 fake_l_name_2
1345 fake_f_name_3 fake_l_name_2
1345 fake_f_name_3 fake_l_name_2
1345 fake_f_name_3 fake_l_name_2
1345 fake_f_name_3 fake_l_name_2
1345 fake_f_name_3 fake_l_name_2
1345 fake_f_name_3 fake_l_name_2
1345 fake_f_name_3 fake_l_name_2
1345 fake_f_name_3 fake_l_name_2
1345 fake_f_name_3 fake_l_name_2
1492 fake_f_name_4 fake_l_name_2
1492 fake_f_name_4 fake_l_name_2
1492 fake_f_name_4 fake_l_name_2
1492 fake_f_name_4 fake_l_name_2
;

data reference_data;
input NameFirst $14. NameLast $14. ;
datalines;
fake_f_name_1	fake_l_name_1
fake_f_name_2	fake_l_name_2
fake_f_name_3	fake_l_name_3
fake_f_name_4	fake_l_name_4
fake_f_name_5	fake_l_name_5
fake_f_name_6	fake_l_name_6
fake_f_name_7	fake_l_name_7
fake_f_name_8	fake_l_name_8
;
1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

Thank you for providing sample data via working SAS code.

 

I assume what you're trying to achieve is an inner join. Code like below should do.

proc sql;
	create table merged_data as
	select a.*
	from pull_data a inner join reference_data b
    on strip(upcase(a.last_name))=strip(upcase(b.NameLast)) 
      and strip(upcase(a.first_name))=strip(upcase(b.NameFirst))
  ;
quit;

The filter condition you defined does independent lookups in table reference_data meaning the condition gets true if the first and last name in your base table exist anywhere in the lookup table reference_data - it doesn't need to be on the same row.

 

View solution in original post

1 REPLY 1
Patrick
Opal | Level 21

Thank you for providing sample data via working SAS code.

 

I assume what you're trying to achieve is an inner join. Code like below should do.

proc sql;
	create table merged_data as
	select a.*
	from pull_data a inner join reference_data b
    on strip(upcase(a.last_name))=strip(upcase(b.NameLast)) 
      and strip(upcase(a.first_name))=strip(upcase(b.NameFirst))
  ;
quit;

The filter condition you defined does independent lookups in table reference_data meaning the condition gets true if the first and last name in your base table exist anywhere in the lookup table reference_data - it doesn't need to be on the same row.

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 1 reply
  • 817 views
  • 0 likes
  • 2 in conversation