I have two data sets as following below.
Data set 1 is a list of company names.
Name
Apple
Amazon
Data set 2 has the following variables.
ID Year1 USname nameO Country Year2
01 1992 Apple Apple | Ford US 1993
02 1993 Alphabet Toyato | Amazon US 1995
03 1994 Amazon.com Amazon US 1996
04 1995 Alphabet Google US 1997
05 1996 IBM IBM US 1998
Name in data set 1 is the same with nameO in data set 2 (Google) or Name in data set1 is partly same with nameO in data set 2 (Apple).
What I want is to match these two data sets as long as NameO is partly same with Name.
Expected
ID Year1 USname nameO Country Year2 name
01 1992 Apple Apple | Ford US 1993 Apple
02 1993 Alphabet Toyato | Amazon US 1995 Amazon
03 1994 Amazon.com Amazon US 1996 Amazon
04 1995 Alphabet Google US 1997 Google
@Ksharp offered the following codes to me. It is helpful.
proc sql;
select *
from b inner join a
on b.nameO contains strip(a.name);
quit;
However, I found another problem.
I found that Names using the upper case such as "APPLE" in data A cannot be matched with NameO using lower cases in data b such as "apple" or "Apple".
Is there any way to ignore the lower case or upper case? Thanks.
proc sql;
select *
from b inner join a
on upper(b.nameO) contains strip(upper(a.name));
quit;
proc sql;
select *
from b inner join a
on upper(b.nameO) contains strip(upper(a.name));
quit;
so this code changes both names as upper cases. Is it correct? Thanks.
it doesnt change the case in the output , but it does convert both to uppercase when it is joining them together
Thank you so much!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.