I have two data sets as following below.
Data set 1 is a list of company names.
Name
Apple
Amazon
Data set 2 has the following variables.
ID Year1 USname nameO Country Year2
01 1992 Apple Apple | Ford US 1993
02 1993 Alphabet Toyato | Amazon US 1995
03 1994 Amazon.com Amazon US 1996
04 1995 Alphabet Google US 1997
05 1996 IBM IBM US 1998
Name in data set 1 is the same with nameO in data set 2 (Google) or Name in data set1 is partly same with nameO in data set 2 (Apple).
What I want is to match these two data sets as long as NameO is partly same with Name.
Expected
ID Year1 USname nameO Country Year2 name
01 1992 Apple Apple | Ford US 1993 Apple
02 1993 Alphabet Toyato | Amazon US 1995 Amazon
03 1994 Amazon.com Amazon US 1996 Amazon
04 1995 Alphabet Google US 1997 Google
What code do I need to use? Thanks a lot!
How similar? How much data? For a small amount of data you could just merge (or join) the smaller data on and do an if set or index function on it. If its larger data then it would be highly beneficial to split the nameO column up into individual elements rather than having multiple items in one column - its always a good idea to have only one item per column anyways. For large data you could also do:
data _null_;
set small end=last;
if _n_=1 then call execute('data want; set large;');
call execute(cats('if index(nameo,"',name,'") then output;'));
if last then call execute('run;');
run;
That would generate a datastep with an if for each element in the smaller datastep. But I would really advise to split that data up.
data a;
input Name $;
cards;
Apple
Google
Amazon
;
data b;
input ID Year1 USname : $20. nameO & $20. Country $ Year2;
cards;
01 1992 Apple Apple | Ford US 1993
02 1993 Alphabet Toyato | Amazon US 1995
03 1994 Amazon.com Amazon US 1996
04 1995 Alphabet Google US 1997
05 1996 IBM IBM US 1998
;
proc sql;
select *
from b inner join a
on b.nameO contains strip(a.name);
quit;
Thanks for your answer. There is another problem.
In data a, Names are using the upper case such as "APPLE" and NameO in data b are using lower case such as "apple" or "Apple". I found that "APPLE" in data a cannot be matched with "apple" in data b.
Is there any way to ignore the lower case or upper case? Thanks.
Use upper or lower functions:
on lower(b.nameO) contains strip(lower(a.name));
Then all text compares will be low case.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.