BookmarkSubscribeRSS Feed
dapenDaniel
Obsidian | Level 7

I have two data sets as following below.

 

Data set 1 is a list of company names.

 

Name

Apple

Google

Amazon

 

Data set 2 has the following variables.

 

ID           Year1       USname             nameO                                  Country    Year2

01           1992          Apple                 Apple | Ford                             US          1993     

02           1993         Alphabet            Toyato | Amazon                       US          1995

03           1994       Amazon.com       Amazon                                     US          1996

04           1995         Alphabet            Google                                      US           1997

05           1996          IBM                     IBM                                         US            1998

 

Name in data set 1 is the same with nameO in data set 2 (Google) or Name in data set1 is partly same with nameO in data set 2 (Apple).

 

What I want is to match these two data sets as long as NameO is partly same with Name.

 

Expected

ID           Year1       USname             nameO                                  Country    Year2           name

01           1992          Apple                 Apple | Ford                             US          1993            Apple

02           1993         Alphabet            Toyato | Amazon                       US          1995           Amazon

03           1994       Amazon.com       Amazon                                     US          1996          Amazon

04           1995         Alphabet            Google                                      US           1997          Google

 

 

What code do I need to use? Thanks a lot!

4 REPLIES 4
RW9
Diamond | Level 26 RW9
Diamond | Level 26

How similar?  How much data?  For a small amount of data you could just merge (or join) the smaller data on and do an if set or index function on it.  If its larger data then it would be highly beneficial to split the nameO column up into individual elements rather than having multiple items in one column - its always a good idea to have only one item per column anyways.  For large data you could also do:

data _null_;
  set small end=last;
  if _n_=1 then call execute('data want; set large;');
  call execute(cats('if index(nameo,"',name,'") then output;'));
  if last then call execute('run;');
run;

That would generate a datastep with an if for each element in the smaller datastep.  But I would really advise to split that data up.

 

Ksharp
Super User
data a;
input Name $;
cards;
Apple
Google
Amazon
;

data b;
input ID           Year1       USname   : $20.          nameO   & $20.         Country  $  Year2;
cards;
01           1992          Apple                 Apple | Ford                             US          1993     
02           1993         Alphabet            Toyato | Amazon                       US          1995
03           1994       Amazon.com       Amazon                                     US          1996
04           1995         Alphabet            Google                                      US           1997
05           1996          IBM                     IBM                                         US            1998
;


proc sql;
select *
 from b inner join a 
  on b.nameO contains strip(a.name);
quit;
dapenDaniel
Obsidian | Level 7

Thanks for your answer. There is another problem. 

 

In data a, Names are using the upper case such as "APPLE" and NameO in data b are using lower case such as "apple" or "Apple". I found that "APPLE" in data a cannot be matched with "apple" in data b. 

 

Is there any way to ignore the lower case or upper case? Thanks.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Use upper or lower functions:

  on lower(b.nameO) contains strip(lower(a.name));

Then all text compares will be low case. 

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 1002 views
  • 0 likes
  • 3 in conversation