BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
dapenDaniel
Obsidian | Level 7

Hello.

 

I have two data files. File A contains some company names and File B is the full list of company names. The variables are following below.

 

File A: ID, Name

File B: Name1, Name2, Name3 (3 names are for the same company but different expression. For example, Name1 = IBM and Name 2 = International Business Machines)

 

What I want is to search names from File A in File B. If names in File A can match one of the names in File B, then a new variable called NewName is created in File A and NewNames equals to Name1 from File B. 

 

 

Can anyone tell me what code I need to use? Thanks.

 

For example

 

File A

 

ID        Name

01        IBM

02        Apple

 

File B

              Name1                                                    Name 2                                       Name 3

International Business Machines                            IBM                                          IBM Corp.

              Apple                                                       Apple Inc.                                     AAPL

            Google LLC                                               GOOGL                                      Google      

 

 

Expected File

 

ID            Name                      NewName (equal to Name1)

01             IBM                    International Business Machines

02             Apple                                     Apple

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

data a;
input id $ name $;
cards;
01        IBM
02        Apple
;

data b;
input (name1-name3) ( &:$50.);
cards;
International Business Machines                            IBM                                          IBM Corp.
Apple                                                       Apple Inc.                                     AAPL
Google LLC                                               GOOGL                                      Google      
;

proc sql;
create table want as
select a.*,b.name1 as new_name
from a a left join b b
on a.name=b.name1 or a.name=b.name2 or a.name=b.name3;
quit;

View solution in original post

15 REPLIES 15
novinosrin
Tourmaline | Level 20

Can you post a sample plz of both of your files that will help lazy people like me to also try

dapenDaniel
Obsidian | Level 7

I have added an example. Thanks.

novinosrin
Tourmaline | Level 20

data a;
input id $ name $;
cards;
01        IBM
02        Apple
;

data b;
input (name1-name3) ( &:$50.);
cards;
International Business Machines                            IBM                                          IBM Corp.
Apple                                                       Apple Inc.                                     AAPL
Google LLC                                               GOOGL                                      Google      
;

proc sql;
create table want as
select a.*,b.name1 as new_name
from a a left join b b
on a.name=b.name1 or a.name=b.name2 or a.name=b.name3;
quit;
dapenDaniel
Obsidian | Level 7

There are some duplicates in File B.

 

How can I remove these duplicates when I match these two files?

 

Thanks.

novinosrin
Tourmaline | Level 20

Do you mean like this?

 



data a;
input id $ name $;
cards;
01        IBM
02        Apple
;

data b;
input (name1-name3) ( &:$50.);
cards;
International Business Machines                            IBM                                          IBM Corp.
International Business Machines                            IBM                                          IBM Corp.
International Business Machines                            IBM                                          IBM Corp.
Apple                                                       Apple Inc.                                     AAPL
Apple                                                       Apple Inc.                                     AAPL
Google LLC                                               GOOGL                                      Google      
;
proc sql;
create table want as
select distinct a.*,b.name1 as new_name
from a a left join b b
on a.name=b.name1 or a.name=b.name2 or a.name=b.name3;
quit;
dapenDaniel
Obsidian | Level 7

I mean duplicate like this:

 

File A

 

ID        Name

01        IBM

02        Apple

 

File B

              Name1                                                    Name 2                                       Name 3

International Business Machines                            IBM                                          IBM Corp.

              Apple                                                       Apple Inc.                                     AAPL

              AAPL                                                         Apple                                        Apple INC.

            Google LLC                                               GOOGL                                      Google      

 

 

Expected File

 

ID            Name                      NewName (equal to Name1)

01             IBM                    International Business Machines

02             Apple                                     Apple

02             Apple                                     AAPL

 

I hope the expected file can list all possible "Name1" in File B for "Names" in File A.

 

Is it possible to make it?

novinosrin
Tourmaline | Level 20

Hi @dapenDaniel  The existing code accomplishes just that. Here is another test

 


data a;
input id $ name $;
cards;
01        IBM
02        Apple
;

data b;
input (name1-name3) ( &:$50.);
cards;
International Business Machines                            IBM                                          IBM Corp.
Apple                                                       Apple Inc.                                     AAPL
AAPL                                                         Apple                                        Apple INC.
Google LLC                                               GOOGL                                      Google     
;

proc sql;
create table want as
select a.*,b.name1 as new_name
from a a left join b b
on a.name=b.name1 or a.name=b.name2 or a.name=b.name3;
quit;

proc print noobs;run;

Result

01    IBM      International Business Machines
02    Apple    Apple
02    Apple    AAPL
dapenDaniel
Obsidian | Level 7

Thank you so much for your timely reply!!!

 

Sorry I have another questions. I found that in File B, there are other weird data format

 

File A

ID              Name

01                IBM

02               Apple

03                bcd

 

File B

              Name1                                                    Name 2                                       Name 3

International Business Machines                            IBM                                          IBM Corp.

              Apple                                                       Apple Inc.                                     AAPL

              AAPL                                                        Apple                                          Apple Inc

           Google LLC                                               GOOGL                                         Google 

              abc                                                         abc | bcd                                      bcd | abc | aef

 

Expected File

 

ID    Name      NewName
01 IBM International Business Machines 02 Apple Apple 02 Apple AAPL
03 bcd abc

 For Name3 in File B, there are several names that are separate by "|". I hope that as long as the observation contains the same part (bcd in Name3), SAS helps me to get its corresponding Name1. It is unnecessary to be exactly the same.

 

Is it possible to do that?

 

Thank you very much!

novinosrin
Tourmaline | Level 20

Switch to contains logic

 

 


data a;
input id $ name $;
cards;
01        IBM
02        Apple
03        bcd
;

data b;
input (name1-name3) ( &:$50.);
cards;
International Business Machines                            IBM                                          IBM Corp.
Apple                                                       Apple Inc.                                     AAPL
AAPL                                                        Apple                                          Apple Inc
Google LLC                                               GOOGL                                         Google 
abc                                                         abc | bcd                                      bcd | abc | aef
;
proc sql;
create table want as
select id , name, name1
from a
left join
b
on catx('|', name1, name2, name3) contains trim(name);
quit;
SuryaKiran
Meteorite | Level 14

You may need to add Case expression to @novinosrin solution to add the name from table A if no match found in table B

 

proc sql;
create table want as
select a.*,
		case when b.name1 is null then a.name 
			else b.name1 end as new_name
from a a 
left join b b
on a.name=b.name1 or a.name=b.name2 or a.name=b.name3;
quit;
Thanks,
Suryakiran
novinosrin
Tourmaline | Level 20

@SuryaKiran   Good forward thinking, however personally i prefer to take advantage of coalsece ANSI or proc sql's proprietary version coalescec. Simpler, easy and more efficient

 

select distinct a.*,coalsecec(name1,a.name) as new_name

 

 

 

 

kiranv_
Rhodochrosite | Level 12

one more way

 

proc sql;
select id , name, name1
from a
left join
b
on index(catx(',', name1, name2, name3), trim(name)) gt 0;
novinosrin
Tourmaline | Level 20

so will contains i think

 

proc sql;

select id , name, name1
from a
left join
b
on catx(',', name1, name2, name3) contains trim(name);
quit;

 

 

kiranv_
Rhodochrosite | Level 12

both of them should work.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 15 replies
  • 1361 views
  • 2 likes
  • 5 in conversation