- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello.
I have two data files. File A contains some company names and File B is the full list of company names. The variables are following below.
File A: ID, Name
File B: Name1, Name2, Name3 (3 names are for the same company but different expression. For example, Name1 = IBM and Name 2 = International Business Machines)
What I want is to search names from File A in File B. If names in File A can match one of the names in File B, then a new variable called NewName is created in File A and NewNames equals to Name1 from File B.
Can anyone tell me what code I need to use? Thanks.
For example
File A
ID Name
01 IBM
02 Apple
File B
Name1 Name 2 Name 3
International Business Machines IBM IBM Corp.
Apple Apple Inc. AAPL
Google LLC GOOGL Google
Expected File
ID Name NewName (equal to Name1)
01 IBM International Business Machines
02 Apple Apple
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data a;
input id $ name $;
cards;
01 IBM
02 Apple
;
data b;
input (name1-name3) ( &:$50.);
cards;
International Business Machines IBM IBM Corp.
Apple Apple Inc. AAPL
Google LLC GOOGL Google
;
proc sql;
create table want as
select a.*,b.name1 as new_name
from a a left join b b
on a.name=b.name1 or a.name=b.name2 or a.name=b.name3;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Can you post a sample plz of both of your files that will help lazy people like me to also try
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have added an example. Thanks.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data a;
input id $ name $;
cards;
01 IBM
02 Apple
;
data b;
input (name1-name3) ( &:$50.);
cards;
International Business Machines IBM IBM Corp.
Apple Apple Inc. AAPL
Google LLC GOOGL Google
;
proc sql;
create table want as
select a.*,b.name1 as new_name
from a a left join b b
on a.name=b.name1 or a.name=b.name2 or a.name=b.name3;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
There are some duplicates in File B.
How can I remove these duplicates when I match these two files?
Thanks.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Do you mean like this?
data a;
input id $ name $;
cards;
01 IBM
02 Apple
;
data b;
input (name1-name3) ( &:$50.);
cards;
International Business Machines IBM IBM Corp.
International Business Machines IBM IBM Corp.
International Business Machines IBM IBM Corp.
Apple Apple Inc. AAPL
Apple Apple Inc. AAPL
Google LLC GOOGL Google
;
proc sql;
create table want as
select distinct a.*,b.name1 as new_name
from a a left join b b
on a.name=b.name1 or a.name=b.name2 or a.name=b.name3;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I mean duplicate like this:
File A
ID Name
01 IBM
02 Apple
File B
Name1 Name 2 Name 3
International Business Machines IBM IBM Corp.
Apple Apple Inc. AAPL
AAPL Apple Apple INC.
Google LLC GOOGL Google
Expected File
ID Name NewName (equal to Name1)
01 IBM International Business Machines
02 Apple Apple
02 Apple AAPL
I hope the expected file can list all possible "Name1" in File B for "Names" in File A.
Is it possible to make it?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi @dapenDaniel The existing code accomplishes just that. Here is another test
data a;
input id $ name $;
cards;
01 IBM
02 Apple
;
data b;
input (name1-name3) ( &:$50.);
cards;
International Business Machines IBM IBM Corp.
Apple Apple Inc. AAPL
AAPL Apple Apple INC.
Google LLC GOOGL Google
;
proc sql;
create table want as
select a.*,b.name1 as new_name
from a a left join b b
on a.name=b.name1 or a.name=b.name2 or a.name=b.name3;
quit;
proc print noobs;run;
Result
01 IBM International Business Machines 02 Apple Apple 02 Apple AAPL
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you so much for your timely reply!!!
Sorry I have another questions. I found that in File B, there are other weird data format
File A
ID Name
01 IBM
02 Apple
03 bcd
File B
Name1 Name 2 Name 3
International Business Machines IBM IBM Corp.
Apple Apple Inc. AAPL
AAPL Apple Apple Inc
Google LLC GOOGL Google
abc abc | bcd bcd | abc | aef
Expected File
ID Name NewName
01 IBM International Business Machines 02 Apple Apple 02 Apple AAPL
03 bcd abc
For Name3 in File B, there are several names that are separate by "|". I hope that as long as the observation contains the same part (bcd in Name3), SAS helps me to get its corresponding Name1. It is unnecessary to be exactly the same.
Is it possible to do that?
Thank you very much!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Switch to contains logic
data a;
input id $ name $;
cards;
01 IBM
02 Apple
03 bcd
;
data b;
input (name1-name3) ( &:$50.);
cards;
International Business Machines IBM IBM Corp.
Apple Apple Inc. AAPL
AAPL Apple Apple Inc
Google LLC GOOGL Google
abc abc | bcd bcd | abc | aef
;
proc sql;
create table want as
select id , name, name1
from a
left join
b
on catx('|', name1, name2, name3) contains trim(name);
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You may need to add Case expression to @novinosrin solution to add the name from table A if no match found in table B
proc sql;
create table want as
select a.*,
case when b.name1 is null then a.name
else b.name1 end as new_name
from a a
left join b b
on a.name=b.name1 or a.name=b.name2 or a.name=b.name3;
quit;
Suryakiran
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@SuryaKiran Good forward thinking, however personally i prefer to take advantage of coalsece ANSI or proc sql's proprietary version coalescec. Simpler, easy and more efficient
select distinct a.*,coalsecec(name1,a.name) as new_name
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
one more way
proc sql;
select id , name, name1
from a
left join
b
on index(catx(',', name1, name2, name3), trim(name)) gt 0;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
so will contains i think
proc sql;
select id , name, name1
from a
left join
b
on catx(',', name1, name2, name3) contains trim(name);
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
both of them should work.