DATA Step, Macro, Functions and more

How do I match names

Reply
Frequent Contributor
Posts: 84

How do I match names

Hello Guys, 

 

I have 2 datssets i wnat to mtach the new dataset with the old one and gives %of match like shown below

 

Company master file like this :

 

Company_Master
Adamas Pharmaceuticals, Inc.
Acacia Communications, Inc.
McHugh Inc

 

I will recive new compay names frequently like this

 

Company_New
Adamas Pharmaceuticals, Inc.
Adimas Pharmaceuticals, Inc.
McHugh Inc
Mc Hugh innc
Mccley INC
Acacia Communications

 

I want output like below: match with the master file and gives the % match with the master file names..any ideas i am looking at some compare functions but that not giving satisfactory results. any thoughts? 

 

Company_New % Matched
Adamas Pharmaceuticals LLC 90
Adimas Pharmaceuticals 80
Adamas Pharmaceuticals, Inc. 100
McHugh Inc 100
Mc Hugh innc 90
Mccley INC 0
Acacia Communications 0

 

Thanks.

Super User
Posts: 10,018

Re: How do I match names

How do you computer that %Matched .
Check the function about spell distance and edit distance Like : spedis(), complev() ..



data master;
input master $80.;
cards;
Adamas Pharmaceuticals, Inc.
Acacia Communications, Inc.
McHugh Inc
;
run;

data new;
input new $80.;
cards;
Adamas Pharmaceuticals, Inc.
Adimas Pharmaceuticals, Inc.
McHugh Inc
Mc Hugh innc
Mccley INC
Acacia Communications
;
run;
proc sql;
select new,count(*)/(select count(*) from master) as per format=percent8.2
 from new as a,master as b
  where a.new =* b.master
   group by a.new;
quit;

Frequent Contributor
Posts: 84

Re: How do I match names

Thanks for the reply.

%matched column thats not caluculated that i just want to show this type of output am expecting...

based on the output its just dividing the count/total count,

 

i am expecting is ther any method that will give probability of match? 

Super User
Posts: 10,018

Re: How do I match names

Sorry. I don't know what you are talking about.
How do you define that ' probability of match '?
How to calculate ' probability of match' ?
Gives us an example.
Super User
Posts: 10,018

Re: How do I match names

If you want all the obs of Company_New :



data master;
input master $80.;
cards;
Adamas Pharmaceuticals, Inc.
Acacia Communications, Inc.
McHugh Inc
;
run;

data new;
input new $80.;
cards;
Adamas Pharmaceuticals, Inc.
Adimas Pharmaceuticals, Inc.
McHugh Inc
Mc Hugh innc
Mccley INC
Acacia Communications
;
run;
proc sql;
select new,count(master)/(select count(master) from master) as per format=percent8.2
 from new as a left join master as b
  on a.new =* b.master
   group by a.new;
quit;


Respected Advisor
Posts: 4,919

Re: How do I match names

Example of using edit distance to do some matching with function complev:

 

data master;
input master &:$upcase80.;
cards;
Adamas Pharmaceuticals, Inc.
Acacia Communications, Inc.
McHugh Inc
;

data new;
input new &:$upcase80.;
cards;
Adamas Pharmaceuticals, LLC
Adamas Pharmaceuticals, Inc.
Adimas Pharmaceuticals
McHugh Inc
Mc Hugh innc
Mccley INC
Acacia Communications
;

proc sql;
create table matches as
select 
    new, master, 
    complev(new, master, 8) as complev,
    calculated complev / length(master) as distance
from 
    new inner join 
    master on substr(new,1,2) = substr(master,1,2)
where calculated complev < 8;
select * from matches
group by new
having distance = min(distance);
quit;

Note how I divided edit distance by string length to give a relative measure of spelling error. Could be useful to filter out some matches between short names.  

PG
Ask a Question
Discussion stats
  • 5 replies
  • 293 views
  • 0 likes
  • 3 in conversation