BookmarkSubscribeRSS Feed
kumarK
Quartz | Level 8

Hello Guys, 

 

I have 2 datssets i wnat to mtach the new dataset with the old one and gives %of match like shown below

 

Company master file like this :

 

Company_Master
Adamas Pharmaceuticals, Inc.
Acacia Communications, Inc.
McHugh Inc

 

I will recive new compay names frequently like this

 

Company_New
Adamas Pharmaceuticals, Inc.
Adimas Pharmaceuticals, Inc.
McHugh Inc
Mc Hugh innc
Mccley INC
Acacia Communications

 

I want output like below: match with the master file and gives the % match with the master file names..any ideas i am looking at some compare functions but that not giving satisfactory results. any thoughts? 

 

Company_New % Matched
Adamas Pharmaceuticals LLC 90
Adimas Pharmaceuticals 80
Adamas Pharmaceuticals, Inc. 100
McHugh Inc 100
Mc Hugh innc 90
Mccley INC 0
Acacia Communications 0

 

Thanks.

5 REPLIES 5
Ksharp
Super User
How do you computer that %Matched .
Check the function about spell distance and edit distance Like : spedis(), complev() ..



data master;
input master $80.;
cards;
Adamas Pharmaceuticals, Inc.
Acacia Communications, Inc.
McHugh Inc
;
run;

data new;
input new $80.;
cards;
Adamas Pharmaceuticals, Inc.
Adimas Pharmaceuticals, Inc.
McHugh Inc
Mc Hugh innc
Mccley INC
Acacia Communications
;
run;
proc sql;
select new,count(*)/(select count(*) from master) as per format=percent8.2
 from new as a,master as b
  where a.new =* b.master
   group by a.new;
quit;

kumarK
Quartz | Level 8

Thanks for the reply.

%matched column thats not caluculated that i just want to show this type of output am expecting...

based on the output its just dividing the count/total count,

 

i am expecting is ther any method that will give probability of match? 

Ksharp
Super User
Sorry. I don't know what you are talking about.
How do you define that ' probability of match '?
How to calculate ' probability of match' ?
Gives us an example.
Ksharp
Super User
If you want all the obs of Company_New :



data master;
input master $80.;
cards;
Adamas Pharmaceuticals, Inc.
Acacia Communications, Inc.
McHugh Inc
;
run;

data new;
input new $80.;
cards;
Adamas Pharmaceuticals, Inc.
Adimas Pharmaceuticals, Inc.
McHugh Inc
Mc Hugh innc
Mccley INC
Acacia Communications
;
run;
proc sql;
select new,count(master)/(select count(master) from master) as per format=percent8.2
 from new as a left join master as b
  on a.new =* b.master
   group by a.new;
quit;


PGStats
Opal | Level 21

Example of using edit distance to do some matching with function complev:

 

data master;
input master &:$upcase80.;
cards;
Adamas Pharmaceuticals, Inc.
Acacia Communications, Inc.
McHugh Inc
;

data new;
input new &:$upcase80.;
cards;
Adamas Pharmaceuticals, LLC
Adamas Pharmaceuticals, Inc.
Adimas Pharmaceuticals
McHugh Inc
Mc Hugh innc
Mccley INC
Acacia Communications
;

proc sql;
create table matches as
select 
    new, master, 
    complev(new, master, 8) as complev,
    calculated complev / length(master) as distance
from 
    new inner join 
    master on substr(new,1,2) = substr(master,1,2)
where calculated complev < 8;
select * from matches
group by new
having distance = min(distance);
quit;

Note how I divided edit distance by string length to give a relative measure of spelling error. Could be useful to filter out some matches between short names.  

PG

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1060 views
  • 0 likes
  • 3 in conversation