deblee73
Calcite | Level 5

Hello.  

I have 2 files that I need to link by customer name, but the names in one file are not spelled correctly. Is there code that compares the 2 columns and returns a % match, based on an algorithm that looks at the number of characters, the order of characters, etc.?

 

Test dataset below:

 

data yourdata;
infile datalines dlm=' ';
input String1 $ String2 $;
datalines;
George Gorge
George George
George Georg
George Grge
George Greg
;
run;

 

Below is my guess at what the output might look like. The percentages are picked arbitrarily, and my reasoning is in parentheses.

 

George Gorge   83%  (less 1 character and out of order)

George George  100%  (same # of characters in same order)

George Georg   90% (same order short 1 character)

George Grge  60%  (short 2 characters but not out of order)

George Greg  50%  (short 2 characters and out of order)

2 Replies
Quentin
Super User

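These don't return a % match, but for something similar there are the functions COMPLEV (Levenshtein edit distance) and COMPGED (generalized edit distance), and also the older SPEDIS (spelling distance). Each returns a distance rather than a percentage, so 0 means the strings are identical and larger values mean they are less alike.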

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lefunctionsref/p1r4l9jwgatggtn1ko81fyjys4s7.h...

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lefunctionsref/n0l41pdemybegln1oetsh4cctdap.h...
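As a rough illustration of the functions mentioned above, here is a minimal sketch that scores the test data from the question. The dataset name scored and the variables lev, ged, spd and pct_match are made up for this example. The pct_match formula (1 minus the Levenshtein distance divided by the longer string's length) is only one assumed way to turn a distance into a percentage. It is not something SAS provides, and it will not reproduce the guessed percentages in the question.

```sas
/* Test data from the question, one pair per line */
data yourdata;
  infile datalines dlm=' ';
  input String1 $ String2 $;
  datalines;
George Gorge
George George
George Georg
George Grge
George Greg
;
run;

data scored;
  set yourdata;
  /* Levenshtein edit distance: count of insertions, deletions, replacements */
  lev = complev(strip(String1), strip(String2));
  /* Generalized edit distance: like COMPLEV but with weighted operation costs */
  ged = compged(strip(String1), strip(String2));
  /* Spelling distance; asymmetric, so argument order (query, keyword) matters */
  spd = spedis(strip(String2), strip(String1));
  /* ASSUMPTION: home-made similarity score, not a built-in SAS measure */
  pct_match = 1 - lev / max(length(String1), length(String2));
  format pct_match percent8.1;
run;

proc print data=scored noobs;
run;
```

For example, George vs. Georg is one deletion apart, so lev is 1 and pct_match is 1 - 1/6, or about 83%. Whether COMPLEV, COMPGED or SPEDIS ranks your real customer names most sensibly is something to check on a sample of your own data.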

 

You might also want to search lexjansen.com for user group papers on fuzzy matching, e.g.

https://www.lexjansen.com/pharmasug/2022/AP/PharmaSUG-2022-AP-030.pdf
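For the original goal of linking two files by name, one common pattern is to compare every name in one file with every name in the other and keep only the close pairs. The sketch below assumes two hypothetical datasets, file_a and file_b, with name variables name_a and name_b. The sample names and the cutoff of 2 edits are illustrative assumptions, not recommendations.

```sas
/* HYPOTHETICAL inputs standing in for the two customer files */
data file_a;
  input name_a $20.;
  datalines;
George
Martha
;
run;

data file_b;
  input name_b $20.;
  datalines;
Gorge
Marta
Greg
;
run;

proc sql;
  /* Compare every pair of names and keep those within the cutoff */
  create table candidates as
  select a.name_a,
         b.name_b,
         complev(strip(upcase(a.name_a)), strip(upcase(b.name_b))) as lev
  from file_a as a, file_b as b
  where calculated lev <= 2   /* ASSUMED cutoff; tune it for your data */
  order by a.name_a, lev;
quit;
```

Because this compares all pairs, it can be slow on large files, and the candidate matches usually still need a manual review before they are used to link records.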

 

deblee73
Calcite | Level 5

Thank you, I will review the docs. 
