deblee73
Calcite | Level 5

Hello.  

I have 2 files that I need to link by customer name, but the names in one file are not spelled correctly.  Is there code that compares the 2 columns and returns a % match, based on some algorithm that looks at the number of characters, the order of characters, etc.?

 

Test dataset below:

 

data yourdata;
infile datalines dlm=' ';
input String1 $ String2 $;
datalines;
George Gorge
George George
George Georg
George Grge
George Greg
;
run;

 

My guess at what the output would look like, with arbitrarily picked percentages (the notes in parentheses are my reasoning):

 

George Gorge   83%  (less 1 character and out of order)

George George  100%  (same # of characters in same order)

George Georg   90% (same order short 1 character)

George Grge  60%  (short 2 characters but not out of order)

George Greg  50%  (short 2 characters and out of order)

2 REPLIES
Quentin
Super User

These don't return a % match, but for something similar there are the functions COMPLEV (Levenshtein edit distance) and COMPGED (generalized edit distance), and also the older SPEDIS (spelling distance).

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lefunctionsref/p1r4l9jwgatggtn1ko81fyjys4s7.h...

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lefunctionsref/n0l41pdemybegln1oetsh4cctdap.h...
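As a rough illustration of how an edit distance could be turned into the kind of percentage asked about, here is a minimal sketch. The scaling 1 - distance / longer length is an assumption of this example, not something SAS provides, and the dataset names names and scored and the variables lev, ged, maxlen and pct_match are made up for illustration.

```sas
/* Test data from the question, one pair per line; the terminating */
/* semicolon sits on its own line so the last pair is read as data. */
data names;
  infile datalines dlm=' ';
  input String1 $ String2 $;
  datalines;
George Gorge
George George
George Georg
George Grge
George Greg
;
run;

data scored;
  set names;
  /* Levenshtein edit distance: insertions, deletions, replacements */
  lev = complev(String1, String2);
  /* Generalized edit distance: a cost score, not a character count */
  ged = compged(String1, String2);
  /* Assumed scaling: divide by the longer of the two lengths */
  maxlen = max(lengthn(String1), lengthn(String2));
  if maxlen > 0 then pct_match = 1 - lev / maxlen;
  else pct_match = 1;  /* both strings empty: treat as identical */
  format pct_match percent8.1;
run;

proc print data=scored noobs;
run;
```

With this scaling, George vs Gorge is one deletion over a length of 6, so roughly 83%, and identical strings give 100%. The other pairs will not necessarily line up with the guessed percentages, since Levenshtein distance counts edits rather than judging "out of order" separately.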

 

Also, you might want to search lexjansen.com for user group papers on fuzzy matching, e.g.

https://www.lexjansen.com/pharmasug/2022/AP/PharmaSUG-2022-AP-030.pdf
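For the actual linking of the two files, one simple approach is to score every pair of names and keep only the close ones. The sketch below assumes two hypothetical tables, file_a and file_b, with made-up name columns and sample values, and an arbitrary cutoff of 2 edits that would need tuning on real data.

```sas
/* Hypothetical inputs: each file has one customer name column. */
data file_a;
  length name_a $20;
  input name_a $;
  datalines;
George
Martha
;
run;

data file_b;
  length name_b $20;
  input name_b $;
  datalines;
Gorge
Marta
Greg
;
run;

proc sql;
  /* Compare every name in file_a with every name in file_b */
  /* and keep pairs within the assumed edit-distance cutoff. */
  create table candidates as
  select a.name_a,
         b.name_b,
         complev(a.name_a, b.name_b) as lev
  from file_a as a, file_b as b
  where calculated lev <= 2
  order by a.name_a, lev;
quit;
```

This compares all pairs, so it can get slow when both files are large. The candidate pairs are best reviewed by hand before being treated as matches, since a single name can fall within the cutoff of several others.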

 

deblee73
Calcite | Level 5

Thank you, I will review the docs. 
