Hello.
I have 2 files that I have to link by customer name, but the names in one file are not spelled correctly. Is there some code to compare the 2 columns and return a % match based on some algorithm that looks at the number of characters, the order of characters, etc.?
Test dataset below:
data yourdata;
infile datalines dlm=' ';
input String1 $ String2 $;
datalines;
George Gorge
George George
George Georg
George Grge
George Greg
;
run;
My guess at what the output would look like, with arbitrarily picked percentages (my reasoning is in parentheses):
George Gorge 83% (less 1 character and out of order)
George George 100% (same # of characters in same order)
George Georg 90% (same order short 1 character)
George Grge 60% (short 2 characters but not out of order)
George Greg 50% (short 2 characters and out of order)
They don't return a % match, but for something similar there are the functions COMPLEV (Levenshtein edit distance) and COMPGED (generalized edit distance), and also the older SPEDIS (spelling distance).
Also you might want to search lexjansen.com for user group papers on fuzzy matching, e.g.
https://www.lexjansen.com/pharmasug/2022/AP/PharmaSUG-2022-AP-030.pdf
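As a rough sketch of how those three functions could be applied to the test dataset from the question: the step below scores each pair and derives a percentage from the COMPLEV result. The pct_match formula (edit distance scaled by the longer name's length) is an assumption for illustration, not something SAS provides, and the dataset name scored and the variable names lev, ged, spd and pct_match are made up here.

```sas
/* Test data from the question; the closing semicolon sits on its own line */
data yourdata;
  input String1 $ String2 $;
  datalines;
George Gorge
George George
George Georg
George Grge
George Greg
;
run;

data scored;
  set yourdata;
  /* strip() removes padding blanks so they cannot affect the comparison */
  lev = complev(strip(String1), strip(String2));  /* Levenshtein edit distance */
  ged = compged(strip(String1), strip(String2));  /* generalized edit distance */
  spd = spedis(strip(String2), strip(String1));   /* spelling distance: query, keyword */

  /* Assumed formula, not a SAS built-in: 100% when identical,
     lower as the edit distance grows relative to the longer name */
  pct_match = 100 * (1 - lev / max(length(String1), length(String2)));
  format pct_match 5.1;
run;

proc print data=scored noobs;
run;
```

With this formula, George vs Gorge has an edit distance of 1 over 6 characters, which comes out at about 83%. Other pairs will not necessarily line up with the guessed percentages, since Levenshtein distance counts insertions, deletions and substitutions rather than "out of order" as such, so the scaling is worth tuning against real customer names.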
Thank you, I will review the docs.