Hello.
I have two files that I need to link by customer name, but the names in one file are not spelled correctly. Is there code to compare the two columns and return a percent match, based on an algorithm that looks at the number of characters, the order of characters, etc.?
Test dataset below:
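data yourdata;
   infile datalines dlm=' ';
   input String1 $ String2 $;
   /* the semicolon that ends the data must be on its own line, */
   /* otherwise the last record is not read as data             */
   datalines;
George Gorge
George George
George Georg
George Grge
George Greg
;
run;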
My guess at what the output would look like, with arbitrarily chosen percentages (my reasoning is in parentheses):
George Gorge 83% (less 1 character and out of order)
George George 100% (same # of characters in same order)
George Georg 90% (same order short 1 character)
George Grge 60% (short 2 characters but not out of order)
George Greg 50% (short 2 characters and out of order)
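Not a percent difference as such, but for something similar there are the functions COMPLEV (Levenshtein edit distance) and COMPGED (generalized edit distance), as well as the older SPEDIS (spelling distance). Each returns a distance or cost rather than a percentage, so lower values mean closer matches and 0 means the strings are identical.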
Also you might want to search lexjansen.com for user group papers on fuzzy matching, e.g.
https://www.lexjansen.com/pharmasug/2022/AP/PharmaSUG-2022-AP-030.pdf
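If you do want something that reads like a percentage, one possible approach is to scale the COMPLEV distance by the length of the longer string. This is only a sketch: the variable names lev, ged, maxlen and pct_match and the output dataset name scored are made up for illustration, and the scaling formula is an assumption, not a built-in SAS measure. It will not reproduce the hand-picked percentages in the question.

```sas
/* Test data from the question */
data yourdata;
   infile datalines dlm=' ';
   input String1 $ String2 $;
   datalines;
George Gorge
George George
George Georg
George Grge
George Greg
;
run;

data scored;
   set yourdata;
   lev = complev(String1, String2);   /* Levenshtein edit distance */
   ged = compged(String1, String2);   /* generalized edit distance (weighted costs) */
   /* length of the longer string, used to scale the distance */
   maxlen = max(lengthn(String1), lengthn(String2));
   /* hypothetical score: 100 = identical, lower = less similar */
   if maxlen > 0 then pct_match = 100 * (1 - lev / maxlen);
   format pct_match 5.1;
run;

proc print data=scored noobs;
run;
```

With this formula Gorge and Georg get the same score, because each is a single edit away from George. If the position or type of the error should matter, the COMPGED cost may be a better starting point than COMPLEV, since it weights different kinds of edits differently.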
Thank you, I will review the docs.