deblee73
Calcite | Level 5

Hello.  

I have 2 files that I need to link by customer name, but the names in one file are not spelled correctly. Is there code that compares the 2 columns and returns a % match, based on some algorithm that looks at the number of characters, the order of characters, etc.?

 

Test dataset below:

 

/* Test pairs: correct spelling first, misspelled variant second */
data yourdata;
infile datalines dlm=' ';
input String1 $ String2 $;
datalines;
George Gorge
George George
George Georg
George Grge
George Greg
;
run;

 

Here is my guess at what the output might look like. The percentages are arbitrary; the notes in parentheses are my reasoning.

 

George Gorge   83%  (1 character short and out of order)

George George  100%  (same number of characters in the same order)

George Georg   90%  (same order, 1 character short)

George Grge    60%  (2 characters short but not out of order)

George Greg    50%  (2 characters short and out of order)

2 REPLIES
Quentin
PROC Star

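It's not a % match, but for something similar there are the COMPLEV and COMPGED functions, plus the older SPEDIS. Each returns a distance score: 0 means the two strings are identical, and larger values mean they differ more.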

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lefunctionsref/p1r4l9jwgatggtn1ko81fyjys4s7.h...

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lefunctionsref/n0l41pdemybegln1oetsh4cctdap.h...
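For what it's worth, here is a minimal sketch of how those three functions could be run on pairs like the ones in the question, and how a distance might be turned into a rough percentage. The dataset names (pairs, scored), the variable names, and especially the pct_match formula (1 minus the Levenshtein distance divided by the longer length) are my own assumptions for illustration; SAS does not ship a built-in "% match", so treat this as one possible scaling rather than the standard one.

```sas
/* Sample pairs, mirroring the test data in the question */
data pairs;
  length String1 String2 $20;
  input String1 $ String2 $;
  datalines;
George Gorge
George George
George Georg
George Grge
George Greg
;
run;

data scored;
  set pairs;
  lev_dist = complev(String1, String2);   /* Levenshtein edit distance */
  ged_dist = compged(String1, String2);   /* generalized edit distance, in cost units */
  sped     = spedis(String2, String1);    /* asymmetric: query first, keyword second */

  /* Assumed formula, not a SAS built-in: scale the edit distance by the longer name */
  max_len = max(lengthn(String1), lengthn(String2));
  if max_len > 0 then pct_match = 100 * (1 - lev_dist / max_len);
  format pct_match 5.1;
run;

proc print data=scored noobs;
run;
```

Under this scaling, George vs. Gorge is one edit away over a length of 6, so it comes out at roughly 83%. The results will not reproduce every guessed percentage in the question, because Levenshtein distance counts insertions, deletions, and substitutions but has no separate penalty for characters being out of order.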

 

Also, you might want to search lexjansen.com for user group papers on fuzzy matching, e.g.

https://www.lexjansen.com/pharmasug/2022/AP/PharmaSUG-2022-AP-030.pdf
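As a rough idea of what a fuzzy link between two files can look like, here is a small sketch using PROC SQL with COMPLEV. Everything specific in it is an assumption for illustration: the dataset names file_a and file_b, the column name, the sample names, and the cutoff of 2 edits, which would need tuning against real data. It is not taken from the paper linked above.

```sas
/* Hypothetical file with correctly spelled customer names */
data file_a;
  length name $20;
  input name $;
  datalines;
George
Martha
Steven
;
run;

/* Hypothetical file with misspelled customer names */
data file_b;
  length name $20;
  input name $;
  datalines;
Gorge
Marta
Stephen
;
run;

proc sql;
  create table matched as
  select a.name as name_a,
         b.name as name_b,
         /* compare in upper case so case differences do not count as edits */
         complev(upcase(a.name), upcase(b.name)) as lev_dist
  from file_a as a, file_b as b          /* every name in A against every name in B */
  where calculated lev_dist <= 2         /* assumed cutoff: keep pairs within 2 edits */
  order by name_a, lev_dist;
quit;

proc print data=matched noobs;
run;
```

The join compares every row of one file with every row of the other, so it gets slow on large files; a common workaround is to restrict candidate pairs first, for example to names sharing the same first letter, before computing distances.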

 

Check out the Boston Area SAS Users Group (BASUG) video archives: https://www.basug.org/videos.
deblee73
Calcite | Level 5

Thank you, I will review the docs. 
