cdubs

I have a column called "name" into which people manually entered a lot of names.

I am hoping to use SAS to look for duplicates, but the error rate is probably high: there may be a few typos, yet a "normal human" would know that the two names likely refer to the same person.

I was wondering what it might look like if I asked SAS to first sort the names alphabetically, then compare each entry with the entries directly above and below it. If either neighbor has the same number of characters (+/- 2) and the same letters with at most 4 differences, we would count it as a duplicate and set a new variable duplicate = 1.
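A minimal SAS sketch of the approach described above, under several assumptions: the input data set is called WORK.NAMES (a hypothetical name), "4 differences" is interpreted as a Levenshtein edit distance of at most 4 as computed by COMPLEV, and case and leading blanks are ignored through COMPLEV's 'i' and 'l' modifiers. MERGE without a BY statement lines each row up with the next one (a look-ahead), while LAG supplies the previous row.

```sas
/* Assumption: WORK.NAMES is a hypothetical data set holding the character variable NAME */
proc sort data=names out=names_sorted;
  by name;
run;

data flagged;
  /* One-to-one merge (no BY): the second copy starts at row 2, so NEXT_NAME */
  /* holds the name on the row directly below; it is missing on the last row */
  merge names_sorted
        names_sorted(firstobs=2 keep=name rename=(name=next_name));

  /* LAG must run on every row; PREV_NAME is missing on the first row */
  prev_name = lag(name);

  duplicate = 0;

  /* Compare with the row above: length within +/- 2 and edit distance <= 4 */
  if not missing(prev_name) then do;
    if abs(lengthn(strip(name)) - lengthn(strip(prev_name))) <= 2
       and complev(name, prev_name, 5, 'il') <= 4 then duplicate = 1;
  end;

  /* Compare with the row below using the same rule */
  if not missing(next_name) then do;
    if abs(lengthn(strip(name)) - lengthn(strip(next_name))) <= 2
       and complev(name, next_name, 5, 'il') <= 4 then duplicate = 1;
  end;

  drop prev_name next_name;
run;
```

Because each row is flagged when it matches either neighbor, both rows of a likely pair get duplicate = 1. Since neighbors come from alphabetical sorting, a typo in the first letter (for example "Katherine" vs "Catherine") can place two spellings far apart, so this check can miss such pairs; a full pairwise comparison avoids that at a higher cost.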
Does anyone have ideas about how I might compare strings this way?

Thank you! 

Patrick

@cdubs

Questions similar to yours come up from time to time. Have you already searched this forum for terms like "fuzzy match"?

You will find discussions such as: https://communities.sas.com/t5/Base-SAS-Programming/Fuzzy-match-with-soundex-and-compged/m-p/295334/...
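The linked thread's title mentions SOUNDEX and COMPGED. One common way to combine them is to use SOUNDEX as a cheap "blocking" key, so that only names that sound alike get compared, and then score those candidate pairs with COMPGED. The sketch below is a PROC SQL self-join under stated assumptions: WORK.NAMES, the unique ID variable, and the cutoff of 200 are all hypothetical choices that would need tuning on real data.

```sas
/* Assumption: WORK.NAMES has a unique key ID and the character variable NAME */
proc sql;
  create table candidate_pairs as
  select a.id   as id1,
         b.id   as id2,
         a.name as name1,
         b.name as name2,
         /* Generalized edit distance, case-insensitive; capped at 200 */
         compged(a.name, b.name, 200, 'i') as ged_score
  from names as a
       inner join names as b
       /* a.id < b.id keeps each pair once and skips self-matches */
       on a.id < b.id
      /* Blocking: only compare names with the same SOUNDEX code */
      and soundex(a.name) = soundex(b.name)
  /* Hypothetical threshold: keep pairs scoring below the cutoff */
  where calculated ged_score < 200
  order by ged_score;
quit;
```

The lowest scores are the most similar pairs, so reviewing CANDIDATE_PAIRS from the top down is a reasonable way to choose a threshold.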

cdubs

I haven't! I'll look at those now 🙂 

PeterClemmensen

Have a look at the COMPGED and COMPLEV functions; they are good places to start.
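To see how the two functions differ, here is a quick side-by-side on a few made-up name pairs (the pairs are invented for illustration). COMPLEV returns the Levenshtein edit distance, a count of insertions, deletions, and replacements, while COMPGED returns a generalized edit distance in which each kind of operation carries its own cost, so its scores are on a different scale.

```sas
/* Hypothetical sample pairs, comma-delimited (DSD) */
data compare_scores;
  infile datalines dsd truncover;
  length name1 name2 $40;
  input name1 name2;
  lev = complev(name1, name2);   /* Levenshtein edit distance */
  ged = compged(name1, name2);   /* generalized edit distance (weighted costs) */
datalines;
Jonathan Smith,Jonathon Smith
Katherine Lee,Catherine Lee
Maria Garcia,Mario Garcia
Jonathan Smtih,Jonathan Smith
;
run;

proc print data=compare_scores noobs;
run;
```

The last pair contains transposed letters, which is one case where the two scoring schemes can rank pairs differently; printing both columns makes it easier to decide which function and threshold suit your data.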
