quipmastre
Calcite | Level 5

Hello SAS Community,

 

My problem, I suspect, stems from user input. Each person can submit an unlimited number of applications (records), so we receive records that are nearly identical except for small variations that undermine our cleanup.

 

Below is a sample dataset. We used the SHA256 function to create the UniqID from the Security field, but on further review, other variables are strikingly similar across records. Sometimes the names are spelled differently, or the Security variable or the phone number is missing. Other times we find that people enter the same number but have a different name and Security value.
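
To illustrate why a hash of the Security field alone misses these cases, here is a minimal sketch in Python rather than SAS. It assumes the Security value is a digit-based identifier that may be typed with different punctuation; the function name `uniq_id` and the sample values are hypothetical. Any difference in the input, however small, gives a completely different SHA-256 digest, so the value has to be normalized before hashing, and missing values need separate handling.

```python
import hashlib
import re


def uniq_id(security):
    """Hash a normalized Security value; return None when it is missing.

    Assumption: only the digits of the Security field are meaningful.
    """
    digits = re.sub(r"\D", "", security or "")
    if not digits:
        # A missing value must not collapse every such record into one ID.
        return None
    return hashlib.sha256(digits.encode("utf-8")).hexdigest()


# Hashing the raw text: the same identifier typed two ways no longer matches.
raw_a = hashlib.sha256("123-45-6789".encode("utf-8")).hexdigest()
raw_b = hashlib.sha256("123456789".encode("utf-8")).hexdigest()
print(raw_a == raw_b)  # False

# Hashing after normalization: both spellings map to the same ID.
print(uniq_id("123-45-6789") == uniq_id("123 45 6789"))  # True

# Missing Security values get no ID and need another matching rule.
print(uniq_id(""))  # None
```

Normalization only helps with formatting differences. Misspelled names and records with a missing Security value still need some form of approximate matching on the other fields.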

 

So my question is, how can I create a program that will look for all of these variances? 

 

Thank you.

 

I appreciate any gentle nudge.

 

Luis

1 REPLY 1
ballardw
Super User

Search the forum for "name matching" as this comes up moderately often.

 

Or you might investigate the CDC product Link Plus:  https://www.cdc.gov/cancer/npcr/tools/registryplus/lp.htm

 

The website describes it in the context of cancer studies, but it will do probabilistic matching on whatever fields you designate, such as names, addresses, phone numbers, dates of birth, or other identifying information.

 

It is free, so cost is not a factor.

 

If you have lots of these, it is likely quicker than writing and debugging your own code. The output includes the probability of a match between each pair of records, so typical differences get fairly high probabilities and are easy to search for.
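
For a feel of what pairwise scoring looks like, here is a rough sketch in Python. It is not Link Plus's algorithm and the score is a simple weighted similarity, not a true match probability. The records, field names, weights, and the 0.8 threshold are all made up for illustration. Fields that are missing on either side are skipped instead of counted as a mismatch.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical records; the field names and values are invented.
records = [
    {"id": 1, "name": "Luis Ramirez", "phone": "555-0100", "security": "123456789"},
    {"id": 2, "name": "Luiz Ramires", "phone": "5550100", "security": ""},
    {"id": 3, "name": "Ana Ortega", "phone": "555-0199", "security": "987654321"},
]

# Assumed relative importance of each field.
WEIGHTS = {"name": 0.5, "phone": 0.3, "security": 0.2}


def digits(value):
    """Keep only the digits so formatting differences do not matter."""
    return "".join(ch for ch in value if ch.isdigit())


def field_score(field, a, b):
    """Return a similarity in [0, 1], or None if either value is missing."""
    if not a or not b:
        return None
    if field == "name":
        # Approximate string similarity tolerates small spelling differences.
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return 1.0 if digits(a) == digits(b) else 0.0


def pair_score(r1, r2):
    """Weighted average of field similarities over the fields both records have."""
    total = 0.0
    weight_sum = 0.0
    for field, weight in WEIGHTS.items():
        score = field_score(field, r1[field], r2[field])
        if score is None:
            continue
        total += weight * score
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


def candidate_pairs(recs, threshold=0.8):
    """Score every pair and return those at or above the threshold, best first."""
    scored = [(pair_score(a, b), a["id"], b["id"]) for a, b in combinations(recs, 2)]
    return sorted((p for p in scored if p[0] >= threshold), reverse=True)


for score, id_a, id_b in candidate_pairs(records):
    print(f"records {id_a} and {id_b} look alike (score {score:.2f})")
```

With this sample data only records 1 and 2 are flagged: the names differ by two letters, the phone numbers match once punctuation is removed, and the missing Security value is ignored. Comparing every pair grows quadratically with the number of records, which is one reason a dedicated tool is attractive for large files.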

 

I last used this when a vendor changed their database. I compared the same clients against the previous data extracts and was able to map the old identifier field to the new one (the main purpose), and then identify name spelling issues and date of birth and gender "changes" between the old and new data.

 
