mlogan
Lapis Lazuli | Level 10

Hi All,

I am looking for code that will let me find duplicates based on exact as well as similar values. I know that for exact values we can use NODUP, NODUPKEYS and NOUNIQUEKEYS. Can anyone tell me which function I can use to identify duplicate records based on similarity rather than an exact match?

 

Example: The following two records should be considered duplicates even though Last Name and Address are not identical (they are only similar).

 

First Name   Last Name   Address
John         Murruy      1 New York St.
John         Murray      1 New York Street

 

Thanks,

2 REPLIES
PeterClemmensen
Tourmaline | Level 20

This question is a bit fuzzy by nature: how similar must two strings be before they are considered equal? That is, when are two strings in two different observations similar enough to be treated as duplicates, so that one of them is omitted?

 

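Two functions to get you going are COMPLEV and COMPGED. Both take two strings as input and return a number that represents the 'distance' between them: COMPLEV returns the Levenshtein edit distance and COMPGED returns the generalized edit distance. The smaller the number, the more similar the strings, so you still have to choose a cutoff below which two values count as duplicates.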
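
To make the idea of 'distance' concrete, here is a rough sketch in plain Python (not SAS) of the Levenshtein edit distance, which is the measure COMPLEV is based on. The function name `levenshtein`, the helper `is_similar`, and the cutoff values are illustrative assumptions of mine, not anything from SAS, and this does not attempt to mimic COMPGED's cost scheme.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] = distance between the first i-1 chars of a and first j chars of b
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # turning i chars of a into an empty string costs i deletions
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep when equal)
            ))
        previous = current
    return previous[-1]


def is_similar(a: str, b: str, max_distance: int) -> bool:
    """Hypothetical helper: treat two values as duplicates when their
    edit distance is at most max_distance (the cutoff is your choice)."""
    return levenshtein(a.strip(), b.strip()) <= max_distance


# The values from the question above.
print(levenshtein("Murruy", "Murray"))                        # 1
print(levenshtein("1 New York St.", "1 New York Street"))     # 4
print(is_similar("Murruy", "Murray", max_distance=1))         # True
```

Note that the two address values are 4 edits apart while the last names are only 1 apart, so a single cutoff for every column may be too strict for one and too loose for another. You may want a cutoff per variable, or to standardize abbreviations such as "St." before comparing.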

SimonDawson
SAS Employee
If you have a license for the SAS Data Quality procedures, I'd look into match code generation to solve this type of problem.

http://support.sas.com/documentation/cdl/en/dqclref/70016/HTML/default/viewer.htm#n1597gcbsehaokn1j5...
