Hi All,
I am looking for code that will let me find duplicates based on exact as well as similar values. I know that for exact values we can use NODUP, NODUPKEYS and NOUNIQUEKEYS. But can anyone tell me what function I can use to identify duplicate records based on similar values?
Example: The following two records should be considered duplicates even though Last Name and Address are not the same (but they are similar).
First Name   Last Name   Address
John         Murruy      1 New York St.
John         Murray      1 New York Street
Thanks,
This question is a bit fuzzy by nature: how similar should two strings be for them to be considered equal? I.e., when are two strings in two different observations similar enough to be considered duplicates and therefore omitted?
Two functions to get you going are COMPLEV and COMPGED. Both take two strings as input and return a number that represents the 'distance' between them: COMPLEV returns the Levenshtein edit distance and COMPGED the generalized edit distance. In both cases, a smaller number means the strings are more similar, and 0 means they match.
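As a rough sketch of how this could look with your example: the data set name `people`, the variable names, the `id` column, the extra third record and the threshold of 2 are all assumptions of mine, so adjust them to your data. The idea is to pair every record with every later record, score the pairs with COMPLEV and COMPGED, and keep the pairs that are close enough for review.

```sas
/* Hypothetical sample data based on the example in the question */
data people;
   infile datalines dsd;
   length first_name $20 last_name $20 address $40;
   input first_name $ last_name $ address $;
   id = _n_;   /* assumed row identifier, used to pair records */
   datalines;
John,Murruy,1 New York St.
John,Murray,1 New York Street
Jane,Smith,42 Elm Avenue
;
run;

/* Compare each record with every later record and score the differences */
proc sql;
   create table candidate_dups as
   select a.id        as id1,
          b.id        as id2,
          a.last_name as last_name1,
          b.last_name as last_name2,
          a.address   as address1,
          b.address   as address2,
          complev(upcase(a.last_name), upcase(b.last_name)) as lev_last,
          compged(upcase(a.address),   upcase(b.address))   as ged_addr
   from people as a, people as b
   where a.id < b.id
     and a.first_name = b.first_name
     /* assumed threshold: at most 2 single-character edits in last name */
     and complev(upcase(a.last_name), upcase(b.last_name)) <= 2;
quit;
```

Murruy and Murray differ by one substituted character, so that pair passes the last name filter. What cutoff makes sense for `ged_addr` depends on your data, so I would look at the scores in `candidate_dups` before deciding which pairs to treat as duplicates. Note that the self-join compares every pair of records, which gets slow on large tables; restricting the join on an exact key, as done here with `first_name`, keeps it manageable.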