Hello everyone,

I have a database with a lot of rubbish in it. We will be changing our database soon, and I'd like to clean the data as much as possible before the migration. I'm trying to identify possible duplicate records within one dataset: records that are not exact matches but are similar. For example:

Record 1 - Name=John, Surname=Doe, Address=Fake Street
Record 2 - Name=Jonh, Surname=Doe Joe, Address=F. Street

My idea is to build a single string from name, surname and address with the spaces removed (for example johndoefakestreet). I would then compare every record one to one with every other record in the same dataset (approximately 800k records) using the COMPGED function, and for each record keep only the match with the smallest COMPGED value. That way I could identify the likely duplicates, which I know are present.

I don't know how to perform this operation, or whether there is an easier way to do it. I'm using SAS 9.4. I hope it's clear what I'm trying to do. Thanks!
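A rough sketch of the approach, written in Python rather than SAS just to show the shape of the logic. One thing to keep in mind: comparing every one of 800k records with every other means about 800,000 × 799,999 / 2 ≈ 3.2 × 10^11 pairs, which is usually far too many. A common workaround is "blocking": only compare records that share some cheap key. Here the block key is, purely as an assumption, the first letter of the surname. The edit_distance function is plain Levenshtein distance used as a stand-in for COMPGED (COMPGED uses different, configurable costs, so the scores will not match SAS). The names make_key and candidate_duplicates, the block key and the threshold of 8 are all hypothetical choices for illustration. In SAS the equivalent would typically be a PROC SQL self-join on the blocking variable with COMPGED in the WHERE clause.

```python
from collections import defaultdict
from itertools import combinations


def make_key(name, surname, address):
    """Concatenate the fields, lowercase them, and drop all whitespace."""
    return "".join((name + surname + address).lower().split())


def edit_distance(a, b):
    """Levenshtein distance (unit cost for insert, delete, substitute).
    Hypothetical stand-in for SAS COMPGED, which weights operations differently."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (or match)
        prev = cur
    return prev[-1]


def candidate_duplicates(records, max_dist, block_key):
    """records: list of (id, name, surname, address) tuples.
    Only records with the same block_key(record) are compared, which avoids
    the full n*(n-1)/2 comparison. Returns (id1, id2, distance) tuples for
    pairs within max_dist, closest pairs first."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[block_key(rec)].append(rec)

    pairs = []
    for members in blocks.values():
        for r1, r2 in combinations(members, 2):
            d = edit_distance(make_key(*r1[1:]), make_key(*r2[1:]))
            if d <= max_dist:
                pairs.append((r1[0], r2[0], d))
    return sorted(pairs, key=lambda p: p[2])


# Illustrative data: the two records from the question plus an unrelated one.
records = [
    (1, "John", "Doe", "Fake Street"),
    (2, "Jonh", "Doe Joe", "F. Street"),
    (3, "Mary", "Smith", "Main Road"),
]

# Hypothetical blocking rule: first letter of the surname.
pairs = candidate_duplicates(records, 8, lambda r: r[2][:1].lower())
for id1, id2, dist in pairs:
    print(f"possible duplicate: {id1} <-> {id2} (distance {dist})")
```

The blocking rule is a trade-off: it cuts the work drastically, but two duplicates whose surnames differ in the first letter would never be compared. Running a second pass with a different block key (for example a postcode or the first letters of the street) is one way to catch some of those.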