1) Extract the constant parts into variables. E.g., if find(string, "Mobile") then city = "Mobile";
2) Group by those constant variables.*
   *Ideally take those constant parts out of the strings and just keep them in variables, e.g., using TRANWRD (COMPRESS removes individual characters, not whole substrings).
3) Create a variable "length" holding the length of each string.
4) Sort by group and by length descending, and choose the longest string in each group (the first row of the group, e.g., first.city) as your "standard".
5) Compare the strings in each group to that group's "standard", e.g., using the COMPGED function.*
   *There are other functions in SAS that can compare two strings for fuzzy matching, COMPLEV, etc.
6) The function returns a distance score for each comparison.
7) Addresses within a reasonable distance of the standard are treated as the same.

What is a reasonable distance? You will have to find out experimentally, e.g., take a pair with a score of 100 and see whether the two addresses look the same.
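Here is a rough, untested sketch of steps 1 to 7 put together. The dataset names (addresses, prepared, scored), the variable names (address, city, rest, len, standard, ged, lev, same), the sample rows, and the cutoff of 100 are all made up for illustration, so adjust them to your data. It uses TRANWRD to strip the constant part and keeps both COMPGED and COMPLEV scores so you can compare them.

```sas
/* Hypothetical sample data: one free-text address per row */
data addresses;
    infile datalines truncover;
    input address $60.;
    datalines;
123 Main Street Mobile
123 Main St Mobile
123 Main St. Mobile
9 Pine Road Mobile
45 Oak Avenue Dothan
45 Oak Ave Dothan
;
run;

/* Assumed starting cutoff; tune it by inspecting real pairs */
%let cutoff = 100;

/* Steps 1-3: pull the constant part into CITY, remove it from the
   string, and store the length of what is left */
data prepared;
    set addresses;
    length city $20 rest $60;
    if find(address, 'Mobile') > 0 then city = 'Mobile';
    else if find(address, 'Dothan') > 0 then city = 'Dothan';
    if not missing(city) then
        rest = strip(compbl(tranwrd(address, strip(city), ' ')));
    else rest = address;
    len = length(rest);
run;

/* Step 4: longest string first within each group */
proc sort data=prepared;
    by city descending len;
run;

/* Steps 5-7: the first row of each group is the standard;
   score every row against it and flag the close ones */
data scored;
    set prepared;
    by city descending len;
    length standard $60;
    retain standard;
    if first.city then standard = rest;
    ged  = compged(rest, standard);   /* generalized edit distance */
    lev  = complev(rest, standard);   /* Levenshtein edit distance */
    same = (ged <= &cutoff);          /* 1 = treated as the same address */
run;

proc print data=scored;
    var city rest standard len ged lev same;
run;
```

Note that a genuinely different address in the same group (like the Pine Road row) is also scored against the group's standard, so check the high-scoring rows by eye before settling on a cutoff.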