Hi. If you are planning to use the SOUNDEX + SQL approach, note that the SOUNDS LIKE operator (=*) in PROC SQL does a SOUNDEX comparison. Also, when comparing the contents of multiple variables, you might consider adding a score. For example, if you find that all three variables (street, city, state) match using SOUNDEX, the matched observations would receive a higher score than if fewer than all three matched. By the way, there are no rules in the SOUNDEX routine for numbers, so you cannot compare zip codes: any two all-numeric values end up with the same SOUNDEX code and "match" (see 13345 vs 12209 below).

data test;
  input (name1 name2) (:$10.) @@;
  datalines;
SMITH SMITH
SMITHE SMYTHE
SMITH ZMITH
12203 12203
13345 12209
;
run;

proc sql;
  title "SOUNDEX FUNCTION";
  select * from test where soundex(name1) = soundex(name2);

  title "SOUNDS LIKE OPERATOR";
  select * from test where name1 =* name2;

  title "SOUNDS LIKE OPERATOR + ADD A NUMERIC SCORE";
  select *, name1 =* name2 as score from test;
quit;

SOUNDEX FUNCTION

name1     name2
SMITH     SMITH
SMITHE    SMYTHE
12203     12203
13345     12209

SOUNDS LIKE OPERATOR

name1     name2
SMITH     SMITH
SMITHE    SMYTHE
12203     12203
13345     12209

SOUNDS LIKE OPERATOR + ADD A NUMERIC SCORE

name1     name2     score
SMITH     SMITH         1
SMITHE    SMYTHE        1
SMITH     ZMITH         0
12203     12203         1
13345     12209         1

However, I think that you are taking a really easy way out by just using SOUNDEX to match records. There are a lot of SAS character functions (and CALL routines) that let you do string comparisons. A few are SPEDIS, COMPGED, and COMPLEV. Each of those gives you a score that can be used to evaluate just how close two strings are. Modifying the online example in SAS help for the SPEDIS function ...
data words;
  input (operation query keyword) ($);
  spedis  = spedis(query, keyword);
  op_sndx = soundex(query);
  ky_sndx = soundex(keyword);
  compged = compged(query, keyword);
  complev = complev(query, keyword);
  datalines;
match    fuzzy  fuzzy
singlet  fuzy   fuzzy
doublet  fuuzzy fuzzy
swap     fzuzy  fuzzy
truncate fuzz   fuzzy
append   fuzzys fuzzy
delete   fzzy   fuzzy
insert   fluzzy fuzzy
replace  fizzy  fuzzy
firstdel uzzy   fuzzy
firstins pfuzzy fuzzy
firstrep wuzzy  fuzzy
several  floozy fuzzy
;
run;

title "SPEDIS, SOUNDEX, COMPGED, COMPLEV";
proc print data=words noobs;
run;

SPEDIS, SOUNDEX, COMPGED, COMPLEV

operation  query   keyword  spedis  op_sndx  ky_sndx  compged  complev
match      fuzzy   fuzzy       0    F2       F2           0        0
singlet    fuzy    fuzzy       6    F2       F2          20        1
doublet    fuuzzy  fuzzy       8    F2       F2          20        1
swap       fzuzy   fuzzy      10    F22      F2          20        2
truncate   fuzz    fuzzy      12    F2       F2          10        1
append     fuzzys  fuzzy       5    F22      F2          50        1
delete     fzzy    fuzzy      12    F2       F2         100        1
insert     fluzzy  fuzzy      16    F42      F2         100        1
replace    fizzy   fuzzy      20    F2       F2         100        1
firstdel   uzzy    fuzzy      25    U2       F2         200        1
firstins   pfuzzy  fuzzy      33    P2       F2         200        1
firstrep   wuzzy   fuzzy      40    W2       F2         200        1
several    floozy  fuzzy      50    F42      F2         300        3

Last, there's a wealth of material in SAS papers on matching character strings. It all comes down to how much work you'd like to invest in the process. Like I said, SOUNDEX seems (at least to me) like something that is easy and fast but not necessarily the best approach. Try some reading ...

Fuzzy Matching
http://www.sascommunity.org/wiki/Fuzzy_Matching

Compged makes matches easy to see!
http://www.geocities.ws/nyasug2002/COMPGED.pdf

The Fuzzy Feeling SAS Provides: Electronic Matching of Records without Common Keys (a great paper, though it predates SPEDIS/COMPGED/COMPLEV)
http://ftp.sas.com/techsup/download/observations/obswww15/obswww15.pdf

Do a Google search on "sas fuzzy matching".
Do a Google search on "sas address matching" ...

NOTE: One thing you'll notice in a lot of address-matching papers is that the data sets are pre-processed to STANDARDIZE addresses according to some set of rules.
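To make the scoring idea from the start of this post concrete, here is a minimal sketch that adds up SOUNDS LIKE hits on street and city plus an exact state match, and keeps a COMPGED distance on the street alongside it. The data sets LIST_A and LIST_B, their variables, and the sort order are all hypothetical; on real data you would also block (for example, compare only records in the same state or zip code) instead of pairing every record with every other one.

```sas
/* Hypothetical address lists with no common key */
data list_a;
  infile datalines dlm='|';
  input id_a street :$30. city :$20. state :$2.;
  datalines;
1|12 MAIN ST|ALBANY|NY
2|400 BROADWAY|TROY|NY
;
run;

data list_b;
  infile datalines dlm='|';
  input id_b street :$30. city :$20. state :$2.;
  datalines;
101|12 MAIN STREET|ALBANY|NY
102|400 BRODWAY|TROY|NY
103|9 ELM AVE|SCHENECTADY|NY
;
run;

proc sql;
  title "SOUNDEX HIT COUNT PLUS A COMPGED DISTANCE ON STREET";
  create table candidates as
    select a.id_a, b.id_b,
           a.street as street_a, b.street as street_b,
           /* each comparison evaluates to 1 (true) or 0 (false) */
           (a.street =* b.street) + (a.city =* b.city) + (a.state = b.state)
             as sndx_score,
           /* lower COMPGED = closer strings */
           compged(a.street, b.street) as street_ged
    from list_a as a, list_b as b
    order by id_a, sndx_score desc, street_ged;
  select * from candidates;
quit;
title;
```

For each record in LIST_A, the best candidates float to the top: highest SNDX_SCORE first, then smallest STREET_GED. Where to set a cutoff on either score is a judgment call you'd tune against your own data.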
A long time ago (at least 25 years now), I wrote a macro to standardize street types (AVE, AVENUE, STREET, STRT, ST, etc.) prior to matching. It's still "out there" ...
http://www.albany.edu/~msz03/nesug/combined.zip

That zip file also contains a macro for a SOUNDEX alternative named NYSIIS (developed by the New York State Identification and Intelligence System). SOUNDEX is not good for names with lots of vowels, since the first rule in the SOUNDEX routine is ...

Retain the first letter in the argument and discard the following letters: A E H I O U W Y

so names that differ mainly in their vowels collapse to the same code. NYSIIS rules are on Wikipedia (what isn't) ...
https://en.wikipedia.org/wiki/New_York_State_Identification_and_Intelligence_System
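A street-type standardization pass can be sketched in a DATA step. This is not the macro in the zip file, just a minimal illustration, and the variant list below (AVENUE, STRT, and so on) is a hypothetical subset; a real rule set needs many more entries and some care (ST can also mean SAINT, for example). The idea is to run something like this on both files before any SOUNDEX or COMPGED comparison.

```sas
/* Hypothetical input addresses */
data addresses;
  infile datalines truncover;
  input address $60.;
  datalines;
12 MAIN STREET
400 BROADWAY AVENUE
9 ELM STRT
77 LAKE RD
;
run;

data standardized;
  set addresses;
  length std_address $60 word $30;
  /* rebuild the address one blank-delimited word at a time */
  do i = 1 to countw(address, ' ');
    word = scan(address, i, ' ');
    /* hypothetical variant list: map each street type to one abbreviation */
    select (word);
      when ('AVENUE', 'AVEN', 'AV')  word = 'AVE';
      when ('STREET', 'STRT', 'STR') word = 'ST';
      when ('ROAD')                  word = 'RD';
      when ('BOULEVARD')             word = 'BLVD';
      otherwise;                     /* leave every other word unchanged */
    end;
    /* CATX skips the blank starting value, so no leading space */
    std_address = catx(' ', std_address, word);
  end;
  drop i word;
run;

proc print data=standardized noobs;
run;
```

With these rules, "12 MAIN STREET" and "9 ELM STRT" become "12 MAIN ST" and "9 ELM ST", so the street-type difference no longer counts against a match.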