The following definitly doesn't work but it may give you or others some ideas. I am not good with hash objects but I assume my example could be duplicated using one instead of my proc sql to explode the inputed records. Name matching algorithms are extremly complex and this is just a very dirty, low-quality match assurance test... In the output of the linkage set you can clearly see some matches are definitly not desireable, but I consider the results good for a few minutes work... data have; input id (namep1-namep2) (:$20.); retain yes 1; cards; 1 Alex Smith 2 Alexander Smith 3 Smith Alex 4 Smith Alexander 5 Matt Roberts 6 Matthew Robertson 7 Robert Matthews 8 Robert Mattherson 9 Matt Thompson 10 Tom Matterhorn 11 Alexis Smith 12 John Smith 13 Smith Alexis ; run; proc sql; create table want as select a.id, compress(a.namep1,,'kw') as namep1, compress(a.namep2,,'kw') as namep2, compress(b.namep1,,'kw') as namep1x, compress(b.namep2,,'kw') as namep2x, b.id as idx from have a join have b on a.id lt b.id; quit; data linkage; set want; /*----------------------------------------------* | i = ignore case | | l = remove leading blanks | | : = truncate to shortest length and compare | *----------------------------------------------*/ /* get complev of a:a */ complev1=complev(namep1,namep1x,'il'); complev2=complev(namep2,namep2x,'il'); complev1t=complev(namep1,namep1x,'il:'); complev2t=complev(namep2,namep2x,'il:'); avgcomplev_aa=mean(of complev1,complev2,complev1t,complev2t); if avgcomlev_aa<3 then do; complev=avgcomplev_aa; aa=1; end; /* get complev of a:b */ complev1x=complev(namep1,namep2x,'il'); complev2x=complev(namep2,namep1x,'il'); _complev1xt=complev(namep1,namep2x,'il:'); _complev2xt=complev(namep2,namep1x,'il:'); avgcomplev_ab=mean(of complev1x,complev2x,complev1xt,complev2xt); if avgcomplev_ab<3 then do; complev=avgcomplev_ab; _namep2x=namep2x; namep2x=namep1x; namep1x=_namep2x; aa=0; end; if complev<3; keep id idx namep1 namep2 namep1x namep2x complev aa; run; proc print data=linkage; run; Obs id namep1 namep2 namep1x namep2x idx complev aa 1 1 Alex Smith Alexander Smith 2 2.5 1 2 1 Alex Smith Alex Smith 3 0.0 0 3 1 Alex Smith Alexander Smith 4 2.5 0 4 1 Alex Smith Alexis Smith 11 1.0 1 5 1 Alex Smith John Smith 12 2.0 1 6 1 Alex Smith Alexis Smith 13 1.0 0 7 2 Alexander Smith Alex Smith 3 2.5 0 8 2 Alexander Smith Alexander Smith 4 0.0 0 9 2 Alexander Smith Alexis Smith 11 2.5 1 10 2 Alexander Smith Alexis Smith 13 2.5 0 11 3 Smith Alex Smith Alexander 4 2.5 1 12 3 Smith Alex Smith Alexis 11 1.0 0 13 3 Smith Alex Smith John 12 2.0 0 14 3 Smith Alex Smith Alexis 13 1.0 1 15 4 Smith Alexander Smith Alexis 11 2.5 0 16 4 Smith Alexander Smith Alexis 13 2.5 1 17 5 Matt Roberts Matthew Robertson 6 2.5 1 18 5 Matt Roberts Matthews Robert 7 2.5 0 19 6 Matthew Robertson Matthews Robert 7 2.0 0 20 7 Robert Matthews Robert Mattherson 8 1.5 1 21 11 Alexis Smith Alexis Smith 13 0.0 0
... View more