
Nick Name matching function

Contributor
Posts: 34


Hi Team,

   I am looking for a matching function that maps a person's nickname to their real name. For example, if I supply DAVE it should match DAVID; if I supply JOHN it should match JOHNSON. I tried the sounds-like operator (=*) and the SOUNDEX function, but neither gave the desired output.

Thanks,

Prabakaran

Super Contributor
Posts: 474

Re: Nick Name matching function

Hi.

If you're not happy with the sounds-like operator or SOUNDEX, I don't know of any function that will do this kind of "magical" match.

Maybe Perl regular expressions will get you closer, but I'm sure it will take far more than one simple magical expression to do the match.

Unless you code this with a full set of good rules or a dictionary, I can't see how to do this.

Cheers from Portugal.

Daniel Santos @ www.cgd.pt

Super User
Posts: 5,072

Re: Nick Name matching function

Plenty has been done in this area.  Try a Google search on nickname-to-name matching.

You will find that there is not a one-to-one match.  For example, Ellie usually matches to Ellen, but could also match to Eleanor, Elisa (and spelling variations such as Elissa).  You may have to prioritize from a list of possible matches.
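To make the one-to-many point concrete, here is a minimal sketch of a self-defined lookup table with a priority column. The data set names (NICKNAMES, LOOKUP_ME, CANDIDATES), the PRIORITY variable, and the specific rankings are all assumptions for illustration; a real table would come from one of the nickname lists found on the web.

```sas
/* Hypothetical lookup table: one row per (nickname, formal name) pair. */
/* PRIORITY is an assumed ranking, 1 = most likely match.               */
data nicknames;
   input nickname :$20. formal :$20. priority;
datalines;
DAVE  DAVID   1
ELLIE ELLEN   1
ELLIE ELEANOR 2
ELLIE ELISA   3
;

/* Sample nicknames to resolve (illustrative only). */
data lookup_me;
   input nickname :$20.;
datalines;
Dave
Ellie
;

/* Return every candidate for each nickname, best-ranked first.   */
/* Nicknames with no entry in the table keep a missing FORMAL.    */
proc sql;
   create table candidates as
   select a.nickname, b.formal, b.priority
   from lookup_me as a
        left join nicknames as b
        on upcase(a.nickname) = b.nickname
   order by a.nickname, b.priority;
quit;
```

Keeping all candidates rather than only the top-ranked one leaves the final choice to a later step, where other fields (surname, date of birth, and so on) could break ties.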

Good luck.

Valued Guide
Posts: 765

Re: Nick Name matching function

Hi ... as ASTOUNDING points out, there are a lot of postings about name matching.  As for SOUNDEX, it might be handy to know exactly what you are comparing with that algorithm.  Also, there's a similar technique called NYSIIS ...

New York State Identification and Intelligence System - Wikipedia, the free encyclopedia

and SAS code (that I found on the web a long time ago so I cannot give you the source) for the algorithm ... http://www.albany.edu/~msz03/nesug/combined.zip

You might look into some SAS functions that compare character strings and produce scores based on the degree of matching (e.g. SPEDIS, COMPGED)

Here's an example with some names (%NYSIIS uses the macro from the above link).  It compares the original names, the "soundexed" names, and the NYSIIS-coded names using SPEDIS and COMPGED ... the COMPGED comparisons of the "soundexed" names have consistently low scores.  Nothing is going to be perfect short of a self-defined lookup table that does the exact name conversions that you specify.

ps  an old conference, but lots of good matching info ... how much do you want to read ???  ... Record Linkage Techniques - 1997

data x;
   input (name1 name2) (:$20.);
   /* phonetic codes: NYSIIS (macro from the link above) and SOUNDEX */
   %nysiis(name1,ncode1);
   %nysiis(name2,ncode2);
   scode1 = soundex(name1);
   scode2 = soundex(name2);
   /* SPEDIS on the raw names (0), NYSIIS codes (n), SOUNDEX codes (s) */
   spedis0 = spedis(name1,name2);
   spedisn = spedis(ncode1,ncode2);
   spediss = spedis(scode1,scode2);
   /* COMPGED on the same three pairs */
   compged0 = compged(name1,name2);
   compgedn = compged(ncode1,ncode2);
   compgeds = compged(scode1,scode2);
datalines;
Mike Michael
John Johnson
Dave David
Ellie Ellen
Ellie Eleanor
Astound Astounding
Art Arthur
Dan Daniel
Lin Linlin
;

name1     name2        ncode1  ncode2      scode1  scode2   spedis0  spedisn  spediss  compged0  compgedn  compgeds
Mike      Michael      MAC     MACAL       M2      M24           62       33       25       230        20        10
John      Johnson      JAN     JANSAN      J5      J525          37       50       50        30        30        20
Dave      David        DAV     DAVAD       D1      D13           37       33       25       110        20        10
Ellie     Ellen        EL      ELAN        E4      E45           30       50       25       200        20        10
Ellie     Eleanor      EL      ELANAR      E4      E456          70      100       50       320        40        20
Astound   Astounding   ASTAN   ASTANDANG   A2353   A235352       21       40       20        30        40        20
Art       Arthur       AD      ARTAR       A63     A636          50      125       16        30       130        10
Dan       Daniel       DAN     DANAL       D5      D54           50       33       25        30        20        10
Lin       Linlin       LAN     LANLAN      L5      L545          50       50       50        30        30        20
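
In the results above, COMPGED on the SOUNDEX codes came out at 10 or 20 for every pair, so one possible next step is to turn that score into a match flag. The sketch below is only an illustration: the cutoff of 20, the data set name SCORED, and the variable names GED_SOUNDEX and LIKELY_MATCH are assumptions, and the cutoff would need tuning against pairs that should not match.

```sas
data scored;
   input (name1 name2) (:$20.);
   /* generalized edit distance between the two SOUNDEX codes */
   ged_soundex = compged(soundex(name1), soundex(name2));
   /* assumed cutoff: 20 is taken from the small sample above, not a general rule */
   likely_match = (ged_soundex <= 20);
datalines;
Dave David
Ellie Eleanor
Dave Arthur
;
```

A score-based flag like this will also let through some false positives, which is why a lookup table remains the more reliable option when exact conversions matter.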
