DPraba79
Calcite | Level 5

Hi Team,

I am looking for a matching function that matches a given person's nickname to their real name. For example, if I supply DAVE it should match DAVID, and if I supply JOHN it should match JOHNSON. I also tried the sounds-like operator (=*) and the SOUNDEX function, but they didn't give the desired output.

Thanks,

Prabakaran

3 replies
DanielSantos
Barite | Level 11

Hi.

If you're not happy with the sounds-like operator or SOUNDEX, I don't know of any function that will do this kind of "magical" match.

Perl regular expressions might get you closer, but I'm sure it will take much more than one simple, magical expression to do the match.

Unless you code this with a complete set of good rules or a dictionary, I can't see how to do it.

Cheers from Portugal.

Daniel Santos @ www.cgd.pt

Astounding
PROC Star

A lot of work has been done in this area. Try a Google search for "nickname to names".

You will find that there is not a one-to-one match. For example, Ellie usually matches Ellen, but could also match Eleanor or Elisa (and spelling variations such as Elissa). You may have to prioritize among a list of possible matches.
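A minimal sketch of that prioritizing idea, in Python for illustration only: each nickname maps to an ordered list of full names, and the first candidate that actually occurs in your data wins. The table contents are just the names mentioned in this thread, and `NICKNAMES`, `candidates`, and `best_match` are hypothetical names; a real table would have to be built from a much larger nickname list.

```python
from typing import List, Optional

# Hypothetical nickname table built only from the examples in this thread.
# Each nickname maps to candidate full names in priority order (most likely first).
NICKNAMES = {
    "ELLIE": ["ELLEN", "ELEANOR", "ELISA", "ELISSA"],
    "DAVE": ["DAVID"],
    "MIKE": ["MICHAEL"],
    "DAN": ["DANIEL"],
    "ART": ["ARTHUR"],
}


def candidates(nickname: str) -> List[str]:
    """Return the prioritized full-name candidates for a nickname (may be empty)."""
    return NICKNAMES.get(nickname.strip().upper(), [])


def best_match(nickname: str, known_names: List[str]) -> Optional[str]:
    """Return the highest-priority candidate that appears in known_names, else None."""
    known = {name.strip().upper() for name in known_names}
    for full_name in candidates(nickname):
        if full_name in known:
            return full_name
    return None


if __name__ == "__main__":
    print(best_match("Ellie", ["Eleanor", "Smith"]))  # ELLEN absent, so ELEANOR
    print(best_match("Ellie", ["Ellen", "Eleanor"]))  # both present, ELLEN has priority
    print(best_match("Zed", ["Zachary"]))             # no table entry, so None
```

Keeping the candidates in priority order means the decision about which match to prefer is made once, in the table, rather than case by case.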

Good luck.

MikeZdeb
Rhodochrosite | Level 12

Hi ... as Astounding points out, there are lots of postings about name matching. As for SOUNDEX, it might help to know exactly what you are comparing with that algorithm. There is also a similar technique called NYSIIS ...

New York State Identification and Intelligence System - Wikipedia, the free encyclopedia

and SAS code for the algorithm (I found it on the web a long time ago, so I cannot give you the original source) ... http://www.albany.edu/~msz03/nesug/combined.zip
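To see exactly what SOUNDEX is comparing, here is a simplified Soundex-style coder in Python. It is a sketch modeled on the SOUNDEX codes in the example output further down this reply, not SAS's implementation: it keeps the first letter, drops A, E, H, I, O, U, W, and Y, maps the remaining letters to the standard Soundex digits, collapses adjacent letters that share a digit, and does not pad or truncate the result. `soundex_like` is a hypothetical name.

```python
# Standard Soundex digit classes. Letters not listed here
# (A, E, H, I, O, U, W, Y) are dropped after the first letter.
DIGITS = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex_like(name: str) -> str:
    """Simplified Soundex-style code (no padding or truncation).

    Keeps the first letter, then appends the digit for each later letter,
    skipping uncoded letters and any letter whose digit equals that of the
    letter immediately before it in the original name.
    """
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    previous = DIGITS.get(letters[0])
    for ch in letters[1:]:
        digit = DIGITS.get(ch)
        if digit is not None and digit != previous:
            code += digit
        previous = digit  # an uncoded letter (vowel etc.) breaks a run
    return code


if __name__ == "__main__":
    for nick, full in [("Dave", "David"), ("John", "Johnson"), ("Mike", "Michael")]:
        print(nick, soundex_like(nick), full, soundex_like(full))
    # Dave D1 David D13
    # John J5 Johnson J525
    # Mike M2 Michael M24
```

The full name usually has more consonants than the nickname, so its code is longer, and requiring the two codes to be equal fails for pairs like DAVE/DAVID.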

You might also look into SAS functions that compare character strings and produce a score based on the degree of matching (e.g., SPEDIS, COMPGED). For both, 0 means an exact match and lower scores mean closer matches.
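SPEDIS and COMPGED are both variations on edit distance: they score the edits needed to turn one string into the other. As a rough illustration, here is plain Levenshtein distance in Python, where every insertion, deletion, or substitution costs 1. This is a simpler relative of those functions, not a reimplementation, so its numbers will not match the SPEDIS or COMPGED columns in the example output; `levenshtein` is a hypothetical helper name.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance: minimum number of single-character
    insertions, deletions, or substitutions turning a into b (each costs 1)."""
    a, b = a.upper(), b.upper()  # compare case-insensitively
    # previous[j] = distance between the first i-1 letters of a and the first j of b
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    for nick, full in [("Dave", "David"), ("John", "Johnson"), ("Mike", "Michael")]:
        print(nick, full, levenshtein(nick, full))
    # Dave David 2
    # John Johnson 3
    # Mike Michael 4
```

A raw distance like this grows with the number of missing letters (JOHN/JOHNSON scores 3 versus 2 for DAVE/DAVID), so you would still need to pick a threshold for what counts as a match.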

Here's an example with some names (%NYSIIS uses the macro from the link above). It compares the original names, the "soundexed" names, and the "NYSIIS" names using SPEDIS and COMPGED ... the COMPGED comparisons of the "soundexed" names have consistently low scores. Nothing is going to be perfect short of a self-defined lookup table that does the exact name conversions you specify.

PS: an old conference, but lots of good matching info ... how much do you want to read??? ... Record Linkage Techniques - 1997

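```sas
/* Requires the NYSIIS macro from the combined.zip link above to be defined first */
data x;
   input (name1 name2) (:$20.);

   /* NYSIIS codes */
   %nysiis(name1,ncode1);
   %nysiis(name2,ncode2);

   /* SOUNDEX codes */
   scode1 = soundex(name1);
   scode2 = soundex(name2);

   /* SPEDIS scores: original names, NYSIIS codes, SOUNDEX codes */
   spedis0 = spedis(name1,name2);
   spedisn = spedis(ncode1,ncode2);
   spediss = spedis(scode1,scode2);

   /* COMPGED scores: original names, NYSIIS codes, SOUNDEX codes */
   compged0 = compged(name1,name2);
   compgedn = compged(ncode1,ncode2);
   compgeds = compged(scode1,scode2);
datalines;
Mike Michael
John Johnson
Dave David
Ellie Ellen
Ellie Eleanor
Astound Astounding
Art Arthur
Dan Daniel
Lin Linlin
;
```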

```
name1    name2       ncode1  ncode2     scode1  scode2   spedis0  spedisn  spediss  compged0  compgedn  compgeds
Mike     Michael     MAC     MACAL      M2      M24           62       33       25       230        20        10
John     Johnson     JAN     JANSAN     J5      J525          37       50       50        30        30        20
Dave     David       DAV     DAVAD      D1      D13           37       33       25       110        20        10
Ellie    Ellen       EL      ELAN       E4      E45           30       50       25       200        20        10
Ellie    Eleanor     EL      ELANAR     E4      E456          70      100       50       320        40        20
Astound  Astounding  ASTAN   ASTANDANG  A2353   A235352       21       40       20        30        40        20
Art      Arthur      AD      ARTAR      A63     A636          50      125       16        30       130        10
Dan      Daniel      DAN     DANAL      D5      D54           50       33       25        30        20        10
Lin      Linlin      LAN     LANLAN     L5      L545          50       50       50        30        30        20
```

