09-05-2012 10:17 AM
I am looking for a matching function that matches a person's nickname to their real name. For example, if I supply DAVE it should match DAVID, and if I supply JOHN it should match JOHNSON. I also tried the sounds-like operator (=*) and the SOUNDEX function, but they didn't give the desired output.
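To see why the sounds-like operator misses these pairs, here is a small sketch (the dataset names are made up for illustration). =* is only valid in a WHERE expression, and it keeps a row only when the two SOUNDEX codes are identical. According to the listing further down, SOUNDEX returns D1 for Dave but D13 for David, so the codes never agree.

```sas
/* Hypothetical sample data: the names from the question */
data names;
  input name :$20.;
datalines;
DAVID
JOHNSON
DAVE
;
run;

/* =* is valid only in a WHERE expression; it keeps rows whose SOUNDEX
   code equals the SOUNDEX code of 'DAVE' */
data sounds_like_dave;
  set names;
  where name =* 'DAVE';
run;

/* Show the codes being compared (DAVE -> D1, DAVID -> D13) */
data codes;
  set names;
  length code $8;
  code = soundex(name);
run;
```

Only DAVE itself is selected; JOHN/JOHNSON fails the same way (J5 vs J525 in the listing).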
09-05-2012 10:52 AM
If you're not happy with the sounds-like operator/SOUNDEX, I don't see any function that will do this kind of "magical" match.
Maybe Perl regular expressions will get you closer, but I'm sure it will take far more than one simple, magical expression to do the match.
Unless you code this with a full set of good rules or a dictionary, I can't see how to do it.
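As one illustration of a "rule", here is a minimal sketch assuming a simple prefix rule (a nickname matches any formal name that starts with it) and a small, hypothetical list of formal names. FIND with the 'i' modifier ignores case, and a result of 1 means the nickname sits at the start of the formal name.

```sas
/* Hypothetical list of formal names to search */
data formal_names;
  input fullname :$20.;
datalines;
DAVID
DANIEL
JOHNSON
ARTHUR
;
run;

/* Hypothetical nicknames to resolve */
data nicks;
  input nick :$20.;
datalines;
Dan
John
Dave
;
run;

/* Rule: a nickname matches any formal name that starts with it (case-insensitive) */
proc sql;
  create table prefix_matches as
  select n.nick, f.fullname
  from nicks as n, formal_names as f
  where find(f.fullname, strip(n.nick), 'i') = 1
  order by n.nick, f.fullname;
quit;
```

This catches DAN/DANIEL and JOHN/JOHNSON but not DAVE/DAVID, which is why a rule like this usually needs a dictionary behind it.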
Cheers from Portugal.
Daniel Santos @ www.cgd.pt
09-05-2012 11:16 AM
A lot has been done in this area. Try a Google search on "nickname to names".
You will find that there is no one-to-one match. For example, Ellie usually matches Ellen, but it could also match Eleanor or Elisa (and spelling variations such as Elissa). You may have to prioritize from a list of possible matches.
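A sketch of the "list of possible matches" idea, assuming a hypothetical lookup table with one row per nickname/candidate pair and a priority column you assign yourself (the priorities below are illustrative, not real frequencies). The Ellie candidates are the ones from the post above.

```sas
/* Hypothetical lookup table: one row per (nickname, candidate) pair.
   Priority 1 = most likely; these priorities are illustrative only. */
data nick_lookup;
  input nick :$20. fullname :$20. priority;
datalines;
ELLIE ELLEN 1
ELLIE ELEANOR 2
ELLIE ELISA 3
ELLIE ELISSA 4
DAVE DAVID 1
;
run;

/* Hypothetical people to resolve */
data people;
  input id nick :$20.;
datalines;
1 Ellie
2 Dave
;
run;

/* Every candidate for each person, most likely first */
proc sql;
  create table candidates as
  select p.id, p.nick, l.fullname, l.priority
  from people as p
       inner join nick_lookup as l
       on upcase(p.nick) = l.nick
  order by p.id, l.priority;
quit;
```

Keeping all candidates (instead of a single answer) lets you review the ambiguous ones or fall back to the next candidate when the first one does not fit.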
09-05-2012 11:43 AM
Hi ... as ASTOUNDING points out, there are a lot of postings about name matching. As for SOUNDEX, it might be handy to know exactly what you are comparing when you use that algorithm. There's also a similar technique called NYSIIS ...
and here is SAS code for that algorithm (I found it on the web a long time ago, so I cannot give you the original source): http://www.albany.edu/~msz03/nesug/combined.zip
You might also look into SAS functions that compare character strings and produce scores based on the degree of matching (e.g., SPEDIS, COMPGED).
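A minimal sketch of using these scores to rank candidates, assuming you have a list of formal names to match against (both lists below are hypothetical). Lower COMPGED and SPEDIS values mean a closer match; the last step keeps the candidate with the lowest COMPGED score for each nickname.

```sas
/* Hypothetical list of formal names to match against */
data formal_names;
  input fullname :$20.;
datalines;
DAVID
DANIEL
MICHAEL
JOHNSON
;
run;

/* Hypothetical nicknames to resolve */
data nicks;
  input nick :$20.;
datalines;
Dave
Mike
;
run;

/* Score every nickname against every formal name; lower = closer */
proc sql;
  create table scored as
  select n.nick, f.fullname,
         compged(upcase(n.nick), f.fullname) as ged,
         spedis(upcase(n.nick), f.fullname)  as sped
  from nicks as n, formal_names as f
  order by nick, ged;
quit;

/* Keep the closest (lowest COMPGED) candidate per nickname */
data best;
  set scored;
  by nick;
  if first.nick;
run;
```

The raw-name scores in the listing below vary widely (COMPGED 30 for John/Johnson but 230 for Mike/Michael), so treat the best score as a suggestion to review rather than a definitive match.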
Here's an example with some names (%NYSIIS uses the macro from the link above). It compares the original names, the "soundexed" names, and the "NYSIIS" names using SPEDIS and COMPGED. The COMPGED comparisons of the "soundexed" names have consistently low scores. Nothing is going to be perfect short of a self-defined lookup table that does exactly the name conversions you specify.
PS: it's an old conference, but it has lots of good matching info (how much do you want to read?): Record Linkage Techniques - 1997
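data names;  /* ncode1/ncode2 = NYSIIS codes from the %NYSIIS macro linked above (call not shown) */
  input (name1 name2) (:$20.) @@;
  scode1 = soundex(name1); scode2 = soundex(name2);
  spedis0 = spedis(name1,name2); spedisn = spedis(ncode1,ncode2);
  spediss = spedis(scode1,scode2); compged0 = compged(name1,name2);
  compgedn = compged(ncode1,ncode2); compgeds = compged(scode1,scode2);
datalines;
Mike Michael John Johnson Dave David Ellie Ellen Ellie Eleanor
Astound Astounding Art Arthur Dan Daniel Lin Linlin
;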
name1    name2       ncode1  ncode2     scode1  scode2   spedis0  spedisn  spediss  compged0  compgedn  compgeds
Mike     Michael     MAC     MACAL      M2      M24      62       33       25       230       20        10
John     Johnson     JAN     JANSAN     J5      J525     37       50       50       30        30        20
Dave     David       DAV     DAVAD      D1      D13      37       33       25       110       20        10
Ellie    Ellen       EL      ELAN       E4      E45      30       50       25       200       20        10
Ellie    Eleanor     EL      ELANAR     E4      E456     70       100      50       320       40        20
Astound  Astounding  ASTAN   ASTANDANG  A2353   A235352  21       40       20       30        40        20
Art      Arthur      AD      ARTAR      A63     A636     50       125      16       30        130       10
Dan      Daniel      DAN     DANAL      D5      D54      50       33       25       30        20        10
Lin      Linlin      LAN     LANLAN     L5      L545     50       50       50       30        30        20
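The self-defined lookup table mentioned above can be sketched with PROC FORMAT, assuming a hand-maintained list (the entries below are only illustrative, taken from the examples in this thread; the format name $nick2name is hypothetical). A format maps each value to exactly one label, so ambiguous nicknames such as Ellie would need a table with one row per possible match instead.

```sas
/* Hypothetical, hand-maintained nickname dictionary (illustrative entries only) */
proc format;
  value $nick2name
    'DAVE' = 'DAVID'
    'JOHN' = 'JOHNSON'
    'MIKE' = 'MICHAEL'
    'ART'  = 'ARTHUR'
    other  = ' ';   /* blank label = no dictionary entry */
run;

data matched;
  input nick :$20.;
  length realname $20;
  /* Look up the upper-cased nickname in the dictionary */
  realname = put(upcase(nick), $nick2name.);
  /* No entry: fall back to the name as supplied */
  if realname = ' ' then realname = upcase(nick);
datalines;
Dave
John
Bob
;
run;
```

Dave and John are converted through the dictionary, while Bob has no entry and is passed through unchanged.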