09-08-2011 09:25 AM
I'm fairly new to SAS, so I would gladly take your advice. My issue is as follows: I don't have Text Miner, and I am trying to handle a text-matching problem in Base SAS.
What I want is to recognize whether the names match within a group of rows (5, 7, 3). For example,
if I have Alex Smith in row 1 and Smith Alex in row 2, the program should figure out that it is the same name.
In addition, there can be rows where the name is Smith Alexander or Smith Alex, which is the same name, and I would like SAS to recognize that.
That means that if two rows have at least 2 words in common (each row would have, let's say, 3 words in total), I would like a way for SAS to consider them the same and therefore place them in the same group.
I hope this makes sense, and I hope some advice can be found here.
Thanks in advance.
09-08-2011 09:54 AM
You can find quite a bit on the web if you search for "fuzzy match". A couple of nice examples, with complete code, can be found at:
SAS has a number of similarity functions (e.g., COMPLEV, COMPGED, COMPARE, SOUNDEX and SPEDIS), along with CALL COMPCOST for adjusting the costs that COMPGED uses, and regular expressions via the PRX functions. Look at all of them to see which might work best for you.
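To make the "at least 2 words in common" rule from the question concrete, here is a rough, language-neutral sketch written in Python rather than SAS. Everything in it is an assumption for illustration: the function names (tokens, words_match, common_words, same_person, group_names) are made up, and the rule that Alex and Alexander count as the same word because one is a prefix of the other is just one possible choice. The same logic should port to a DATA step using COUNTW, SCAN and UPCASE, with COMPGED or SPEDIS swapped in for the word comparison if a prefix test is too crude.

```python
def tokens(name):
    """Split a name into upper-cased words, treating commas and periods as spaces."""
    cleaned = name.upper().replace(",", " ").replace(".", " ")
    return cleaned.split()


def words_match(a, b, min_prefix=3):
    """Assumed rule: words match if equal, or if one is a prefix (3+ letters) of the other."""
    if a == b:
        return True
    shorter, longer = sorted((a, b), key=len)
    return len(shorter) >= min_prefix and longer.startswith(shorter)


def common_words(name1, name2):
    """Count words of name1 that pair up with a distinct word of name2."""
    remaining = tokens(name2)
    count = 0
    for word in tokens(name1):
        for candidate in remaining:
            if words_match(word, candidate):
                remaining.remove(candidate)  # each word of name2 is used at most once
                count += 1
                break
    return count


def same_person(name1, name2, min_common=2):
    """Two rows are considered the same name if they share at least min_common words."""
    return common_words(name1, name2) >= min_common


def group_names(names, min_common=2):
    """Give each name a group number; a name joins the first group with a matching member."""
    groups = []   # each entry is the list of names already placed in that group
    labels = []
    for name in names:
        for number, members in enumerate(groups):
            if any(same_person(name, member, min_common) for member in members):
                members.append(name)
                labels.append(number)
                break
        else:
            groups.append([name])
            labels.append(len(groups) - 1)
    return labels


rows = ["Alex Smith", "Smith Alex", "Smith Alexander", "Maria Jones", "Jones Maria Ann"]
print(group_names(rows))
```

The grouping here is greedy and depends on row order, and the prefix rule will also pair names such as Alex and Alexis, so treat the output as candidate groups to review rather than a final answer.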
Of course, if they are available to you, Text Miner and DataFlux could save you a lot of development cost and effort.