06-15-2015 03:01 PM
I am looking into merging two databases bases on people's names. I'm not looking for exact matches since the two databases are from different organizations and one is a little bit older. One database or Master file is ~4.5 million entries with the other being significantly less. I've looked at a few techniques one being fuzzy matching with compged. Unfortunately the paper I was originally looking at was taking down. I have since being trying the code below that I got from a SAS blog (http://blogs.sas.com/content/sgf/2015/01/27/how-to-perform-a-fuzzy-match-using-sas-functions/). I was originally also using the soundex portion of the code but for some reason I was getting a ridiculous amount of matches. I also did see a technique where you do a proc sql join that makes every combination possible. This probably wouldn't be possible for me because of the size of my databases.
So my question is what is the best technique for name matching with large databases? A link to a tutorial would be very much appreciated.
create table names.mastermatch_nppes as
select A.fullname as namemaster, B.full_name as nameMO
from names.master_nppes2 as A,
Names.missouri as B
06-16-2015 03:10 AM
If you need this on a regular basis you should have a closer look at the SAS/Dataflux offerings regarding Data Quality. In particular they already contain name standardizations and matchings via QKB CI SAS Quality Knowledge Base (QKB) which can save a lot of time/programming effort.
Depending on your installation you can acces them from your SAS session via PROC DQSCHEME: SAS(R) 9.4 Data Quality Server: Reference