08-03-2016 05:57 PM
I'm working with fuzzy data and trying to link prenatal care records within a dataset. See my original question post with the link below, where unfortunately none of the proposed solutions worked for me. My latest attempt is to use a working paper on "Linking records matching on at least m of n identifying keys" (http://www2.sas.com/proceedings/sugi27/p075-27.pdf). Unfortunately, I have yet to learn SQL so I'm sort of blindly submitting these statements. If someone could see what my error might be, I'd appreciate it. It creates a table with nothing in it.
create table pairs as
select datasetX.subjectid as subjectid1,
datasetY.subjectid as subjectid2
from datasetX datasetY
where ( (datasetX.name1 eq datasetY.name1)
+ (datasetX.name2 eq datasetY.name2)
+ (datasetX.name3 eq datasetY.name3)
+ (datasetX.id eq datasetY.id)
+ (datasetX.dob eq datasetY.dob)
+ (datasetX.village eq datasetY.village) ) >= 2
and (datasetX.subjectid < datasetY.subjectid)
order by subjectid1, subjectid2;
Thanks so much.
08-03-2016 06:15 PM
You might investigate http://www.cdc.gov/cancer/npcr/tools/registryplus/lp_features.htm Link Plus is a tool CDC uses to match data from different sources. It may save you a lot of work.
It does exact matches and then assigns probability of matches for non-exact matches. if you need to process very large numbers of records you can set a thershold for the probability to treat as 'assumed' matched in a batch mode.
Cost is free.
08-04-2016 10:24 AM
Thanks, I'll try those today andhopefully one of the strategies will work.
Just as an FYI, I had trouble with the Link King because it required some elements that my database didn't have, and wasn't able to use more than one "flex" variable (ie. not one of the core identifier variables that it felt like I should be using) to identify my records, which would have been helpful in my circumstance. However, even when I was successful in getting most of my data to work within its parameters, it mysteriously deleted about 1/2 of my records. I spent half a day trouble shooting that, to no success.