BookmarkSubscribeRSS Feed
rogersaj
Obsidian | Level 7

I'm working with fuzzy data and trying to link prenatal care records within a dataset. See my original question post with the link below, where unfortunately none of the proposed solutions worked for me. My latest attempt is to use a working paper on "Linking records matching on at least m of n identifying keys" (http://www2.sas.com/proceedings/sugi27/p075-27.pdf). Unfortunately, I have yet to learn SQL so I'm sort of blindly submitting these statements. If someone could see what my error might be, I'd appreciate it. It creates a table with nothing in it.

 

https://communities.sas.com/t5/Base-SAS-Programming/Assigning-subject-identifiers-to-patients-with-m...

 

 

proc sql;

      create table pairs as

            select       datasetX.subjectid as subjectid1,

                        datasetY.subjectid as subjectid2

            from datasetX datasetY

            where       (      (datasetX.name1 eq datasetY.name1)

                        +       (datasetX.name2 eq datasetY.name2)

                        +       (datasetX.name3 eq datasetY.name3)

                        +       (datasetX.id eq datasetY.id)

                        +       (datasetX.dob eq datasetY.dob)

                        +      (datasetX.village eq datasetY.village) ) >= 2

            and      (datasetX.subjectid < datasetY.subjectid)

            order by subjectid1, subjectid2;

            quit;

 

Thanks so much.

3 REPLIES 3
ballardw
Super User

You might investigate http://www.cdc.gov/cancer/npcr/tools/registryplus/lp_features.htm Link Plus is a tool CDC uses to match data from different sources. It may save you a lot of work.

 

It does exact matches and then assigns probability of matches for non-exact matches. if you need to process very large numbers of records you can set a thershold for the probability to treat as 'assumed' matched in a batch mode.

 

Cost is free.

rogersaj
Obsidian | Level 7

Thanks, I'll try those today andhopefully one of the strategies will work. 

 

Just as an FYI, I had trouble with the Link King because it required some elements that my database didn't have, and wasn't able to use more than one "flex" variable (ie. not one of the core identifier variables that it felt like I should be using) to identify my records, which would have been helpful in my circumstance. However, even when I was successful in getting most of my data to work within its parameters, it mysteriously deleted about 1/2 of my records. I spent half a day trouble shooting that, to no success.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1341 views
  • 1 like
  • 3 in conversation