BookmarkSubscribeRSS Feed
rogersaj
Obsidian | Level 7

I'm working with fuzzy data and trying to link prenatal care records within a dataset. See my original question post with the link below, where unfortunately none of the proposed solutions worked for me. My latest attempt is to use a working paper on "Linking records matching on at least m of n identifying keys" (http://www2.sas.com/proceedings/sugi27/p075-27.pdf). Unfortunately, I have yet to learn SQL so I'm sort of blindly submitting these statements. If someone could see what my error might be, I'd appreciate it. It creates a table with nothing in it.

 

https://communities.sas.com/t5/Base-SAS-Programming/Assigning-subject-identifiers-to-patients-with-m...

 

 

proc sql;

      create table pairs as

            select       datasetX.subjectid as subjectid1,

                        datasetY.subjectid as subjectid2

            from datasetX datasetY

            where       (      (datasetX.name1 eq datasetY.name1)

                        +       (datasetX.name2 eq datasetY.name2)

                        +       (datasetX.name3 eq datasetY.name3)

                        +       (datasetX.id eq datasetY.id)

                        +       (datasetX.dob eq datasetY.dob)

                        +      (datasetX.village eq datasetY.village) ) >= 2

            and      (datasetX.subjectid < datasetY.subjectid)

            order by subjectid1, subjectid2;

            quit;

 

Thanks so much.

3 REPLIES 3
ballardw
Super User

You might investigate http://www.cdc.gov/cancer/npcr/tools/registryplus/lp_features.htm Link Plus is a tool CDC uses to match data from different sources. It may save you a lot of work.

 

It does exact matches and then assigns probability of matches for non-exact matches. if you need to process very large numbers of records you can set a thershold for the probability to treat as 'assumed' matched in a batch mode.

 

Cost is free.

rogersaj
Obsidian | Level 7

Thanks, I'll try those today andhopefully one of the strategies will work. 

 

Just as an FYI, I had trouble with the Link King because it required some elements that my database didn't have, and wasn't able to use more than one "flex" variable (ie. not one of the core identifier variables that it felt like I should be using) to identify my records, which would have been helpful in my circumstance. However, even when I was successful in getting most of my data to work within its parameters, it mysteriously deleted about 1/2 of my records. I spent half a day trouble shooting that, to no success.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 629 views
  • 1 like
  • 3 in conversation