rogersaj
Obsidian | Level 7

I'm working with a prenatal care dataset from a developing country. The data were abstracted from paper charts. Patient visits were recorded on a visit-by-visit basis, so there's no "patient file" per se. In this country, it's acceptable for a name to be spelled slightly differently from visit to visit, as long as it's in the ballpark phonetically. That means I now have the task of linking all these prenatal care records longitudinally without a consistent identifier. Here's what I do have (attached image: Slide1.JPG):

For identifiers, I have their first names, middle names, last names, village, age (in years, no birthdate), last menstrual period date (LMP), expected delivery date (EDD), and parity. The problem is that no one identifier is consistently right. How do I sort these patients out and assign them a subject ID? 

 

Thanks so much for your advice.

4 REPLIES
Reeza
Super User

Fuzzy matching is a nightmare. 

Look at a tool called The Link King. The SAS code is available.

ballardw
Super User

If you have access to the SAS text mining tools I think there are some additional tools there.

 

If you are restricted to base SAS, it has a function, SOUNDEX, that lets you compare names by how they sound. However, SAS specifically notes that results may not be good for non-English languages.
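
To illustrate what a phonetic key does, here is a minimal Python sketch of the classic American Soundex rules (first letter plus up to three digits). It is not SAS code, and its output may not match the SAS SOUNDEX function exactly; the function name `soundex_key` and the sample names are made up for illustration.

```python
def soundex_key(name: str) -> str:
    """Simplified American Soundex: first letter plus three digits.

    Illustrative only; SAS's SOUNDEX function may return different values.
    """
    # Map each consonant to its Soundex digit; vowels and Y get no digit.
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit

    letters_only = [c for c in name.upper() if c.isalpha()]
    if not letters_only:
        return ""

    first = letters_only[0]
    key = first
    prev = codes.get(first, "")
    for ch in letters_only[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        digit = codes.get(ch, "")
        if digit and digit != prev:
            key += digit
        prev = digit  # a vowel resets prev, so a repeated code counts again
    return (key + "000")[:4]  # pad or truncate to four characters


# Two spellings that sound alike should produce the same key.
print(soundex_key("Amina"), soundex_key("Ameena"))
```

Two records whose name keys agree are only candidates for a match; the other identifiers still have to be checked before assigning them the same ID.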

 

If I were tackling this problem I would begin by identifying those individuals whose information appears exactly the same more than once.

I would assign them a base ID. Then I would use some of the other functions to look for records similar to those individuals. For example, start with records where the first and last name are the same but the village is different. SPEDIS, COMPGED and COMPLEV give you several different approaches for finding "similar". Mark those identified as a match with the appropriate identifier value. Then look at those where last name and village are identical and only the first name varies. Each step should leave fewer unmatched records to look at.

Repeat until you have single name village combinations. Start comparing them against each other in a similar fashion.
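
The staged approach described above can be sketched roughly as follows. This is a Python illustration of the logic, not SAS code and not the poster's own program: `levenshtein` plays the role that COMPLEV plays in SAS (plain edit distance), and the field names, the `max_dist` threshold of 2, and the function name `assign_ids` are all assumptions made for the example.

```python
from collections import defaultdict

FIELDS = ("first", "last", "village")  # assumed field names


def levenshtein(a: str, b: str) -> int:
    """Edit distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def assign_ids(records, max_dist=2):
    """Return one subject ID per record, matched in successive passes."""
    norm = [{f: r[f].strip().lower() for f in FIELDS} for r in records]
    ids = [None] * len(norm)
    next_id = 1

    # Pass 1: records identical on every field, seen more than once,
    # get a base ID.
    groups = defaultdict(list)
    for i, r in enumerate(norm):
        groups[tuple(r[f] for f in FIELDS)].append(i)
    for members in groups.values():
        if len(members) > 1:
            for i in members:
                ids[i] = next_id
            next_id += 1

    # Later passes: two fields must agree exactly, the third may differ
    # by a few edits. Records matched earlier can anchor later matches.
    for fuzzy in ("village", "first"):
        exact = [f for f in FIELDS if f != fuzzy]
        for i, r in enumerate(norm):
            if ids[i] is not None:
                continue
            for j, s in enumerate(norm):
                if (ids[j] is not None
                        and all(r[f] == s[f] for f in exact)
                        and levenshtein(r[fuzzy], s[fuzzy]) <= max_dist):
                    ids[i] = ids[j]
                    break

    # Anything still unmatched becomes its own subject.
    for i in range(len(norm)):
        if ids[i] is None:
            ids[i] = next_id
            next_id += 1
    return ids
```

Each pass only looks at records that are still unassigned, so the pool shrinks as you go. In practice you would want to review the fuzzy matches by hand and bring in age, LMP, EDD and parity before trusting an ID, since a small edit distance alone can link two different women.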

 

I do have a project where I have to match names with additional information of gender and birth date. Luckily I don't have to look at more than several hundred at a time.

 

 

KachiM
Rhodochrosite | Level 12

Some kind of approximations ....

 

You have 5 variables to compare to make a decision. If 3 of them match, select those records and save them in one data set. Those matching on 2 will go to another data set, and those matching on 1 will go to a third data set. Then you try some of the SAS fuzzy-match functions on the saved files, and the work will be smaller.
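
As a rough illustration of that bucketing idea, here is a Python sketch (not SAS) that counts how many fields agree for each pair of records and sorts the pairs into three groups. The reply does not say which five variables are meant, so the field list below is an assumption, as is treating "3" as "3 or more"; `bucket_pairs` is a made-up name.

```python
from itertools import combinations

# Assumed choice of five comparison fields; adjust to your own data.
COMPARE_FIELDS = ("last", "village", "age", "lmp", "parity")


def bucket_pairs(records, fields=COMPARE_FIELDS):
    """Group record pairs by how many fields agree exactly.

    Returns a dict with keys 3, 2 and 1; key 3 holds pairs that agree on
    three or more fields. Pairs with no agreement are dropped.
    """
    buckets = {3: [], 2: [], 1: []}
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        agree = sum(1 for f in fields if a[f] == b[f])
        if agree >= 3:
            buckets[3].append((i, j))
        elif agree >= 1:
            buckets[agree].append((i, j))
    return buckets
```

Comparing every pair grows quadratically with the number of records, so on a large file you would probably compare only within blocks first, for example within the same village.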
