rogersaj
Obsidian | Level 7

I'm working with a prenatal care dataset from a developing country. The data were abstracted from paper charts. Patient visits were recorded visit by visit, so there's no "patient file" per se. In this country, it's acceptable for a name to be spelled slightly differently from one visit to the next, as long as it's phonetically in the ballpark. That leaves me with the task of linking all these prenatal care records longitudinally without a consistent identifier. Here's what I do have:

For identifiers, I have each patient's first name, middle name, last name, village, age (in years; no birthdate), last menstrual period date (LMP), expected delivery date (EDD), and parity. The problem is that no single identifier is consistently reliable. How do I sort these patients out and assign each one a subject ID?
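Because LMP and EDD both describe the same pregnancy, one option is to collapse them into a single comparable date before matching, so a visit that recorded only one of them can still be compared. Below is a minimal Python sketch (not SAS), assuming the EDD on the charts was computed from LMP by Naegele's rule (EDD = LMP + 280 days) and that dates are already parsed. `implied_lmp`, `same_pregnancy`, and the 14-day tolerance are hypothetical choices, not something from the original post.

```python
from datetime import date, timedelta
from typing import Optional

# Assumption: EDD was computed from LMP with Naegele's rule (LMP + 280 days).
GESTATION = timedelta(days=280)


def implied_lmp(lmp: Optional[date], edd: Optional[date]) -> Optional[date]:
    """Use the recorded LMP, or back-calculate it from the EDD when LMP is missing."""
    if lmp is not None:
        return lmp
    if edd is not None:
        return edd - GESTATION
    return None


def same_pregnancy(a: Optional[date], b: Optional[date], tolerance_days: int = 14) -> bool:
    """Hypothetical rule: implied LMPs within tolerance_days suggest the same pregnancy."""
    if a is None or b is None:
        return False
    return abs((a - b).days) <= tolerance_days


# Toy visits, invented for illustration: one chart has only LMP, the other only EDD.
v1 = implied_lmp(date(2015, 3, 1), None)
v2 = implied_lmp(None, date(2015, 12, 10))
print(v1, v2, same_pregnancy(v1, v2))  # 2015-03-01 2015-03-05 True
```

If some clinics dated pregnancies by ultrasound rather than LMP, the tolerance would likely need to be wider.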

 

Thanks so much for your advice.

Reeza
Super User

Fuzzy matching is a nightmare. 

Look at The Link King, a record-linkage tool. Its SAS code is available.

ballardw
Super User

If you have access to the SAS text mining tools, I think they include some additional tools for this.

 

If you are restricted to Base SAS, there is the SOUNDEX function, which encodes a word by how it sounds so that similar-sounding names can be compared. However, SAS specifically notes that it may not give good results for non-English languages.
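To show the idea behind a phonetic code, here is the classic four-character American Soundex written out in Python. This is an illustration of the general algorithm, not a reimplementation of the SAS SOUNDEX function, which may differ in details such as the length of the code it returns. Names that sound alike but are spelled differently, as described in the question, can end up with the same code.

```python
# Classic four-character American Soundex, written out in Python for illustration.
# SAS's SOUNDEX function may differ in details (e.g., code length).
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = [letters[0]]              # keep the first letter as-is
    prev = CODES.get(letters[0], "")   # ...but its code still suppresses a repeat
    for ch in letters[1:]:
        code = CODES.get(ch, "")       # vowels, Y, H and W have no code
        if code and code != prev:
            result.append(code)
        if ch not in "HW":             # H and W do not separate equal codes; vowels do
            prev = code
    return ("".join(result) + "000")[:4]


print(soundex("Robert"), soundex("Rupert"), soundex("Ashcraft"))  # R163 R163 A261
```

Because the rules are tuned to English spelling, names from other languages may collide or split unexpectedly, which is the same caveat SAS gives.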

 

If I were tackling this problem, I would begin by identifying the individuals whose information appears exactly the same in more than one record.
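The first step, grouping records that are identical on every identifier and giving each group a base ID, might look like this in Python (a sketch, not SAS). The field names are hypothetical, and trimming and upper-casing values before comparing is an added assumption; drop it if "exactly the same" should be taken literally.

```python
from typing import Dict, List, Tuple

# Hypothetical field names for the identifiers listed in the question.
KEY_FIELDS = ("first", "middle", "last", "village", "age", "lmp", "edd", "parity")


def assign_base_ids(records: List[Dict[str, str]]) -> List[int]:
    """Give records that are identical on every KEY_FIELD the same base ID."""
    ids_by_key: Dict[Tuple[str, ...], int] = {}
    base_ids = []
    for rec in records:
        # Light normalization so case and stray spaces don't break exact matches.
        key = tuple(rec.get(f, "").strip().upper() for f in KEY_FIELDS)
        if key not in ids_by_key:
            ids_by_key[key] = len(ids_by_key) + 1
        base_ids.append(ids_by_key[key])
    return base_ids


# Toy records, invented for illustration.
visits = [
    {"first": "Grace", "middle": "", "last": "Banda", "village": "A", "age": "24",
     "lmp": "2015-03-01", "edd": "2015-12-06", "parity": "1"},
    {"first": "GRACE ", "middle": "", "last": "Banda", "village": "A", "age": "24",
     "lmp": "2015-03-01", "edd": "2015-12-06", "parity": "1"},
    {"first": "Ruth", "middle": "", "last": "Phiri", "village": "B", "age": "31",
     "lmp": "2015-05-10", "edd": "2016-02-14", "parity": "3"},
]
print(assign_base_ids(visits))  # [1, 1, 2]
```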

I would assign them a base ID. Then I would use some of the other functions to try to match similar records to those individuals. For example, start with records where the first and last names are the same but the village is different. SPEDIS, COMPGED and COMPLEV give you several different approaches for finding "similar" values. Mark the records identified as a match with the appropriate ID value. Then look at records where the last name and village are identical and only the first name varies. Each step should leave fewer unmatched records to look at.
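Of the functions named, COMPLEV is based on Levenshtein edit distance. As a rough Python illustration (not the SAS functions themselves, which take additional arguments and modifiers), here is the "same last name and village, first name varies" pass: records whose first names are within one edit of each other are merged under one ID. The field names, the one-edit threshold, and the toy data are assumptions.

```python
from typing import Dict, List


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def match_on_first_name(records: List[Dict[str, str]], ids: List[int],
                        max_dist: int = 1) -> List[int]:
    """Same last name and village, first names within max_dist edits -> same ID."""
    ids = list(ids)
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            a, b = records[i], records[j]
            if ids[i] == ids[j]:
                continue
            if (a["last"] == b["last"] and a["village"] == b["village"]
                    and levenshtein(a["first"].upper(), b["first"].upper()) <= max_dist):
                old, new = ids[j], ids[i]
                ids = [new if x == old else x for x in ids]  # relabel the whole group
    return ids


# Toy records, invented for illustration.
records = [
    {"first": "Grace", "last": "Banda", "village": "A"},
    {"first": "Grase", "last": "Banda", "village": "A"},
    {"first": "Grace", "last": "Phiri", "village": "A"},
    {"first": "Ruth", "last": "Banda", "village": "A"},
]
print(match_on_first_name(records, [1, 2, 3, 4]))  # [1, 1, 3, 4]
```

Relabeling every record that carries the old ID, rather than just record `j`, keeps earlier links intact when two existing groups are merged.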

Repeat until only single name/village combinations remain, then start comparing those against each other in a similar fashion.
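The repeat-until-done loop can be organized as a list of passes ordered from strict to loose, with matches merged through a union-find structure so that links found in different passes chain together. This is a Python sketch, not SAS. The two rules, the field names, and the use of `difflib.SequenceMatcher` as a stand-in for SPEDIS/COMPGED-style similarity (with a 0.75 cutoff) are all assumptions.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List

Record = Dict[str, str]
Rule = Callable[[Record, Record], bool]


class UnionFind:
    """Tracks which records have been linked, so matches chain across passes."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        self.parent[self.find(b)] = self.find(a)


def similar(a: str, b: str, cutoff: float = 0.75) -> bool:
    """Stand-in for a SAS similarity function: difflib ratio in [0, 1]."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio() >= cutoff


def run_passes(records: List[Record], rules: List[Rule]) -> List[int]:
    uf = UnionFind(len(records))
    for step, rule in enumerate(rules, 1):
        for i in range(len(records)):
            for j in range(i + 1, len(records)):
                if uf.find(i) != uf.find(j) and rule(records[i], records[j]):
                    uf.union(i, j)
        roots = [uf.find(k) for k in range(len(records))]
        unmatched = sum(1 for r in roots if roots.count(r) == 1)
        print(f"pass {step}: {unmatched} record(s) still unmatched")
    return [uf.find(k) for k in range(len(records))]


# Hypothetical rules, strictest first.
RULES: List[Rule] = [
    # same first and last name (village may differ)
    lambda a, b: a["first"] == b["first"] and a["last"] == b["last"],
    # same last name and village, similar first name
    lambda a, b: (a["last"] == b["last"] and a["village"] == b["village"]
                  and similar(a["first"], b["first"])),
]

# Toy records, invented for illustration.
toy = [
    {"first": "Grace", "last": "Banda", "village": "A"},
    {"first": "Grace", "last": "Banda", "village": "B"},
    {"first": "Grase", "last": "Banda", "village": "A"},
    {"first": "Ruth", "last": "Phiri", "village": "B"},
]
print(run_passes(toy, RULES))
```

On the toy data the unmatched count drops from 2 after the first pass to 1 after the second, mirroring the "fewer unmatched records each step" idea; the returned values are group labels (root indices), which can then be renumbered into subject IDs.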

 

I do have a project where I have to match names using the additional information of gender and birth date. Luckily, I don't have to look at more than several hundred at a time.

 

 

KachiM
Rhodochrosite | Level 12

Here is an approximate approach:

 

You have 5 variables to compare to make a decision. Select the records that match on 3 of them and save them in one data set; those matching on 2 go to a second data set, and those matching on 1 go to a third. Then try some of the SAS fuzzy-matching functions on the saved data sets; there will be less work to do.
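This binning idea can be sketched at the level of record pairs: count how many of five fields agree for each pair, then sort the pairs into bins. A Python sketch (not SAS). The reply does not say which five fields to use, so `FIELDS` is a hypothetical choice, and putting pairs that agree on 4 or 5 fields into the top bin is an assumption.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical choice of the five comparison fields.
FIELDS = ("first", "last", "village", "age", "parity")


def agreement(a: Dict[str, str], b: Dict[str, str]) -> int:
    """How many of FIELDS agree exactly (after trimming and upper-casing)."""
    return sum(1 for f in FIELDS if a[f].strip().upper() == b[f].strip().upper())


def bin_pairs(records: List[Dict[str, str]]) -> Dict[str, List[Tuple[int, int]]]:
    """Sort record pairs into bins by agreement count; 3 or more share a bin (assumption)."""
    bins: Dict[str, List[Tuple[int, int]]] = {"3+": [], "2": [], "1": []}
    for i, j in combinations(range(len(records)), 2):
        n = agreement(records[i], records[j])
        if n >= 3:
            bins["3+"].append((i, j))
        elif n >= 1:
            bins[str(n)].append((i, j))
        # pairs agreeing on no field are dropped
    return bins


# Toy records, invented for illustration.
toy = [
    {"first": "Grace", "last": "Banda", "village": "A", "age": "24", "parity": "1"},
    {"first": "Grase", "last": "Banda", "village": "A", "age": "24", "parity": "1"},
    {"first": "Ruth", "last": "Banda", "village": "B", "age": "24", "parity": "0"},
]
print(bin_pairs(toy))
```

The fuzzy comparisons can then start with the `"3+"` bin, where true matches are most likely. Comparing every pair grows quadratically with the number of records, so restricting comparisons first (for example, to records from the same village) keeps the bins manageable.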
