hellohere
Pyrite | Level 9

How can I compare names (LAST, MIDDLE, FIRST) with a single function or macro?

Ideally it would return a similarity index (say, 0 to 100).

10 REPLIES 10
hellohere
Pyrite | Level 9
It needs to handle abbreviations, swapped last/first names, etc.
PaigeMiller
Diamond | Level 26

"Compare" to what? Could you please write a few more sentences describing the problem in more detail?

--
Paige Miller
hellohere
Pyrite | Level 9

Here is a data set of names (last, middle, first) for a university, datasetA.

Here is also a data set of names (last, middle, first) for a class, datasetB.

I need to know which names from datasetB show up in datasetA.

PaigeMiller
Diamond | Level 26

Step 1. Sort both data sets by last, middle, first (or maybe last, first, middle).

Step 2. Do a DATA step merge to see which names are in both data sets. Example:

 

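/* Both data sets must already be sorted by last, middle, first (Step 1) */
data want;
    merge A(in=ina) B(in=inb);
    by last middle first;
    if ina and inb;   /* keep only names found in both A and B */
run;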

 

 

Of course, this ignores some possible problems in doing this merge, such as mismatched capitalization, punctuation mismatch, spelling mismatch, and so on. 
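
One way to reduce the capitalization and punctuation mismatches is to merge on cleaned copies of the name variables rather than on the raw values. The following is only a minimal sketch: the macro name clean_names, the key variables last_k, middle_k, first_k, and the length of 60 are assumptions made for illustration, and it does nothing about spelling mismatches.

```sas
/* Sketch: build cleaned match keys, then sort and merge on them.     */
/* Assumes A and B each have character variables last, middle, first. */
%macro clean_names(in=, out=);   /* hypothetical helper macro */
    data &out;
        set &in;
        length last_k middle_k first_k $ 60;   /* assumed maximum length */
        /* Upper-case, then keep letters only:                      */
        /* modifier 'k' = keep listed characters, 'a' = alphabetic. */
        last_k   = compress(upcase(last),   , 'ka');
        middle_k = compress(upcase(middle), , 'ka');
        first_k  = compress(upcase(first),  , 'ka');
    run;

    proc sort data=&out;
        by last_k middle_k first_k;
    run;
%mend clean_names;

%clean_names(in=A, out=A_clean)
%clean_names(in=B, out=B_clean)

data want;
    merge A_clean(in=ina) B_clean(in=inb);
    by last_k middle_k first_k;
    if ina and inb;   /* keep names present in both data sets */
run;
```

With this approach "O'Brien", "OBRIEN" and "o brien" all produce the key OBRIEN. Note that the original last, middle and first values from B overwrite those from A in the output, so rename them in one data set if you need to see both.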

--
Paige Miller
hellohere
Pyrite | Level 9

Thanks, this certainly gives a result.

But the data quality is poor. Say, "Wilson" might show up as "Wilon"; first and last names might be swapped, etc.

I need a function or macro that returns a match score (0 to 100).

PaigeMiller
Diamond | Level 26

All of these things would be good to mention in your original post, which should be complete and have all the information we need. From now on, please provide all relevant information in your first post on a subject.

 

Please see this blog post for a method of doing this matching: https://blogs.sas.com/content/sgf/2021/09/21/fuzzy-matching/

--
Paige Miller
hellohere
Pyrite | Level 9
Thanks
PaigeMiller
Diamond | Level 26

Start a new thread using this as your subject line: PRX with multi-byte characters.

 

There are plenty of people here who are very knowledgeable about PRX (I am not one of them). Do NOT continue this thread by discussing PRX.

--
Paige Miller
ballardw
Super User

If you are looking for a probabilistic match, I recommend the CDC-supplied Link Plus, available at https://www.cdc.gov/cancer/npcr/tools/registryplus/lp.htm

The software is free to download and the documentation is there. You do need to export your files to text, but the software doesn't require the columns in the text files to have the same names; you can link a field named "LastName" to "FamilyName" or similar.

It uses the name, plus any other useful information you have such as address or phone numbers, to provide a probability of match.

 

Otherwise you need to describe very clearly just what you mean by the "0 to 100".

Functions like COMPGED, COMPLEV and SPEDIS will provide different measures of similarity as well.
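
As a rough illustration of how one of these functions could be turned into a "0 to 100" number, the sketch below uses COMPLEV (Levenshtein edit distance) and rescales it by the longer name's length. The scoring formula, the cutoff of 80, and the output names (scored, d_asis, d_swap, score) are assumptions made up for this example, not a standard definition of match quality. It assumes data sets A and B with character variables last and first, and it ignores the middle name for brevity.

```sas
/* Sketch: score every B name against every A name on a 0-100 scale. */
/* 100 = identical strings; lower = more edits needed.               */
proc sql;
    create table scored as
    select cls.last  as b_last,  cls.first  as b_first,
           univ.last as a_last,  univ.first as a_first,
           /* edit distance with the names in the order given */
           complev(upcase(catx(' ', cls.last,  cls.first)),
                   upcase(catx(' ', univ.last, univ.first))) as d_asis,
           /* edit distance with B's last and first names swapped */
           complev(upcase(catx(' ', cls.first, cls.last)),
                   upcase(catx(' ', univ.last, univ.first))) as d_swap,
           /* use whichever ordering matches better */
           min(calculated d_asis, calculated d_swap) as d_best,
           max(length(catx(' ', cls.last,  cls.first)),
               length(catx(' ', univ.last, univ.first))) as maxlen,
           /* assumed rescaling: distance relative to the longer name */
           round(100 * (1 - calculated d_best / calculated maxlen)) as score
    from B as cls, A as univ
    where calculated score >= 80      /* hypothetical cutoff */
    order by b_last, b_first, score desc;
quit;
```

This compares every row of B with every row of A, so it is only practical when at least one of the data sets is small. COMPGED or SPEDIS could be substituted for COMPLEV, but they return costs on different scales, so the rescaling to 0 to 100 would need to change accordingly.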

 


 
