hellohere
Pyrite | Level 9

How to compare Name (LAST, MIDDLE, FIRST NAME) with a single function/MACRO?!

 

Ideally it would return a match index (say 0 to 100).

hellohere
Pyrite | Level 9
It needs to handle abbreviations, last/first names in reversed order, etc.
PaigeMiller
Diamond | Level 26

"Compare" to what? Could you please write a few more sentences describing the problem in more detail?

--
Paige Miller
hellohere
Pyrite | Level 9

I have a data set of names (Last, Middle, First) for a university, datasetA.

I also have a data set of names (Last, Middle, First) for a class, datasetB.

I need to know which names from datasetB show up in datasetA.

PaigeMiller
Diamond | Level 26

Step 1. Sort both data sets by last, middle, first (or maybe last, first, middle).

Step 2. Do a data step merge to see which names are in both data sets. Example:

 

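/* Step 1: both data sets must be sorted by the BY variables */
proc sort data=A; by last middle first; run;
proc sort data=B; by last middle first; run;

/* Step 2: keep only the names found in both data sets */
data want;
    merge A(in=ina) B(in=inb);
    by last middle first;
    if ina and inb;
run;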

 

 

Of course, this is an exact match, so it ignores problems such as mismatched capitalization, mismatched punctuation, misspellings, and so on.
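The capitalization and punctuation problems can be reduced by merging on cleaned-up keys instead of the raw names. The following is only a sketch: the `_key` variable names, the `$50` length, and the `a_clean`/`b_clean` data set names are assumptions for illustration, and it does nothing for misspellings or swapped names.

```sas
/* Sketch: build normalized keys, then merge on them.             */
/* Assumes A and B each have character variables last, middle,    */
/* and first. The *_key names and $50 length are hypothetical.    */
data a_clean;
    set A;
    length last_key middle_key first_key $50;
    /* upper-case, then keep letters only (drops blanks, punctuation, digits) */
    last_key   = compress(upcase(last),   , 'ka');
    middle_key = compress(upcase(middle), , 'ka');
    first_key  = compress(upcase(first),  , 'ka');
run;

data b_clean;
    set B;
    length last_key middle_key first_key $50;
    last_key   = compress(upcase(last),   , 'ka');
    middle_key = compress(upcase(middle), , 'ka');
    first_key  = compress(upcase(first),  , 'ka');
run;

/* NODUPKEY on A keeps one row per key so the merge is not many-to-many */
proc sort data=a_clean nodupkey; by last_key middle_key first_key; run;
proc sort data=b_clean;          by last_key middle_key first_key; run;

data want;
    /* take only the keys from A so B's original name values are kept */
    merge a_clean(in=ina keep=last_key middle_key first_key)
          b_clean(in=inb);
    by last_key middle_key first_key;
    if ina and inb;
run;
```

With these keys, "O'Brien" and "OBRIEN" would land on the same key and match.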

--
Paige Miller
hellohere
Pyrite | Level 9

Thanks, this certainly produces a result.

But the data quality is poor. For example, "Wilson" might show up as "Wilon", the first name and last name might be swapped, etc.

I need a function or macro that returns a degree of match (0 to 100).

PaigeMiller
Diamond | Level 26

All of these things would have been good to mention in your original post, which should be complete and contain all the information we need. From now on, please provide all relevant information in your first post on a subject.

 

Please see this blog post for a method of doing this matching: https://blogs.sas.com/content/sgf/2021/09/21/fuzzy-matching/

--
Paige Miller
hellohere
Pyrite | Level 9
Thanks
PaigeMiller
Diamond | Level 26

Start a new thread using this as your subject line: PRX with multi-byte characters.

 

There are plenty of people here who are very knowledgeable about PRX (I am not one of them). Do NOT continue this thread by discussing PRX.

--
Paige Miller
ballardw
Super User

If you are looking for a probabilistic match, I recommend the CDC-supplied Link Plus, available at https://www.cdc.gov/cancer/npcr/tools/registryplus/lp.htm

The software is free to download and the documentation is there. You do have to export your files to text, but the software doesn't require the columns in the text files to have the same names: you can link a field named "LastName" to "FamilyName" or similar.

It will use the name, plus any other useful information you have such as address or phone numbers, to provide a probability of match.

 

Otherwise you need to describe very clearly just what you mean by the "0 to 100".

Functions like COMPGED, COMPLEV and SPEDIS will provide different measures of similarity as well.
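As one possible reading of "0 to 100", the sketch below scales the COMPLEV (Levenshtein) distance by the length of the longer string and also tries the last/first names swapped. The `%sim` macro, the output column names, and the cut-off of 80 are assumptions for illustration, not an established scoring rule; the middle name is ignored here, and the cross join compares every row of B with every row of A, so it may be slow on large tables.

```sas
/* Hypothetical helper: expands to an expression giving a 0-100     */
/* similarity, 100 * (1 - edit distance / length of longer string). */
/* UPCASE and STRIP make it ignore case and surrounding blanks.     */
%macro sim(x, y);
    (100 * (1 - complev(strip(upcase(&x)), strip(upcase(&y)))
                / max(lengthn(strip(&x)), lengthn(strip(&y)), 1)))
%mend sim;

proc sql;
    create table scored as
    select c.last  as b_last,
           c.first as b_first,
           u.last  as a_last,
           u.first as a_first,
           /* names in the same order: last vs last, first vs first */
           (%sim(u.last, c.last) + %sim(u.first, c.first)) / 2
               as score_straight,
           /* names swapped: last vs first, first vs last */
           (%sim(u.last, c.first) + %sim(u.first, c.last)) / 2
               as score_swapped,
           /* overall score is the better of the two */
           max(calculated score_straight, calculated score_swapped)
               as score
    from A as u, B as c
    where calculated score >= 80      /* assumed cut-off, tune as needed */
    order by b_last, b_first, score desc;
quit;
```

Under this definition "Wilson" against "Wilon" is one edit over six characters, so that pair alone scores about 83. COMPGED or SPEDIS could be substituted inside `%sim`, but they return costs on different scales, so the scaling would need to change.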

 




 
