How can I compare names (last, middle, first) with a single function or macro?
Ideally it would return a similarity index (for example, 0 to 100).
"Compare" to what? Could you please write a few more sentences describing the problem in more detail?
Here is a data set of names (last, middle, first) for a university, datasetA.
Here is also a data set of names (last, middle, first) for a class, datasetB.
I need to know which names from datasetB show up in datasetA.
Step 1. Sort both data sets by Last middle first (or maybe Last first middle)
Step 2. Do a data step merge to see which names are in both data sets. Example:
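data want;
   /* both data sets must already be sorted by the BY variables (Step 1) */
   merge A(in=ina) B(in=inb);
   by last middle first;
   /* keep only the names that appear in both A and B */
   if ina and inb;
run;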
Of course, this ignores some possible problems in doing this merge, such as mismatched capitalization, punctuation mismatch, spelling mismatch, and so on.
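One way to reduce the capitalization and punctuation mismatches is to merge on cleaned-up keys instead of the raw variables. The following is only a sketch: the macro name clean_names, the *_key variables, the $50 length, and the output data set names are assumptions made for illustration. It does nothing about spelling mismatches.

```sas
/* Hypothetical helper: add upper-cased, letters-only match keys and sort by them. */
%macro clean_names(in=, out=);
   data &out;
      set &in;
      length last_key middle_key first_key $50;   /* assumed maximum length */
      /* COMPRESS with 'ka' keeps alphabetic characters only, so blanks,  */
      /* hyphens, periods and apostrophes are dropped (O'Brien -> OBRIEN) */
      last_key   = upcase(compress(last,   , 'ka'));
      middle_key = upcase(compress(middle, , 'ka'));
      first_key  = upcase(compress(first,  , 'ka'));
   run;

   proc sort data=&out;
      by last_key middle_key first_key;
   run;
%mend clean_names;

%clean_names(in=A, out=A_clean)
%clean_names(in=B, out=B_clean)

data want;
   /* keep only the keys from B so the original spellings from A are retained */
   merge A_clean(in=ina)
         B_clean(in=inb keep=last_key middle_key first_key);
   by last_key middle_key first_key;
   if ina and inb;
run;
```

If either data set has duplicate keys, the merge pairs rows one to one within each BY group, so it may be worth removing duplicates first.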
Thanks, this certainly produces a result.
But the data quality is poor. For example, "Wilson" might show up as "Wilon", or the first name and last name might be swapped, and so on.
I need a function or macro that returns a match score (0 to 100).
All of these things would have been good to mention in your original post, which should be complete and contain all the information we need. From now on, please provide all relevant information in your first post on a subject.
Please see this blog post for a method of doing this matching: https://blogs.sas.com/content/sgf/2021/09/21/fuzzy-matching/
Start a new thread using this as your subject line: PRX with multi-byte characters.
There are plenty of people here who are very knowledgeable about PRX (I am not one of them). Do NOT continue this thread by discussing PRX.
If you are looking at a probabilistic match, I recommend the CDC-supplied Link Plus, available at https://www.cdc.gov/cancer/npcr/tools/registryplus/lp.htm
The software is free to download and the documentation is there. You do have to export your files to text, but the software doesn't require the columns in the text files to have the same names: you can link a field named "LastName" to "FamilyName" or similar.
It uses the name, plus any other useful information you have such as address or phone number, to provide a probability of match.
Otherwise you need to very clearly describe just what you mean by the "0 to 100".
Functions like COMPGED, COMPLEV and SPEDIS will provide different measures of similarity as well.
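As one possible reading of "0 to 100", the Levenshtein distance from COMPLEV can be rescaled by the length of the longer string. This is a sketch under that assumption, not a standard definition: the macro name name_score, the data sets pairs and scored, the variable names, and the length of 150 are all hypothetical. It also scores a first/last swapped version of each class name, since swaps were mentioned as a problem.

```sas
/* Hypothetical macro: expands to a data step expression that gives      */
/* 100 for identical strings and 0 when every character must be changed. */
/* Pass simple variable names only (no commas inside the arguments).     */
%macro name_score(a, b);
   (100 * (1 - complev(upcase(strip(&a)), upcase(strip(&b)))
              / max(lengthn(strip(&a)), lengthn(strip(&b)), 1)))
%mend name_score;

/* Pair every class name with every university name.                */
/* This is a full cross join, so it can get very large very fast.   */
proc sql;
   create table pairs as
   select catx(' ', u.last, u.first, u.middle) as u_name length=150,
          catx(' ', c.last, c.first, c.middle) as c_name length=150,
          catx(' ', c.first, c.last, c.middle) as c_swap length=150
   from A as u, B as c;
quit;

data scored;
   set pairs;
   /* take the better of the straight comparison and the swapped one */
   score = max(%name_score(u_name, c_name),
               %name_score(u_name, c_swap));
run;
```

With this definition, "Wilson" against "Wilon" is one edit over six characters, so the score is about 83. COMPGED or SPEDIS could be substituted for COMPLEV, but their costs are on different scales, so the rescaling to 0 to 100 would need to be rethought.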