How can I compare names (last, middle, first) with a single function or macro?
Ideally it would return a similarity index (for example, 0 to 100).
"Compare" to what? Could you please write a few more sentences describing the problem in more detail?
Paige Miller
I have a data set of names (last, middle, first) for a university, datasetA.
I also have a data set of names (last, middle, first) for a class, datasetB.
I need to know which names from datasetB show up in datasetA.
Step 1. Sort both data sets by last, middle, first (or maybe last, first, middle).
Step 2. Do a DATA step merge to see which names are in both data sets. Example:
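data want;
  /* the IN= flags record which input data set contributed to each observation */
  merge A(in=ina) B(in=inb);
  by last middle first;
  /* keep only the names that appear in both data sets */
  if ina and inb;
run;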
Of course, this ignores some possible problems in doing this merge, such as mismatched capitalization, punctuation mismatch, spelling mismatch, and so on.
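One way to reduce the capitalization and punctuation mismatches is to build normalized key variables before sorting and merging. The following is only a sketch: it assumes the data sets are named A and B with character variables last, middle and first, and the macro name make_keys, the key variables (last_k, middle_k, first_k) and the length of 50 are made up for illustration. It does nothing about spelling mismatches.

```sas
/* Hypothetical helper: add upper-cased, letters-only keys and sort by them */
%macro make_keys(in=, out=);
  data &out;
    set &in;
    length last_k middle_k first_k $50;   /* assumed long enough for any name */
    /* UPCASE removes case differences; COMPRESS with 'ka' keeps only letters */
    last_k   = compress(upcase(last),   , 'ka');
    middle_k = compress(upcase(middle), , 'ka');
    first_k  = compress(upcase(first),  , 'ka');
  run;

  proc sort data=&out;
    by last_k middle_k first_k;
  run;
%mend make_keys;

%make_keys(in=A, out=A_keys)
%make_keys(in=B, out=B_keys)

data want;
  /* only the keys are read from B, so the original spellings come from A */
  merge A_keys(in=ina)
        B_keys(in=inb keep=last_k middle_k first_k);
  by last_k middle_k first_k;
  if ina and inb;
run;
```

With this approach "O'Brien" and "OBRIEN" produce the same key, but "Wilson" and "Wilon" still do not match.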
Paige Miller
Thanks, this does give a result.
But the data quality is poor. For example, "Wilson" might show up as "Wilon", or the first name and last name might be swapped.
I need a function or macro that returns a match score (0 to 100).
All of these things would be good to mention in your original post, which should be complete and have all the information we need. From now on, please provide all relevant information in your first post on a subject.
Please see this blog post for a method of doing this matching: https://blogs.sas.com/content/sgf/2021/09/21/fuzzy-matching/
Paige Miller
Start a new thread using this as your subject line: PRX with multi-byte characters.
There are plenty of people here who are very knowledgeable about PRX (I am not one of them). Do NOT continue this thread by discussing PRX.
Paige Miller
If you are looking for a probabilistic match, I recommend the CDC-supplied Link Plus, available at https://www.cdc.gov/cancer/npcr/tools/registryplus/lp.htm
The software is free to download and the documentation is there. You do need to export your files to text, but the software doesn't require the columns in the text files to have the same names: you can link a field named "LastName" to "FamilyName" or similar.
It will use the name, plus any other useful information you have such as address and phone numbers, to provide a probability of match.
Otherwise you need to describe very clearly just what you mean by the "0 to 100".
Functions like COMPGED, COMPLEV and SPEDIS will provide different measures of similarity as well.
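As one possible reading of "0 to 100", an edit distance from COMPLEV can be rescaled by the length of the longer name. The sketch below assumes data sets A and B with character variables last, middle and first; the scoring formula, the cutoff of 80, the length of 200 and all new variable names are illustrative choices, not a standard. It compares every pair of names, so it may be slow on large data sets.

```sas
/* Step 1: pair every class name (B) with every university name (A) */
proc sql;
  create table pairs as
  select cb.last as last_b, cb.middle as middle_b, cb.first as first_b,
         ua.last as last_a, ua.middle as middle_a, ua.first as first_a
  from B as cb, A as ua;
quit;

/* Step 2: turn the Levenshtein distance into a 0-100 score */
data scored;
  set pairs;
  length name_b name_a swap_a $200;   /* assumed long enough for a full name */
  name_b = upcase(catx(' ', last_b, first_b, middle_b));
  name_a = upcase(catx(' ', last_a, first_a, middle_a));
  /* the A name with first and last exchanged, to catch swapped fields */
  swap_a = upcase(catx(' ', first_a, last_a, middle_a));

  /* smaller of the two edit distances */
  dist   = min(complev(name_b, name_a), complev(name_b, swap_a));
  maxlen = max(length(name_b), length(name_a), 1);
  score  = max(0, round(100 * (1 - dist / maxlen)));

  if score >= 80;   /* arbitrary cutoff; tune it against known matches */
run;
```

COMPGED or SPEDIS could be substituted for COMPLEV in the same structure, but their values are on different scales, so the rescaling would need to change as well.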
@hellohere wrote:
How to compare Name (LAST, MIDDLE, FIRST NAME) with a single function/MACRO?!
Better with return of an index (like 0 to 100)...