Hi,
I have attached an Excel file. I want to match group A with group B by name. Matching names are shown in the same color. However, the names do not match exactly across the two groups. One option is to match on last name, but the last name is not always written the same way in both groups, for example John Holmes III. Also, in my large dataset, two different people in the same firm and in the same year may share a last name. How can I handle these issues and match one dataset to the other perfectly by name? Could you please help me?
Please post SAS datasets in usable form (working data steps with datalines).
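As a rough illustration of that form, a posting might look like the sketch below. The attached Excel file is not reproduced in this thread, so the dataset name, the variable names (firm_id, year, name) and every value are invented here purely as placeholders.

```sas
/* Hypothetical sample data: all names and values are made up */
data group_a;
    infile datalines dlm=',' truncover;
    input firm_id year name :$40.;   /* :$40. reads up to 40 characters per name */
    datalines;
101,2015,John Holmes III
101,2015,Mary Holmes
102,2016,Robert Smith Jr.
;
```

A second step of the same shape for group B lets anyone paste the code, run it, and work on exactly the same rows.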
Perfect matching of free-text names is not possible. Only unique and fully accurate keys such as Customer_ID or Employee_ID will give a perfect match.
The best you can do, if first and last names are the only two columns available to you, is to standardise them and aim for as high a match rate as possible. Also, is it possible that two different people share the same full name?
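One possible way to standardise, sketched under assumptions: the variable names and sample rows are invented, and the suffix list (JR, SR, II, III, IV) is only a guess at what the real data contains. The step upper-cases the name, drops periods and commas, removes a trailing generational suffix, and then takes the last word as the last name.

```sas
/* Hypothetical sample data: all names and values are made up */
data group_a;
    infile datalines dlm='|' truncover;
    input firm_id year name :$40.;
    datalines;
101|2015|John Holmes III
101|2015|Holmes, Mary
102|2016|Robert Smith Jr.
;

data group_a_std;
    set group_a;
    length name_std last_name $40;
    /* Upper-case, remove periods and commas, collapse repeated blanks */
    name_std = compbl(compress(upcase(name), '.,'));
    /* Remove one trailing suffix; the suffix list is an assumption */
    name_std = prxchange('s/\s+(JR|SR|II|III|IV)\s*$//', 1, strip(name_std));
    /* Last blank-delimited word of the cleaned name */
    last_name = scan(name_std, -1, ' ');
run;
```

Note that a "Last, First" layout such as the second row would still give the wrong last_name here; cases like that need their own rule, which depends on how the names were actually entered.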
It will never be an exact match when the rules are not perfectly defined. However, for most purposes you can do a good job with fuzzy matching.
There are many examples online.
> two different persons in the same firm and in the same year may have the same last name. How will I handle all these issues
For questions such as these, you need to decide what works best for you.
For fuzzy string matching, there are several SAS functions such as SPEDIS or COMPGED.
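A minimal sketch of how these functions might be used, assuming both groups carry firm and year columns (named firm_id and year here as placeholders) and that the names have already been cleaned. All data values are invented, and the cutoff of 300 is arbitrary and would need tuning against real data. Comparing full names only within the same firm and year is one way to separate people who share a last name.

```sas
/* Hypothetical sample data: all names and values are made up */
data group_a;
    infile datalines dlm=',' truncover;
    input firm_id year name :$40.;
    datalines;
101,2015,JOHN HOLMES
101,2015,MARY HOLMES
;

data group_b;
    infile datalines dlm=',' truncover;
    input firm_id year name :$40.;
    datalines;
101,2015,JON HOLMES
101,2015,MARY K HOLMES
;

proc sql;
    create table candidate_matches as
    select a.firm_id, a.year,
           a.name as name_a,
           b.name as name_b,
           /* Generalized edit distance: lower means more similar */
           compged(strip(a.name), strip(b.name)) as ged,
           /* Spelling distance: asymmetric, 0 means identical */
           spedis(strip(a.name), strip(b.name)) as sped
    from group_a as a
         inner join group_b as b
         on a.firm_id = b.firm_id and a.year = b.year   /* compare within firm-year only */
    where calculated ged <= 300                          /* arbitrary cutoff, tune on your data */
    order by a.firm_id, a.year, name_a, ged;
quit;
```

The output is a list of candidate pairs rather than a final answer: for each name in group A you would typically keep the lowest-scoring pair and review by hand any name that has several close candidates or none.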