Hi,
I have attached an Excel file. I want to match group A with group B using names. The different colors show the matching names. However, the names don't match exactly between the two groups. One way to match is by last name, but the last name is not always written the same way in the two groups, for example, John Holmes III. Also, in my large dataset, two different persons in the same firm and in the same year may have the same last name. How can I handle all these issues and match one dataset perfectly with the other using names? Could you please help me?
Please post SAS datasets in usable form (working data steps with datalines).
Perfect matching of free-text names is not possible. Only unique and totally accurate name keys like Customer_ID or Employee_ID will result in perfect matching.
If first and last names are the only two columns available to you, the best you can do is standardise them and aim for as high a match rate as possible. Also, is it possible that two different people could share the same full name?
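As a rough illustration of what standardising could look like, here is a minimal sketch. The dataset and variable names (group_a, name, name_std, first_name, last_name), the sample rows, the suffix list, and the "LAST, FIRST" layout are all assumptions, since the attached Excel file is not available here; adjust them to your real data.

```sas
/* Hypothetical sample data: one free-text NAME per row */
data group_a;
   infile datalines truncover;
   input name $40.;
   datalines;
John Holmes III
Holmes, John
Mary Smith Jr.
;

data group_a_std;
   set group_a;
   length name_std $40 first_name last_name $20;

   /* Upper-case, then keep only letters, blanks and commas */
   name_std = compress(upcase(name), ' ,', 'ka');

   /* Drop generational suffixes (assumed list, extend as needed) */
   name_std = prxchange('s/\b(JR|SR|II|III|IV)\b//', -1, name_std);

   /* Handle "LAST, FIRST" as well as "FIRST LAST" (assumed layouts) */
   if index(name_std, ',') then do;
      last_name  = scan(name_std, 1, ',');
      first_name = scan(scan(name_std, 2, ','), 1, ' ');
   end;
   else do;
      first_name = scan(name_std, 1, ' ');
      last_name  = scan(name_std, -1, ' ');
   end;
run;
```

Running the same step on group B gives you comparable first_name and last_name keys on both sides. Middle names and multi-word last names are not handled by this sketch.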
It will never be an exact match when the rules are not perfectly defined. However, for most purposes you can do a good job with fuzzy matching.
There are many examples online.
> two different persons in the same firm and in the same year may have the same last name. How will I handle all these issues
For questions such as these, you need to decide what's best for your data.
For fuzzy string matching, there are several SAS functions such as SPEDIS or COMPGED.
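A minimal sketch of how these functions might be used, assuming both groups carry firm, year, and name columns (the dataset names, variable names, sample rows, and the cutoff of 400 are all made up for illustration). Candidates are only compared within the same firm and year, and the full name is scored rather than the last name alone, which helps when two people in one firm share a last name.

```sas
/* Hypothetical sample data for both groups */
data group_a;
   infile datalines dsd truncover;
   input firm :$10. year name :$40.;
   datalines;
ACME,2010,John Holmes III
ACME,2010,Mary Smith
;

data group_b;
   infile datalines dsd truncover;
   input firm :$10. year name :$40.;
   datalines;
ACME,2010,Jon Holmes
ACME,2010,Marie Smith
;

proc sql;
   create table candidates as
   select a.firm,
          a.year,
          a.name as name_a,
          b.name as name_b,
          /* Lower scores mean more similar strings */
          compged(upcase(strip(a.name)), upcase(strip(b.name))) as ged,
          spedis(upcase(strip(a.name)), upcase(strip(b.name))) as spd
   from group_a as a
        inner join group_b as b
        on a.firm = b.firm and a.year = b.year
   where calculated ged <= 400   /* assumed cutoff, tune on your data */
   order by a.firm, a.year, name_a, ged;
quit;
```

The result is a list of candidate pairs, best score first within each name, not a final answer. Pairs near the cutoff still need a manual review, and the cutoff itself should be chosen by checking a sample of known matches.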