Hello,
I have a dataset with two variables, fname and lname, shown in the table below. Each row holds two versions of the same person's name, so every pair should come out as a perfect match when we compare them with fuzzy matching.
Obs | fname | lname |
1 | Morales De Rodriguez | Morales-Rodriguez |
2 | Morales De Rodriguez | Morales – Rodriguez |
3 | Morales De Rodriguez | Morales Rodriguez |
4 | Morales De Rodriguez | MoralesRodriguez |
5 | Morales De Rodriguez | Morales – De – Rodriguez |
6 | Morales-Rodriguez | Morales De Rodriguez |
7 | Morales Rodriguez | MoralesDeRodriguez |
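I am using this code to match them.
data final2;
    set work.final;
    /* characters to strip: blank, comma, period, exclamation mark, en dash, hyphen */
    delims = ' ,.!–-';
    fname2 = compress(fname, delims);
    lname2 = compress(lname, delims);
    /* modifiers: I ignores case, N unquotes n-literals, L drops leading blanks */
    score_compged = compged(fname2, lname2, 'INL');
    score2_complev = complev(fname2, lname2, 'INL');
run;
proc print data=final2;
run;
ods rtf close;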
Do you have any better code than this one?
Thanks,
Bikash
but both variables should be perfect match
How are you defining a perfect match? A perfect match is an exact comparison; it does not need fuzzy matching or COMPGED at all.
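If "perfect match" here means "identical after cleaning", one way to make that explicit is to normalize both variables first and only then compare them. The following is a minimal sketch, not a definitive solution. It assumes the data set work.final and the variables fname and lname from the original post, and it assumes that a standalone "De" can be treated as noise; the names fname2, lname2 and exact_after_cleaning are only illustrative.

```sas
data final2;
    set work.final;
    length fname2 lname2 $200;

    /* Uppercase, then turn every run of non-letters (blanks, hyphens, */
    /* en dashes, punctuation) into a single blank so words stay apart */
    fname2 = prxchange('s/[^A-Z]+/ /', -1, upcase(fname));
    lname2 = prxchange('s/[^A-Z]+/ /', -1, upcase(lname));

    /* Assumption: the particle DE carries no information when it      */
    /* stands alone as a word, so it is dropped                        */
    fname2 = prxchange('s/\bDE\b//', -1, fname2);
    lname2 = prxchange('s/\bDE\b//', -1, lname2);

    /* Remove the remaining blanks so spacing differences do not count */
    fname2 = compress(fname2);
    lname2 = compress(lname2);

    /* 1 when the cleaned values are identical, 0 otherwise            */
    exact_after_cleaning = (fname2 = lname2);

    /* Fuzzy scores for the pairs that are still not identical         */
    score_compged = compged(fname2, lname2);
    score_complev = complev(fname2, lname2);
run;
```

With the sample table, rows 1 to 6 should be flagged as identical after cleaning. Row 7 should not, because "De" is embedded in MoralesDeRodriguez rather than standing alone; pairs like that still depend on the COMPGED or COMPLEV score and a cutoff that you choose.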
I'll see your thousand records and raise you 12,000.
I had to check whether any of roughly 13,000 people in one data set might also have been in another data system where names were stored in very different forms. One of them had the first, last, and middle names, plus things like Junior or II, in a single field without any fixed order. And some people had two last names related to their parents.
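When name parts sit in one field without a fixed order, one option is to build an order-independent key before comparing. This is only a sketch under stated assumptions: work.names and fullname are hypothetical names, at most 10 name parts of up to 40 characters each are kept, and digits and punctuation are discarded.

```sas
data names_keyed;
    set work.names;                      /* hypothetical input data set */
    length clean name_key $200;
    array tok{10} $40 tok1-tok10;        /* assumption: at most 10 parts */

    /* Uppercase and replace every run of non-letters with one blank */
    clean = prxchange('s/[^A-Z]+/ /', -1, upcase(fullname));

    /* Split into words; SCAN returns a blank past the last word */
    do i = 1 to dim(tok);
        tok{i} = scan(clean, i, ' ');
    end;

    /* Sort the words so the original order no longer matters */
    call sortc(of tok{*});

    /* CATX skips the blank entries and joins the rest with one blank */
    name_key = catx(' ', of tok{*});

    drop i clean tok1-tok10;
run;
```

Two records whose fields contain the same words in a different order get the same name_key, so the key can be compared exactly or passed to COMPGED for the remaining near misses.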
@bikashten wrote:
That's what I am doing right now, but I posted this in case there is a better alternative. It's not fun to do it manually for over a thousand records. Thanks, Bikash
Agreed, but you can't always program your way out of bad data, and it's better to fix this at the source somehow. Using a number to identify companies instead of names is a start, as are having a cleaned database and having a verification step as people enter data.
Trying to clean up the mess afterwards is always more work.