Hi,
I have a question about fuzzy find/kindex.
I have two columns, "All_members" and "Excellent". I want to generate a new column, "Excellent_in", to indicate whether "Excellent" appears in "All_members". I usually do this by writing:
if kindex(All_members, Excellent)>0;
However, as the table below shows, the names do not match exactly. I guess I need some kind of fuzzy find/kindex function here. I would very much appreciate any help with this. Thanks!
All_members | Excellent | Excellent_in |
Lee Han llp, Det tet co., 3M Inc. | 3M | 1 |
…… | …… | …… |
Vans co, nineng inc., leiwo | wyu | 0 |
Vans co, nineng inc., leiwo | leiwo inc | 1 |
kindex() is not meant to be fuzzy.
Sadly, I could not replicate a match from your example:
data _null_;
A=kindex('Vans co, nineng inc., leiwo', 'leiwo inc');
put A=;
run;
A=0
Thanks, ChrisNZ!
Yes, I understand that kindex cannot be fuzzy. I am asking whether there are functions that can be used for fuzzy matching.
> if there are some functions that can be used for fuzzy matching
Yes, but you have to define the rules yourself.
In the example you give, you could simply try index() on strings that have been cleaned: uppercased, with punctuation and words like "Ltd", "Inc", "Limited" etc. removed.
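A minimal sketch of that cleaning approach, using the rows from your table. The suffix list (INC, CO, LLP, LTD, LIMITED), the variable names clean_all and clean_exc, and the lengths are my assumptions; adjust them to your data.

```sas
data want;
  infile datalines dlm='|' truncover;
  length All_members clean_all $100 Excellent clean_exc $40;
  input All_members :$100. Excellent :$40.;

  /* Uppercase and drop punctuation ('p' modifier of compress) */
  clean_all = upcase(compress(All_members, , 'p'));
  clean_exc = upcase(compress(Excellent, , 'p'));

  /* Remove common company suffixes (assumed list), then tidy blanks */
  clean_all = strip(compbl(prxchange('s/\b(INC|CO|LLP|LTD|LIMITED)\b//', -1, clean_all)));
  clean_exc = strip(compbl(prxchange('s/\b(INC|CO|LLP|LTD|LIMITED)\b//', -1, clean_exc)));

  /* Exact substring search on the cleaned strings;          */
  /* trim() keeps trailing blanks out of the search excerpt  */
  Excellent_in = (not missing(clean_exc) and index(clean_all, trim(clean_exc)) > 0);
datalines;
Lee Han llp, Det tet co., 3M Inc.|3M
Vans co, nineng inc., leiwo|wyu
Vans co, nineng inc., leiwo|leiwo inc
;
run;
```

With these three rows, Excellent_in should come out as 1, 0, 1. Note that this is still a plain substring search, so a short name like "3M" would also match inside a longer word.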
Tools that do fuzzy matching include the LIKE and sounds-like (=*) operators in SQL, regular-expression functions like prxmatch(), and spelling-distance functions like spedis() or compged(). It is up to you to define the criteria though. Each case is different.
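As a rough sketch of the spelling-distance route: split All_members on commas, compare each member to Excellent with compged(), and keep the smallest distance. The cutoff value of 200 and the names min_ged and cutoff are purely hypothetical; look at min_ged for your own data before settling on a threshold.

```sas
data scored;
  infile datalines dlm='|' truncover;
  length All_members $100 Excellent member $40;
  input All_members :$100. Excellent :$40.;

  cutoff  = 200;   /* hypothetical threshold, tune it on your data */
  min_ged = .;

  /* Compare each comma-separated member with Excellent */
  do i = 1 to countw(All_members, ',');
    member  = strip(scan(All_members, i, ','));
    ged     = compged(upcase(member), upcase(Excellent));
    min_ged = min(min_ged, ged);   /* min() ignores missing values */
  end;

  /* Flag a match when the closest member is within the cutoff */
  Excellent_in = (min_ged <= cutoff);
  drop i member ged;
datalines;
Lee Han llp, Det tet co., 3M Inc.|3M
Vans co, nineng inc., leiwo|wyu
Vans co, nineng inc., leiwo|leiwo inc
;
run;
```

Keeping min_ged in the output lets you see how close each pair is and decide where to draw the line. Cleaning the strings first, as suggested above, usually makes the distances easier to interpret.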