Hi,
I have a question about fuzzy matching with find/kindex.
I have two columns, "All_members" and "Excellent". I want to generate a new column, "Excellent_in", to indicate whether the name in "Excellent" appears in "All_members". I usually do this by writing:
if kindex(All_members, Excellent)>0;
However, as the table below shows, the names do not always match exactly, so I guess I need some kind of fuzzy find/kindex function. I would appreciate it very much if someone could help me with this. Thanks!
All_members | Excellent | Excellent_in |
Lee Han llp, Det tet co., 3M Inc. | 3M | 1 |
…… | …… | …… |
Vans co, nineng inc., leiwo | wyu | 0 |
Vans co, nineng inc., leiwo | leiwo inc | 1 |
kindex() does not do fuzzy matching: it only finds an exact substring.
Sadly, I could not replicate the result in your example. This test returns 0:
data _null_;
A=kindex('Vans co, nineng inc., leiwo', 'leiwo inc');
put A=;
run;
A=0
Thanks, ChrisNZ!
Yes, I understand that kindex cannot be fuzzy. I am asking if there are some functions that can be used for fuzzy matching.
> if there are some functions that can be used for fuzzy matching
Yes, but you have to define the rules yourself.
In the example you give, you could try to simply use index() on strings that have been cleaned: uppercase, no punctuation, no "Ltd" "Inc" "Limited" etc.
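As a rough sketch of that cleaning idea, the step below uppercases both strings, replaces punctuation with blanks, drops a few legal-form words, and then runs a plain index(). The suffix list (INC, CO, LLP, LTD, LIMITED) and the helper variables clean_all and clean_exc are assumptions made for illustration, so adjust them to your own data.

```sas
/* Sample data taken from the question */
data have;
  infile datalines dlm='|' truncover;
  length All_members $200 Excellent $50;
  input All_members $ Excellent $;
  datalines;
Lee Han llp, Det tet co., 3M Inc.|3M
Vans co, nineng inc., leiwo|wyu
Vans co, nineng inc., leiwo|leiwo inc
;
run;

data want;
  set have;
  length clean_all clean_exc $200;

  /* Uppercase, then turn anything that is not a letter, digit or blank into a blank */
  clean_all = prxchange('s/[^A-Z0-9 ]+/ /', -1, upcase(All_members));
  clean_exc = prxchange('s/[^A-Z0-9 ]+/ /', -1, upcase(Excellent));

  /* Drop legal-form words (assumed list, extend as needed) */
  clean_all = prxchange('s/\b(INC|CO|LLP|LTD|LIMITED)\b//', -1, clean_all);
  clean_exc = prxchange('s/\b(INC|CO|LLP|LTD|LIMITED)\b//', -1, clean_exc);

  /* Collapse runs of blanks left behind by the two steps above */
  clean_all = strip(compbl(clean_all));
  clean_exc = strip(compbl(clean_exc));

  /* Exact substring test on the cleaned strings; an empty search string never matches */
  Excellent_in = 0;
  if clean_exc ne ' ' then Excellent_in = (index(clean_all, strip(clean_exc)) > 0);

  drop clean_all clean_exc;
run;
```

On the three sample rows this should flag the first and third rows and not the second, as in the table in the question. Note that index() is still a substring test, so a short name such as "3M" would also match inside a longer word; indexw() or findw() may be a better fit if whole-word matches are required.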
Tools that support fuzzy matching include the like and sounds-like operators in SQL, regular-expression functions such as prxmatch(), and spelling-distance functions such as spedis() or compged(). It is up to you to define the criteria, though. Each case is different.
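For the spelling-distance route, one possible sketch is to split All_members on commas, score each member against Excellent with compged(), and keep the smallest distance. The cutoff of 100 and the variable names member, dist and min_dist are placeholders I made up for illustration; compged() is not symmetric and its scores depend on the kind of edit, so look at min_dist on your own data before settling on a threshold.

```sas
/* Sample data taken from the question */
data have;
  infile datalines dlm='|' truncover;
  length All_members $200 Excellent $50;
  input All_members $ Excellent $;
  datalines;
Lee Han llp, Det tet co., 3M Inc.|3M
Vans co, nineng inc., leiwo|wyu
Vans co, nineng inc., leiwo|leiwo inc
;
run;

data want;
  set have;
  length member $200;
  min_dist = .;

  /* Walk through the comma-separated members one at a time */
  do i = 1 to countw(All_members, ',');
    member = strip(scan(All_members, i, ','));
    /* Generalized edit distance, compared in uppercase */
    dist = compged(upcase(strip(member)), upcase(strip(Excellent)));
    if missing(min_dist) or dist < min_dist then min_dist = dist;
  end;

  /* Hypothetical cutoff; the missing() check matters because a missing
     value compares as smaller than any number in SAS */
  Excellent_in = (not missing(min_dist) and min_dist <= 100);

  drop i member dist;
run;
```

Keeping min_dist in the output makes it easier to inspect borderline cases. Cleaning the strings first (uppercase, no punctuation, no "Inc" or "Ltd") and then applying a distance function usually lets you use a much tighter cutoff.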