I have a variable that contains free text entered by users, and I need to know which entries contain a particular text string, allowing for slight misspellings (for example, allowing the total number of insertions, deletions, or replacements to be less than N). The COMPLEV function seems to compare only two whole strings, and the PRXMATCH and INDEX functions don't seem to allow for fuzzy matching like this (i.e., I would have to specify every pattern I was willing to accept). What is the easiest way to accomplish this?
For example, say I have the following dataset s1:
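To illustrate the COMPLEV limitation, here is a minimal sketch. The variable names d_word and d_text and the shortened sample sentence are made up for illustration. It compares the search term first with a single isolated word and then with a whole sentence:

```sas
data _null_;
  /* Distance between the search term and one isolated word:
     a single replacement (first letter), so this is 1 */
  d_word = complev('edipiscing', 'adipiscing');
  /* Distance between the search term and a whole entry:
     every extra character in the sentence counts as an edit */
  d_text = complev('edipiscing', 'Lorem ipsum dolor sit amet, consectetur adipiscing elit');
  put d_word= d_text=;
run;
```

The first distance is small, but the second is large even though the sentence contains a near match, so COMPLEV on the full entry does not tell me whether the string is present.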
data s1;
length text $500;
input text &;
id = _n_;
datalines;
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium
;
run;
And say I want to search the "text" field to see which rows contain the string "edipiscing", allowing for slight spelling differences (for example, at most 1 character insertion, deletion, or replacement).
I could use prxmatch like this:
proc sql;
select *
from s1
where prxmatch('/edipiscing/i', text)>0
;
quit;
But this would not match the first row, because the search string differs from "adipiscing" there by one character replacement (the first letter). I could instead write:
proc sql;
select *
from s1
where prxmatch('/[a-z]dipiscing/i', text)>0
;
quit;
But I don't want to have to specify all possible patterns. Is there a SAS function that searches for the presence of a text string while allowing for fuzzy matches?