jjknknl
Fluorite | Level 6

I have a variable that contains free text entered by users, and I need to know which entries contain a particular text string while allowing for slight misspellings (for example, a total number of insertions, deletions, or replacements less than N). The COMPLEV function only seems to compare two whole strings, and the PRXMATCH and INDEX functions don't seem to allow fuzzy matching like this (i.e., I would have to spell out every possible pattern I was willing to accept). What is the easiest way to accomplish this?

For example, say I have the following data set, s1:

data s1;
length text $500;
input text &;
id = _n_;
datalines;
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium
;
run;

And say I want to search the "text" field to see which rows contain the string "edipiscing", allowing for slight spelling differences: for example, at most 1 character insertion, deletion, or replacement.
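For reference, the tolerance described here (a count of single-character insertions, deletions, and replacements) is the Levenshtein edit distance, which is what COMPLEV returns when given two whole strings. A quick check with illustrative literals shows that "adipiscing" is within one edit of the search term, while a variant that also drops a letter is not:

```sas
/* COMPLEV returns the Levenshtein distance between two whole strings. */
data _null_;
  d1 = complev('adipiscing', 'edipiscing');  /* one replacement (a -> e)            */
  d2 = complev('adipscing',  'edipiscing');  /* one replacement plus one insertion  */
  put d1= d2=;                               /* writes d1=1 d2=2 to the log         */
run;
```

The difficulty in the question is that COMPLEV measures the distance between entire strings, not whether one string appears approximately somewhere inside another.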

I could use PRXMATCH like this:


proc sql;
select *
from s1
where prxmatch('/edipiscing/i', text)>0
;
quit;

But that would not match the first row, because "adipiscing" differs from "edipiscing" by one character replacement (the first letter). I could instead write:

proc sql;
select *
from s1
where prxmatch('/[a-z]dipiscing/i', text)>0
;
quit;

But I don't want to have to specify every possible pattern. Is there a SAS function that searches for the presence of a text string while allowing for fuzzy matches?

brantk
SAS Employee

Hi jjknknl,

This document may help if you are using SAS functions: https://www.sas.com/content/dam/SAS/en_ca/User%20Group%20Presentations/TASS/fogarasi_fuzzy_matching....
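Building on the COMPLEV function mentioned in the question, one way to get a "fuzzy contains" test without enumerating patterns is to slide a window over the text and compare every substring of plausible length with the target. The sketch below is one possible approach, not an official SAS routine; the data set SAMPLE and the variables TARGET, MAXDIST, and FOUND are hypothetical names chosen for illustration, and it assumes case should be ignored (COMPLEV's 'i' modifier).

```sas
/* Hypothetical sample data; with the question's data, use S1 instead. */
data sample;
  length text $200;
  input text &;
  id = _n_;
  datalines;
sed do eiusmod, adipiscing elit
Sed ut perspiciatis unde omnis
;
run;

/* Keep rows whose TEXT contains a substring within MAXDIST edits     */
/* (insertions + deletions + replacements, ignoring case) of TARGET.  */
data fuzzy_hits;
  set sample;
  length target $50;
  target  = 'edipiscing';   /* string to search for                  */
  maxdist = 1;              /* largest edit distance to accept       */
  tlen    = length(target);
  txtlen  = length(text);
  found   = 0;
  /* Try every start position and every window length that could be  */
  /* within MAXDIST edits of TARGET: TLEN-MAXDIST to TLEN+MAXDIST.    */
  do start = 1 to txtlen while (not found);
    do wlen = max(1, tlen - maxdist) to tlen + maxdist while (not found);
      if start + wlen - 1 <= txtlen then do;
        /* With cutoff MAXDIST+1, COMPLEV returns at most MAXDIST+1.  */
        if complev(substr(text, start, wlen), target, maxdist + 1, 'i')
           <= maxdist then found = 1;
      end;
    end;
  end;
  if found;
  drop tlen txtlen start wlen;
run;

proc print data=fuzzy_hits;
run;
```

With the sample rows above, only the first row (which contains "adipiscing") should be kept. The loop makes roughly LENGTH(text) x (2 x MAXDIST + 1) COMPLEV calls per row, so when the target is a single word a cheaper variant is to loop over the words with COUNTW and SCAN and compare each word to the target. Also note that a short target combined with a generous MAXDIST can match fragments that span word boundaries.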


If you have SAS Data Quality, you can refer to this document. See PROC DQMATCH and the DQMATCH function: https://go.documentation.sas.com/?cdcId=dqcdc&cdcVersion=3.4&docsetId=dqclref&docsetTarget=titlepage...
