jjknknl
Fluorite | Level 6

I have a variable that contains free text entered by users, and I need to know which entries contain a particular text string, allowing for slight misspellings (for example, allowing the total number of insertions, deletions, or replacements to be less than N). The COMPLEV function only seems to compare two whole strings, and the PRXMATCH and INDEX functions don't seem to allow for fuzzy matching like this (i.e., I would have to specify every pattern I was willing to accept). What is the easiest way for me to accomplish this?

 

For example, say I have the following dataset s1:

 

data s1;
length text $500;
input text &;
id = _n_;
datalines;
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium
;
run;

 

And say I want to search the "text" field to see which rows contain the string "edipiscing", allowing for slight spelling differences: for example, at most 1 character insertion, deletion, or replacement.

I could use PRXMATCH like this:


proc sql;
select *
from s1
where prxmatch('/edipiscing/i', text)>0
;
quit;

 

But this would not find a match in the first row, because the text there ("adipiscing") differs by one character replacement (the first letter). I could do:

 

proc sql;
select *
from s1
where prxmatch('/[a-z]dipiscing/i', text)>0
;
quit;

 

But I don't want to have to specify all possible patterns. Is there a SAS function that searches for the presence of a text string while allowing for fuzzy matches?

1 REPLY 1
brantk
SAS Employee

Hi jjknknl,

This document may help if you are using SAS functions: https://www.sas.com/content/dam/SAS/en_ca/User%20Group%20Presentations/TASS/fogarasi_fuzzy_matching....

 

If you have SAS Data Quality, you can refer to this document. See PROC DQMATCH and the DQMATCH function: https://go.documentation.sas.com/?cdcId=dqcdc&cdcVersion=3.4&docsetId=dqclref&docsetTarget=titlepage...
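
If you only have Base SAS functions available, one possible workaround is to split each text value into words and compare each word to the target with COMPLEV. The following is a minimal sketch, not a definitive solution: it assumes the dataset s1 from the question, that the target is a single word, and that blanks and common punctuation are the only word delimiters. The names fuzzy_hits, target, maxdist, word, and found are made up for this example.

```sas
/* Sketch: keep rows where at least one word is within maxdist edits   */
/* of the target. Assumes s1 (with variable text) already exists.      */
data fuzzy_hits;
    set s1;
    length word $500;
    target  = 'edipiscing';   /* hypothetical search string            */
    maxdist = 1;              /* maximum Levenshtein distance allowed  */
    found   = 0;
    /* Split on blanks and common punctuation, then test each word.    */
    do i = 1 to countw(text, ' ,.;:!?');
        word = scan(text, i, ' ,.;:!?');
        /* COMPLEV returns the Levenshtein edit distance between the   */
        /* two strings; LOWCASE makes the comparison case-insensitive. */
        if complev(lowcase(strip(word)), lowcase(target)) <= maxdist then do;
            found = 1;
            leave;            /* stop at the first acceptable word     */
        end;
    end;
    if found;                 /* subsetting IF: keep matching rows     */
    drop i word target maxdist found;
run;
```

With the sample data, "adipiscing" in the first row is one replacement away from "edipiscing", so that row is kept. Because the comparison is word by word, this will not catch a target that spans several words or is embedded inside a longer word. For those cases you would need to compare the target against substrings of the text (for example, SUBSTR windows of roughly the target's length at each start position), which is slower but more general.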

 
