<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How to perform a fuzzy prxmatch or fuzzy index search in New SAS User</title>
    <link>https://communities.sas.com/t5/New-SAS-User/How-perform-a-fuzzy-prxmatch-or-fuzzy-index-search/m-p/624350#M20088</link>
    <description>&lt;P&gt;I have a variable that contains free text entered by users, and I need to know which entries contain a particular text string, allowing for slight misspellings (for example, allowing the total number of insertions, deletions, or replacements to be less than N). The COMPLEV function seems to compare only two whole strings, and the PRXMATCH and INDEX functions don't seem to allow fuzzy matching like this (i.e., I would have to specify every pattern I was willing to accept). What is the easiest way for me to accomplish this?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For example, say I have the following dataset, s1:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;data s1;&lt;BR /&gt;length text $500;&lt;BR /&gt;input text &amp;amp;;&lt;BR /&gt;id = _n_;&lt;BR /&gt;datalines;&lt;BR /&gt;Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.&lt;BR /&gt;Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium&lt;BR /&gt;;&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;And say I want to search the "text" field to see which rows contain the string "edipiscing", allowing for slight spelling differences, for example at most one character insertion, deletion, or replacement.&lt;/P&gt;&lt;P&gt;I could use prxmatch like this:&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;proc sql;&lt;BR /&gt;select *&lt;BR /&gt;from s1&lt;BR /&gt;where prxmatch('/edipiscing/i', text)&amp;gt;0&lt;BR /&gt;;&lt;BR /&gt;quit;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;But it would not match the first row, because "adipiscing" differs from the search string by one character replacement (in the first letter).
I could instead write:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;proc sql;&lt;BR /&gt;select *&lt;BR /&gt;from s1&lt;BR /&gt;where prxmatch('/[a-z]dipiscing/i', text)&amp;gt;0&lt;BR /&gt;;&lt;BR /&gt;quit;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;But I don't want to have to specify all possible patterns. Is there a SAS function that searches for the presence of a text string allowing for fuzzy matches?&lt;/P&gt;</description>
    <pubDate>Wed, 12 Feb 2020 21:19:04 GMT</pubDate>
    <dc:creator>jjknknl</dc:creator>
    <dc:date>2020-02-12T21:19:04Z</dc:date>
    <item>
      <title>How to perform a fuzzy prxmatch or fuzzy index search</title>
      <link>https://communities.sas.com/t5/New-SAS-User/How-perform-a-fuzzy-prxmatch-or-fuzzy-index-search/m-p/624350#M20088</link>
      <description>&lt;P&gt;I have a variable that contains free text entered by users, and I need to know which entries contain a particular text string, allowing for slight misspellings (for example, allowing the total number of insertions, deletions, or replacements to be less than N). The COMPLEV function seems to compare only two whole strings, and the PRXMATCH and INDEX functions don't seem to allow fuzzy matching like this (i.e., I would have to specify every pattern I was willing to accept). What is the easiest way for me to accomplish this?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For example, say I have the following dataset, s1:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;data s1;&lt;BR /&gt;length text $500;&lt;BR /&gt;input text &amp;amp;;&lt;BR /&gt;id = _n_;&lt;BR /&gt;datalines;&lt;BR /&gt;Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.&lt;BR /&gt;Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium&lt;BR /&gt;;&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;And say I want to search the "text" field to see which rows contain the string "edipiscing", allowing for slight spelling differences, for example at most one character insertion, deletion, or replacement.&lt;/P&gt;&lt;P&gt;I could use prxmatch like this:&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;proc sql;&lt;BR /&gt;select *&lt;BR /&gt;from s1&lt;BR /&gt;where prxmatch('/edipiscing/i', text)&amp;gt;0&lt;BR /&gt;;&lt;BR /&gt;quit;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;But it would not match the first row, because "adipiscing" differs from the search string by one character replacement (in the first letter).
I could instead write:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;proc sql;&lt;BR /&gt;select *&lt;BR /&gt;from s1&lt;BR /&gt;where prxmatch('/[a-z]dipiscing/i', text)&amp;gt;0&lt;BR /&gt;;&lt;BR /&gt;quit;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;But I don't want to have to specify all possible patterns. Is there a SAS function that searches for the presence of a text string allowing for fuzzy matches?&lt;/P&gt;</description>
      <pubDate>Wed, 12 Feb 2020 21:19:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/How-perform-a-fuzzy-prxmatch-or-fuzzy-index-search/m-p/624350#M20088</guid>
      <dc:creator>jjknknl</dc:creator>
      <dc:date>2020-02-12T21:19:04Z</dc:date>
    </item>
    <item>
      <title>Re: How to perform a fuzzy prxmatch or fuzzy index search</title>
      <link>https://communities.sas.com/t5/New-SAS-User/How-perform-a-fuzzy-prxmatch-or-fuzzy-index-search/m-p/624355#M20089</link>
      <description>&lt;P&gt;Hi jjknknl,&lt;/P&gt;
&lt;P&gt;This document may help if you are using SAS functions: &lt;A href="https://www.sas.com/content/dam/SAS/en_ca/User%20Group%20Presentations/TASS/fogarasi_fuzzy_matching.pdf" target="_blank" rel="noopener"&gt;https://www.sas.com/content/dam/SAS/en_ca/User%20Group%20Presentations/TASS/fogarasi_fuzzy_matching.pdf&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you have SAS Data Quality, you can refer to this document. See PROC DQMATCH and the DQMATCH function: &lt;A href="https://go.documentation.sas.com/?cdcId=dqcdc&amp;amp;cdcVersion=3.4&amp;amp;docsetId=dqclref&amp;amp;docsetTarget=titlepage.htm&amp;amp;locale=en" target="_blank"&gt;https://go.documentation.sas.com/?cdcId=dqcdc&amp;amp;cdcVersion=3.4&amp;amp;docsetId=dqclref&amp;amp;docsetTarget=titlepage.htm&amp;amp;locale=en&lt;/A&gt;&lt;/P&gt;
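&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;With Base SAS functions only, one possible approach is sketched below. It is my own assumption, not taken from either document above: split each entry into words with COUNTW and SCAN, then keep the row when COMPLEV reports an edit distance of at most 1 between any word and the search string. The output dataset name fuzzy_hits, the variable names word and i, and the delimiter list are illustrative, and the approach only handles single-word search strings.&lt;/P&gt;
&lt;P&gt;data fuzzy_hits;&lt;BR /&gt;set s1;&lt;BR /&gt;length word $50;&lt;BR /&gt;/* blanks and common punctuation separate the words */&lt;BR /&gt;do i = 1 to countw(text, ' ,.;:');&lt;BR /&gt;word = scan(text, i, ' ,.;:');&lt;BR /&gt;/* COMPLEV returns the Levenshtein edit distance between two strings */&lt;BR /&gt;if complev(lowcase(strip(word)), 'edipiscing') le 1 then do;&lt;BR /&gt;output; /* write the row once, with the matching word */&lt;BR /&gt;leave;&lt;BR /&gt;end;&lt;BR /&gt;end;&lt;BR /&gt;drop i;&lt;BR /&gt;run;&lt;/P&gt;
&lt;P&gt;With the s1 data from the question, this should keep the first row, since "adipiscing" is one replacement away from "edipiscing". Multi-word search strings would need a sliding window over adjacent words instead of a single SCAN result.&lt;/P&gt;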
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 12 Feb 2020 22:18:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/How-perform-a-fuzzy-prxmatch-or-fuzzy-index-search/m-p/624355#M20089</guid>
      <dc:creator>brantk</dc:creator>
      <dc:date>2020-02-12T22:18:55Z</dc:date>
    </item>
  </channel>
</rss>

