<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Understanding the Fuzzy Matching Techniques Behind Matchcode Generation in SAS Data Quality in SAS Data Management</title>
    <link>https://communities.sas.com/t5/SAS-Data-Management/Understanding-the-Fuzzy-Matching-Techniques-Behind-Matchcode/m-p/954244#M21026</link>
    <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I have been researching various operations that can be performed on textual data in SAS Data Quality and am particularly interested in the Matching operation. While going through the DataFlux documentation, I have gained a fair understanding of concepts like match definitions, schemes, and chop tables.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;However, I am curious about the fuzzy matching techniques used to generate matchcodes. Specifically, I would like to understand:&lt;/P&gt;&lt;P&gt;What fuzzy matching techniques or algorithms are used in SAS Data Quality for matchcode generation?&lt;BR /&gt;Are these techniques based on phonetic algorithms (e.g., Soundex) or string similarity measures (e.g., Levenshtein distance, Jaro-Winkler)?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I understand the &lt;EM&gt;what&lt;/EM&gt; of match definitions and matchcodes, but now I am keen to dive deeper into the &lt;EM&gt;how&lt;/EM&gt;. Any guidance or references would be greatly appreciated!&lt;/P&gt;&lt;P&gt;Thank you in advance!&lt;/P&gt;</description>
    <pubDate>Fri, 20 Dec 2024 06:30:22 GMT</pubDate>
    <dc:creator>saunvida</dc:creator>
    <dc:date>2024-12-20T06:30:22Z</dc:date>
    <item>
      <title>Understanding the Fuzzy Matching Techniques Behind Matchcode Generation in SAS Data Quality</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Understanding-the-Fuzzy-Matching-Techniques-Behind-Matchcode/m-p/954244#M21026</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I have been researching various operations that can be performed on textual data in SAS Data Quality and am particularly interested in the Matching operation. While going through the DataFlux documentation, I have gained a fair understanding of concepts like match definitions, schemes, and chop tables.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;However, I am curious about the fuzzy matching techniques used to generate matchcodes. Specifically, I would like to understand:&lt;/P&gt;&lt;P&gt;What fuzzy matching techniques or algorithms are used in SAS Data Quality for matchcode generation?&lt;BR /&gt;Are these techniques based on phonetic algorithms (e.g., Soundex) or string similarity measures (e.g., Levenshtein distance, Jaro-Winkler)?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I understand the &lt;EM&gt;what&lt;/EM&gt; of match definitions and matchcodes, but now I am keen to dive deeper into the &lt;EM&gt;how&lt;/EM&gt;. Any guidance or references would be greatly appreciated!&lt;/P&gt;&lt;P&gt;Thank you in advance!&lt;/P&gt;</description>
      <pubDate>Fri, 20 Dec 2024 06:30:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Understanding-the-Fuzzy-Matching-Techniques-Behind-Matchcode/m-p/954244#M21026</guid>
      <dc:creator>saunvida</dc:creator>
      <dc:date>2024-12-20T06:30:22Z</dc:date>
    </item>
    <item>
      <title>Re: Understanding the Fuzzy Matching Techniques Behind Matchcode Generation in SAS Data Quality</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Understanding-the-Fuzzy-Matching-Techniques-Behind-Matchcode/m-p/955266#M21028</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;I hope you are still interested in this topic.&lt;/P&gt;
&lt;P&gt;Match codes (the "fuzzy" codes) are generated using the Quality Knowledge Base (QKB). The QKB is a collection of files: definitions, schemes, chop tables, phonetics libraries, regex libraries, vocabularies, and grammars. These files can be edited in DataFlux Data Management Studio, so if the out-of-the-box rules don't meet your organisation's needs, you can add new files or edit the existing ones. Personally, I found the vocabulary and grammar files the hardest to work with.&lt;/P&gt;
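&lt;P&gt;As a rough illustration of how three of those file types work together, here is a toy Python sketch. A chop table splits a value into words, a vocabulary assigns each word a category, and a grammar maps the resulting category pattern to named tokens. The CHOP_CHARS, VOCABULARY and GRAMMAR data and the chop/parse helpers are hypothetical and much simpler than the real QKB files.&lt;/P&gt;
&lt;PRE&gt;
```python
# Toy stand-ins for three QKB file types. All data here is hypothetical and
# far simpler than a shipped QKB.
CHOP_CHARS = {" ", ",", "."}  # chop table: characters that split words

VOCABULARY = {  # vocabulary: word -> category
    "MR": "TITLE",
    "DR": "TITLE",
    "JOHN": "GIVEN",
    "ANN": "GIVEN",
    "SMITH": "FAMILY",
}

GRAMMAR = {  # grammar: category pattern -> token name for each position
    ("TITLE", "GIVEN", "FAMILY"): ("Prefix", "Given Name", "Family Name"),
    ("GIVEN", "FAMILY"): ("Given Name", "Family Name"),
}


def chop(value: str) -> list[str]:
    """Split an upper-cased value into words wherever a chop character occurs."""
    words, current = [], ""
    for ch in value.upper():
        if ch in CHOP_CHARS:
            if current:
                words.append(current)
            current = ""
        else:
            current += ch
    if current:
        words.append(current)
    return words


def parse(value: str) -> dict[str, str]:
    """Categorize each word with the vocabulary, then name the tokens via the grammar."""
    words = chop(value)
    pattern = tuple(VOCABULARY.get(w, "UNKNOWN") for w in words)
    token_names = GRAMMAR.get(pattern)
    if token_names is None:
        return {}  # no grammar rule covers this category pattern
    return dict(zip(token_names, words))


print(parse("Dr. John Smith"))
print(parse("ann smith"))
```
&lt;/PRE&gt;
&lt;P&gt;For example, parse("Dr. John Smith") returns Prefix DR, Given Name JOHN and Family Name SMITH. A value whose category pattern has no grammar rule returns an empty result.&lt;/P&gt;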
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Match code generation runs each value through a series of steps, applying rules from those files at each step: removing noise, standardizing, normalizing, applying phonetic reduction, and finally building the match code layout. It involves a lot more than just the Soundex function.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="QKB Definition Steps" style="width: 201px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/103499i53458CB9875C2046/image-size/large?v=v2&amp;amp;px=999" role="button" title="Screenshot 2025-01-07 102049.png" alt="QKB Definition Steps" /&gt;&lt;span class="lia-inline-image-caption" onclick="event.preventDefault();"&gt;QKB Definition Steps&lt;/span&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
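&lt;P&gt;The sensitivity setting defines how many characters are used to build the match code. A higher sensitivity uses more characters and produces stricter matches, while a lower sensitivity uses fewer characters and produces fuzzier matches.&lt;/P&gt;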
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Sensitivity" style="width: 694px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/103500i9BE78595E2C4544D/image-size/large?v=v2&amp;amp;px=999" role="button" title="Screenshot 2025-01-07 103125.png" alt="Sensitivity" /&gt;&lt;span class="lia-inline-image-caption" onclick="event.preventDefault();"&gt;Sensitivity&lt;/span&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;Finally, the unencoded match code string is converted into the encoded match code. The encoding logic is not documented, but the node generates the encoded code from the characters of the unencoded string.&lt;BR /&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Encoding" style="width: 665px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/103501iC1D6DF4F6CF34ABF/image-size/large?v=v2&amp;amp;px=999" role="button" title="Screenshot 2025-01-07 103323.png" alt="Encoding" /&gt;&lt;span class="lia-inline-image-caption" onclick="event.preventDefault();"&gt;Encoding&lt;/span&gt;&lt;/span&gt;&lt;/P&gt;
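&lt;P&gt;Because the encoding is not documented, the Python sketch below uses a SHA-256 digest purely as a stand-in. What it does show is the property that matters for matching: the encoded code depends only on the unencoded string, so the same value always yields the same code, and values with identical codes can be grouped into one cluster. The UNENCODED data and the encode/cluster helpers are hypothetical.&lt;/P&gt;
&lt;PRE&gt;
```python
import hashlib
from collections import defaultdict

# Hypothetical unencoded match codes, as the earlier steps might produce them.
UNENCODED = {
    "Jonathan Smith": "JNTHN$SMTH",
    "Jonathon Smith": "JNTHN$SMTH",
    "Jane Doe": "JN$D",
}


def encode(unencoded: str) -> str:
    """Stand-in encoding (SHA-256 prefix). SAS's actual encoding is not documented."""
    return hashlib.sha256(unencoded.encode("utf-8")).hexdigest()[:16].upper()


def cluster(values: dict[str, str]) -> dict[str, list[str]]:
    """Group original values whose encoded match codes are identical."""
    groups: dict[str, list[str]] = defaultdict(list)
    for original, unencoded in values.items():
        groups[encode(unencoded)].append(original)
    return dict(groups)


for members in cluster(UNENCODED).values():
    print(members)
```
&lt;/PRE&gt;
&lt;P&gt;Here the two spellings of Jonathan share a code and land in one cluster, while Jane Doe forms a cluster of its own.&lt;/P&gt;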
&lt;P&gt;These steps are applied to every value in the column to generate its match code. A given value always produces the same match code unless the QKB version changes or its rules are edited. I hope this helps explain how match codes are generated.&lt;BR /&gt;&lt;BR /&gt;If you are interested in learning to edit those files, there are QKB courses on SAS Learning.&lt;BR /&gt;Good luck.&lt;/P&gt;
&lt;P&gt;Rama&lt;/P&gt;</description>
      <pubDate>Tue, 07 Jan 2025 00:51:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Understanding-the-Fuzzy-Matching-Techniques-Behind-Matchcode/m-p/955266#M21028</guid>
      <dc:creator>Rama_V</dc:creator>
      <dc:date>2025-01-07T00:51:43Z</dc:date>
    </item>
    <item>
      <title>Re: Understanding the Fuzzy Matching Techniques Behind Matchcode Generation in SAS Data Quality</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Understanding-the-Fuzzy-Matching-Techniques-Behind-Matchcode/m-p/955269#M21029</link>
      <description>&lt;P&gt;It's also worth understanding that SAS Data Quality's QKBs are country-specific. If you are trying to match addresses in, say, India, then you need the Indian QKB, which is customised to handle Indian address formats.&lt;/P&gt;</description>
      <pubDate>Tue, 07 Jan 2025 02:10:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Understanding-the-Fuzzy-Matching-Techniques-Behind-Matchcode/m-p/955269#M21029</guid>
      <dc:creator>SASKiwi</dc:creator>
      <dc:date>2025-01-07T02:10:36Z</dc:date>
    </item>
  </channel>
</rss>

