<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Duplicate record by similarity score/closer value in SAS Data Management</title>
    <link>https://communities.sas.com/t5/SAS-Data-Management/Duplicate-record-by-similarity-score-closer-value/m-p/489411#M15262</link>
    <description>&lt;P&gt;This question is a bit fuzzy by nature: how similar must two strings be before they are considered equal? That is, when are two strings in two different observations similar enough to be treated as duplicates and therefore omitted?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Two functions to get you started are &lt;A href="http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002206137.htm" target="_self"&gt;COMPLEV&lt;/A&gt; and &lt;A href="http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002206133.htm" target="_self"&gt;COMPGED&lt;/A&gt;. Both take two strings as input and return a number representing the 'distance' between them: COMPLEV returns the Levenshtein edit distance and COMPGED the generalized edit distance, so a smaller value means the strings are more similar.&lt;/P&gt;</description>
    <pubDate>Thu, 23 Aug 2018 21:01:50 GMT</pubDate>
    <dc:creator>PeterClemmensen</dc:creator>
    <dc:date>2018-08-23T21:01:50Z</dc:date>
    <item>
      <title>Duplicate record by similarity score/closer value</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Duplicate-record-by-similarity-score-closer-value/m-p/489374#M15261</link>
      <description>&lt;P&gt;Hi All,&lt;/P&gt;
&lt;P&gt;I am looking for code that will let me find duplicates based on exact as well as similar values. I know that for exact values we can use NODUP, NODUPKEYS and NOUNIQUEKEYS. But can anyone tell me which function I can use to identify duplicate records based on similarity?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Example:&lt;/STRONG&gt; The following two records should be considered duplicates even though Last Name and Address are not the same (but they are similar).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;TABLE&gt;
&lt;TR&gt;&lt;TH&gt;First Name&lt;/TH&gt;&lt;TH&gt;Last Name&lt;/TH&gt;&lt;TH&gt;Address&lt;/TH&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;John&lt;/TD&gt;&lt;TD&gt;Murruy&lt;/TD&gt;&lt;TD&gt;1 New York St.&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;John&lt;/TD&gt;&lt;TD&gt;Murray&lt;/TD&gt;&lt;TD&gt;1 New York Street&lt;/TD&gt;&lt;/TR&gt;
&lt;/TABLE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Thanks,&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 23 Aug 2018 19:22:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Duplicate-record-by-similarity-score-closer-value/m-p/489374#M15261</guid>
      <dc:creator>mlogan</dc:creator>
      <dc:date>2018-08-23T19:22:17Z</dc:date>
    </item>
    <item>
      <title>Re: Duplicate record by similarity score/closer value</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Duplicate-record-by-similarity-score-closer-value/m-p/489411#M15262</link>
      <description>&lt;P&gt;This question is a bit fuzzy by nature: how similar must two strings be before they are considered equal? That is, when are two strings in two different observations similar enough to be treated as duplicates and therefore omitted?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Two functions to get you started are &lt;A href="http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002206137.htm" target="_self"&gt;COMPLEV&lt;/A&gt; and &lt;A href="http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002206133.htm" target="_self"&gt;COMPGED&lt;/A&gt;. Both take two strings as input and return a number representing the 'distance' between them: COMPLEV returns the Levenshtein edit distance and COMPGED the generalized edit distance, so a smaller value means the strings are more similar.&lt;/P&gt;</description>
      <pubDate>Thu, 23 Aug 2018 21:01:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Duplicate-record-by-similarity-score-closer-value/m-p/489411#M15262</guid>
      <dc:creator>PeterClemmensen</dc:creator>
      <dc:date>2018-08-23T21:01:50Z</dc:date>
    </item>
    <item>
      <title>Re: Duplicate record by similarity score/closer value</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Duplicate-record-by-similarity-score-closer-value/m-p/489545#M15263</link>
      <description>If you have a license for the SAS Data Quality procedures, I'd look into match code generation to solve this type of problem.&lt;BR /&gt;&lt;BR /&gt;&lt;A href="http://support.sas.com/documentation/cdl/en/dqclref/70016/HTML/default/viewer.htm#n1597gcbsehaokn1j500v6ius99p.htm" target="_blank"&gt;http://support.sas.com/documentation/cdl/en/dqclref/70016/HTML/default/viewer.htm#n1597gcbsehaokn1j500v6ius99p.htm&lt;/A&gt;</description>
      <pubDate>Fri, 24 Aug 2018 10:49:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Duplicate-record-by-similarity-score-closer-value/m-p/489545#M15263</guid>
      <dc:creator>SimonDawson</dc:creator>
      <dc:date>2018-08-24T10:49:22Z</dc:date>
    </item>
  </channel>
</rss>

