<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Fuzzy logic to locate duplicates in a table in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Fuzzy-logic-to-locate-duplicates-in-a-table/m-p/547268#M151620</link>
    <description>&lt;P&gt;Before I explain: Did it work for you? And does it work for your actual data? &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thank you for marking my response as an accepted solution. However, can you please mark the response with the actual code as the solution?&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Fri, 29 Mar 2019 17:05:58 GMT</pubDate>
    <dc:creator>PeterClemmensen</dc:creator>
    <dc:date>2019-03-29T17:05:58Z</dc:date>
    <item>
      <title>Fuzzy logic to locate duplicates in a table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fuzzy-logic-to-locate-duplicates-in-a-table/m-p/545904#M151094</link>
      <description>&lt;P&gt;Hi Experts,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have a table named one with a column named person_name, holding almost 1 million records.&lt;/P&gt;&lt;P&gt;Sample records are as follows:&lt;/P&gt;&lt;PRE&gt;Person_name
Michael
Michel
kurt
kirt
Michaell
Benjamin
Mich&lt;/PRE&gt;&lt;P&gt;I want to check each row and group the names so that the edit distance within a group is not more than 2.&lt;/P&gt;&lt;P&gt;So my final dataset would look like this:&lt;/P&gt;&lt;PRE&gt;Person_name    Group_name
Michael        A
Michel         A
kurt           B
kirt           B
Michaell       A
Benjamin       C
Mich           D
gr             E&lt;/PRE&gt;&lt;P&gt;So basically I am trying to group similar records in a column.&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Mon, 25 Mar 2019 17:20:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fuzzy-logic-to-locate-duplicates-in-a-table/m-p/545904#M151094</guid>
      <dc:creator>Rohit_1990</dc:creator>
      <dc:date>2019-03-25T17:20:38Z</dc:date>
    </item>
    <item>
      <title>Re: Fuzzy logic to locate duplicates in a table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fuzzy-logic-to-locate-duplicates-in-a-table/m-p/545912#M151096</link>
      <description>&lt;P&gt;I think you need to explicitly define what you mean by "edit distance among group is not more than 2".&lt;/P&gt;
&lt;P&gt;There are several SAS functions that measure spelling distance: COMPLEV, COMPGED and SPEDIS, for example. COMPLEV probably comes closest, but I am not sure that a COMPLEV distance of 2 means what you intend. COMPGED and SPEDIS return much larger numeric values for the distance.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Also, your grouping may not be consistent with the results you say you want: with COMPLEV, the name Mich is 2 from Michel, which is 1 from Michael. So shouldn't Mich be in the same group as Michael?&lt;/P&gt;
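&lt;P&gt;A quick way to check those distances, as a minimal sketch (the values in the comments are plain Levenshtein counts for these literals):&lt;/P&gt;
&lt;PRE&gt;data _null_;
   d1 = complev('Mich', 'Michel');     /* 2: insert "e" and "l" */
   d2 = complev('Michel', 'Michael');  /* 1: insert "a" */
   d3 = complev('Mich', 'Michael');    /* 3: insert "a", "e" and "l" */
   put d1= d2= d3=;
run;&lt;/PRE&gt;
&lt;P&gt;Mich is within 2 of Michel but 3 away from Michael, so the answer depends on whether every pair in a group has to be within 2 or whether names may be chained through a neighbour.&lt;/P&gt;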
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You might want to see about reducing the names to distinct names to remove duplicate comparisons.&lt;/P&gt;
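&lt;P&gt;A minimal sketch of that step, assuming the table is named one and the column person_name as in your post (distinct_names is just an illustrative name):&lt;/P&gt;
&lt;PRE&gt;proc sql;
   create table distinct_names as
   select distinct person_name
   from one;
quit;&lt;/PRE&gt;
&lt;P&gt;The comparison can then read from distinct_names instead of the full table. The demonstration that follows uses a small sample table instead.&lt;/P&gt;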
&lt;PRE&gt;data have;
input person_name $;
datalines;
Michael
Michel
kurt
kirt
Michaell
Benjamin
Mich
;

proc sql;
   create table want as
   select a.person_name as namea, b.person_name as nameb
         , complev(a.person_name,b.person_name) as spelldist
   from have as a, have as b
   where  a.person_name &amp;lt; b.person_name
   order by spelldist, a.person_name
   ;
quit;&lt;/PRE&gt;
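&lt;P&gt;This is mostly to demonstrate the COMPLEV function. You could add "and calculated spelldist le 2" to the WHERE clause to see just the close matches (PROC SQL needs the CALCULATED keyword to reference a computed column there), but with your full data the join would evaluate on the order of 1 million * 1 million pairs, which will take a while.&lt;/P&gt;</description>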
      <pubDate>Tue, 26 Mar 2019 15:46:30 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fuzzy-logic-to-locate-duplicates-in-a-table/m-p/545912#M151096</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2019-03-26T15:46:30Z</dc:date>
    </item>
    <item>
      <title>Re: Fuzzy logic to locate duplicates in a table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fuzzy-logic-to-locate-duplicates-in-a-table/m-p/546200#M151199</link>
      <description>Hi,&lt;BR /&gt;&lt;BR /&gt;Thanks for your response.&lt;BR /&gt;&lt;BR /&gt;By distance 2 I mean that at most two character substitutions/insertions/deletions are allowed, the same way an edit distance function works in a database.&lt;BR /&gt;&lt;BR /&gt;Your solution seems OK; it just takes quite a bit of time given the million records.&lt;BR /&gt;&lt;BR /&gt;Thanks a lot again!</description>
      <pubDate>Tue, 26 Mar 2019 15:26:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fuzzy-logic-to-locate-duplicates-in-a-table/m-p/546200#M151199</guid>
      <dc:creator>Rohit_1990</dc:creator>
      <dc:date>2019-03-26T15:26:26Z</dc:date>
    </item>
    <item>
      <title>Re: Fuzzy logic to locate duplicates in a table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fuzzy-logic-to-locate-duplicates-in-a-table/m-p/546367#M151276</link>
      <description>&lt;P&gt;&lt;EM&gt;&amp;gt; it is consuming bit of time given the&amp;nbsp;million records.&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
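&lt;P&gt;&lt;SPAN&gt;Maybe you could reduce the size of the Cartesian&amp;nbsp;product by checking the similarity of the lengths or of the first letters before computing the distance. For example, names whose lengths differ by more than 2 can never be within an edit distance of 2.&lt;/SPAN&gt;&lt;/P&gt;</description>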
      <pubDate>Wed, 27 Mar 2019 03:04:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fuzzy-logic-to-locate-duplicates-in-a-table/m-p/546367#M151276</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2019-03-27T03:04:24Z</dc:date>
    </item>
    <item>
      <title>Re: Fuzzy logic to locate duplicates in a table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fuzzy-logic-to-locate-duplicates-in-a-table/m-p/546382#M151284</link>
      <description>&lt;P&gt;You want groups such that the distance between any pair of words in&amp;nbsp;the group is no more than 2.&amp;nbsp;&amp;nbsp; Putting aside the computational burden, this criterion will not define a single collection of groups.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The "names"&lt;/P&gt;
&lt;P&gt;AB&lt;/P&gt;
&lt;P&gt;ABC&lt;/P&gt;
&lt;P&gt;ABCD&lt;/P&gt;
&lt;P&gt;ABCDE&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;will produce 2 groups using the 2 insertion/deletion rule, but those groups could be&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;{AB,ABC,ABCD}&amp;nbsp;&amp;nbsp;&amp;nbsp;and {ABCDE}&amp;nbsp;&amp;nbsp; or&lt;/LI&gt;
&lt;LI&gt;{AB,ABC}&amp;nbsp;&amp;nbsp;&amp;nbsp;and {ABCD,ABCDE}&amp;nbsp; or&lt;/LI&gt;
&lt;LI&gt;{AB}&amp;nbsp;&amp;nbsp;&amp;nbsp;and {ABC,ABCD,ABCDE}&lt;/LI&gt;
&lt;/OL&gt;
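&lt;P&gt;A quick pairwise check shows why all three partitions qualify. This is only a sketch: it uses COMPLEV as the distance, and the data set name names is made up for the illustration.&lt;/P&gt;
&lt;PRE&gt;data names;
   input name $;
   datalines;
AB
ABC
ABCD
ABCDE
;

proc sql;
   /* list every unordered pair once, closest pairs first */
   select a.name as namea, b.name as nameb,
          complev(a.name, b.name) as dist
   from names as a, names as b
   where a.name lt b.name
   order by dist, namea;
quit;&lt;/PRE&gt;
&lt;P&gt;Only the pair AB / ABCDE is more than 2 apart, so any split that keeps those two names in different groups satisfies the rule.&lt;/P&gt;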
&lt;P&gt;Do you intend to define mutually exclusive groups?&amp;nbsp;&amp;nbsp; Or all possible groupings that satisfy your criterion?&lt;/P&gt;</description>
      <pubDate>Wed, 27 Mar 2019 06:37:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fuzzy-logic-to-locate-duplicates-in-a-table/m-p/546382#M151284</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2019-03-27T06:37:35Z</dc:date>
    </item>
    <item>
      <title>Re: Fuzzy logic to locate duplicates in a table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fuzzy-logic-to-locate-duplicates-in-a-table/m-p/546388#M151286</link>
      <description>Hi,&lt;BR /&gt;&lt;BR /&gt;I want mutually exclusive groups.&lt;BR /&gt;&lt;BR /&gt;Thanks!</description>
      <pubDate>Wed, 27 Mar 2019 07:15:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fuzzy-logic-to-locate-duplicates-in-a-table/m-p/546388#M151286</guid>
      <dc:creator>Rohit_1990</dc:creator>
      <dc:date>2019-03-27T07:15:26Z</dc:date>
    </item>
    <item>
      <title>Re: Fuzzy logic to locate duplicates in a table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fuzzy-logic-to-locate-duplicates-in-a-table/m-p/546399#M151292</link>
      <description>&lt;P&gt;In that case you'll need to group the groups. &lt;BR /&gt;Search for &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/462"&gt;@PGStats&lt;/a&gt;'s article on finding all paths in a directed graph network.&lt;/P&gt;</description>
      <pubDate>Wed, 27 Mar 2019 21:35:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fuzzy-logic-to-locate-duplicates-in-a-table/m-p/546399#M151292</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2019-03-27T21:35:27Z</dc:date>
    </item>
    <item>
      <title>Re: Fuzzy logic to locate duplicates in a table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fuzzy-logic-to-locate-duplicates-in-a-table/m-p/546633#M151375</link>
      <description>&lt;P&gt;I just provided three examples of mutually exclusive groupings.&amp;nbsp; Which example do you want?&amp;nbsp; What is the criterion?&lt;/P&gt;</description>
      <pubDate>Wed, 27 Mar 2019 19:00:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fuzzy-logic-to-locate-duplicates-in-a-table/m-p/546633#M151375</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2019-03-27T19:00:38Z</dc:date>
    </item>
    <item>
      <title>Re: Fuzzy logic to locate duplicates in a table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fuzzy-logic-to-locate-duplicates-in-a-table/m-p/546717#M151426</link>
      <description>Hi Mark,&lt;BR /&gt;&lt;BR /&gt;It would be great if you could provide a solution for criteria 2 and 3.&lt;BR /&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Rohit</description>
      <pubDate>Wed, 27 Mar 2019 21:14:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fuzzy-logic-to-locate-duplicates-in-a-table/m-p/546717#M151426</guid>
      <dc:creator>Rohit_1990</dc:creator>
      <dc:date>2019-03-27T21:14:05Z</dc:date>
    </item>
    <item>
      <title>Re: Fuzzy logic to locate duplicates in a table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fuzzy-logic-to-locate-duplicates-in-a-table/m-p/546973#M151519</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/261497"&gt;@Rohit_1990&lt;/a&gt;&amp;nbsp;, how about this?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I changed your groups to be represented by numbers, since it is easier to iterate that way.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
input Person_name:$20.;
datalines;
Michael
Michel
kurt
kirt
Michaell
Benjamin
Mich
;

data want(keep=Person_name Group_name);
   length Person_name $20 Comp_Name $20 Group_name 8;

   if _N_ = 1 then do;
      declare hash h();
      h.defineKey('Comp_Name');
      h.defineData('Comp_Name', 'Group_name');
      h.defineDone();
      declare hiter hi('h');

      declare hash hh(multidata:'Y');
      hh.defineKey('Group_name');
      hh.defineData('Group_name', 'Comp_Name');
      hh.defineDone(); 

      _Group_Name=0;
   end;

   set have;

   rc=h.find(key:Person_name);

   if rc ne 0 then do;
      rc=hi.first();
      do while (rc=0);

         if complev(Person_name, Comp_Name) le 2 then do;
            rc=hh.find();
            r=.;   /* reset the has_next flag so the scan below runs for every candidate group */

            do while (r ne 0);
               dist=complev(Person_name, Comp_Name);
               hh.has_next(result:r);

               if r=0 &amp;amp; dist le 2 then do;
                  h.add(key:Person_name, data:Person_name, data:Group_Name);
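                  hh.add(key:Group_Name, data:Group_Name, data:Person_name);   /* store the new name; a bare add() would re-add the Comp_Name just read from hh */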
                  output;return;
               end;

               else if r ne 0 &amp;amp; dist le 2 then rc=hh.find_next();

               else if dist &amp;gt; 2 then leave;
            
            end;

         end;
         
         rc=hi.next();

      end;

      _Group_Name+1;
      Group_Name=_Group_Name;
      h.add(key:Person_name, data:Person_name, data:Group_Name);
      hh.add(key:Group_Name, data:Group_Name, data:Person_name);
   end;

   output;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This gives you&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Capture.PNG" style="width: 421px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/28298i42D1E3AC666C0DCB/image-size/large?v=v2&amp;amp;px=999" role="button" title="Capture.PNG" alt="Capture.PNG" /&gt;&lt;/span&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 28 Mar 2019 17:44:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fuzzy-logic-to-locate-duplicates-in-a-table/m-p/546973#M151519</guid>
      <dc:creator>PeterClemmensen</dc:creator>
      <dc:date>2019-03-28T17:44:10Z</dc:date>
    </item>
    <item>
      <title>Re: Fuzzy logic to locate duplicates in a table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fuzzy-logic-to-locate-duplicates-in-a-table/m-p/547121#M151570</link>
      <description>Hi,&lt;BR /&gt;&lt;BR /&gt;Thanks for your solution, but somehow the code is not working. It aborts at this line:&lt;BR /&gt;rc=h.find(key:Person_name);&lt;BR /&gt;&lt;BR /&gt;Can you please check?&lt;BR /&gt;&lt;BR /&gt;Regards,</description>
      <pubDate>Fri, 29 Mar 2019 08:05:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fuzzy-logic-to-locate-duplicates-in-a-table/m-p/547121#M151570</guid>
      <dc:creator>Rohit_1990</dc:creator>
      <dc:date>2019-03-29T08:05:35Z</dc:date>
    </item>
    <item>
      <title>Re: Fuzzy logic to locate duplicates in a table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fuzzy-logic-to-locate-duplicates-in-a-table/m-p/547126#M151571</link>
      <description>&lt;P&gt;Does this error occur when you run the code on the sample data (at the top of my code) or your own data?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If it occurs with your own data, then make sure that your variable is named &lt;STRONG&gt;Person_Name&lt;/STRONG&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 29 Mar 2019 09:26:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fuzzy-logic-to-locate-duplicates-in-a-table/m-p/547126#M151571</guid>
      <dc:creator>PeterClemmensen</dc:creator>
      <dc:date>2019-03-29T09:26:49Z</dc:date>
    </item>
    <item>
      <title>Re: Fuzzy logic to locate duplicates in a table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fuzzy-logic-to-locate-duplicates-in-a-table/m-p/547137#M151574</link>
      <description>It occurs when I run code on sample data provided in example.</description>
      <pubDate>Fri, 29 Mar 2019 10:29:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fuzzy-logic-to-locate-duplicates-in-a-table/m-p/547137#M151574</guid>
      <dc:creator>Rohit_1990</dc:creator>
      <dc:date>2019-03-29T10:29:20Z</dc:date>
    </item>
    <item>
      <title>Re: Fuzzy logic to locate duplicates in a table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fuzzy-logic-to-locate-duplicates-in-a-table/m-p/547140#M151575</link>
      <description>Can you post the full log please?</description>
      <pubDate>Fri, 29 Mar 2019 10:37:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fuzzy-logic-to-locate-duplicates-in-a-table/m-p/547140#M151575</guid>
      <dc:creator>PeterClemmensen</dc:creator>
      <dc:date>2019-03-29T10:37:16Z</dc:date>
    </item>
    <item>
      <title>Re: Fuzzy logic to locate duplicates in a table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fuzzy-logic-to-locate-duplicates-in-a-table/m-p/547254#M151611</link>
      <description>Hi,&lt;BR /&gt;Please find the log and code below&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;1 The SAS System&lt;BR /&gt;17:11 Friday, March 29, 2019&lt;BR /&gt;&lt;BR /&gt;1 ;*';*";*/;quit;run;&lt;BR /&gt;2 OPTIONS PAGENO=MIN;&lt;BR /&gt;3 %LET _CLIENTTASKLABEL='Program (2)';&lt;BR /&gt;4 %LET _CLIENTPROCESSFLOWNAME='Process Flow';&lt;BR /&gt;5 %LET _CLIENTPROJECTPATH='';&lt;BR /&gt;6 %LET _CLIENTPROJECTNAME='';&lt;BR /&gt;7 %LET _SASPROGRAMFILE=;&lt;BR /&gt;8&lt;BR /&gt;9 ODS _ALL_ CLOSE;&lt;BR /&gt;10 OPTIONS DEV=ACTIVEX;&lt;BR /&gt;11 GOPTIONS XPIXELS=0 YPIXELS=0;&lt;BR /&gt;12 FILENAME EGSR TEMP;&lt;BR /&gt;13 ODS tagsets.sasreport13(ID=EGSR) FILE=EGSR&lt;BR /&gt;14 STYLE=HtmlBlue&lt;BR /&gt;15&lt;BR /&gt;STYLESHEET=(URL="file:///D:/SAS/sashome/SASEnterpriseGuide/7.1/Styles/HtmlBlue.css")&lt;BR /&gt;16 NOGTITLE&lt;BR /&gt;17 NOGFOOTNOTE&lt;BR /&gt;18 GPATH=&amp;amp;sasworklocation&lt;BR /&gt;19 ENCODING=UTF8&lt;BR /&gt;20 options(rolap="on")&lt;BR /&gt;21 ;&lt;BR /&gt;NOTE: Writing TAGSETS.SASREPORT13(EGSR) Body file: EGSR&lt;BR /&gt;22&lt;BR /&gt;23 GOPTIONS ACCESSIBLE;&lt;BR /&gt;24 data want(keep=person_name group_name);&lt;BR /&gt;25 length person_name $20&lt;BR /&gt;26 comp_name $20 group_name 8;&lt;BR /&gt;27&lt;BR /&gt;28 if n=1 then do;&lt;BR /&gt;29 declare hash h();&lt;BR /&gt;30 h.definekey('comp_name');&lt;BR /&gt;31 h.definedata('comp_name','group_name');&lt;BR /&gt;32 h.defineDone();&lt;BR /&gt;33 declare hiter hi('h');&lt;BR /&gt;34&lt;BR /&gt;35 declare hash hh(multidata:'y');&lt;BR /&gt;36 hh.definekey('group_name');&lt;BR /&gt;37 h.definedata('group_name','comp_name');&lt;BR /&gt;38 hh.defineDone();&lt;BR /&gt;39&lt;BR /&gt;40 _group_name=0;&lt;BR /&gt;41 end;&lt;BR /&gt;42&lt;BR /&gt;43 set have;&lt;BR /&gt;44&lt;BR /&gt;45 rc=h.find(key:person_name);&lt;BR /&gt;46&lt;BR /&gt;47 if rc ne 0 then do;&lt;BR /&gt;48 rc=hi.first();&lt;BR /&gt;49 do while(rc=0);&lt;BR /&gt;50&lt;BR /&gt;51 
if complev(person_name,comp_name)le 2 then do;&lt;BR /&gt;52 rc=hh.find();&lt;BR /&gt;53&lt;BR /&gt;54 do while(r ne 0);&lt;BR /&gt;55 dist=complev(person_name,comp_name);&lt;BR /&gt;56 hh.has_next(result:r);&lt;BR /&gt;57&lt;BR /&gt;2 The SAS System&lt;BR /&gt;17:11 Friday, March 29, 2019&lt;BR /&gt;&lt;BR /&gt;58 if r=0 &amp;amp; dist le 2 then do;&lt;BR /&gt;59 h.add(key:person_name, data:person_name, data:group_name);&lt;BR /&gt;60 hh.add();&lt;BR /&gt;61 output;&lt;BR /&gt;62 return;&lt;BR /&gt;63 end;&lt;BR /&gt;64&lt;BR /&gt;65 else if r ne 0 &amp;amp; dist le 2 then&lt;BR /&gt;66 rc=hh.find_next();&lt;BR /&gt;67&lt;BR /&gt;68 else if dist&amp;gt;2 then leave;&lt;BR /&gt;69&lt;BR /&gt;70 end;&lt;BR /&gt;71&lt;BR /&gt;72 end;&lt;BR /&gt;73&lt;BR /&gt;74 rc=hi.next();&lt;BR /&gt;75&lt;BR /&gt;76 end;&lt;BR /&gt;77&lt;BR /&gt;78 _group_name+1;&lt;BR /&gt;79 group_name=_group_name;&lt;BR /&gt;80 h.add(key:person_name, data:person_name, data:group_name);&lt;BR /&gt;81 hh.add(key:group_name, data:group_name, data:person_name);&lt;BR /&gt;82 end;&lt;BR /&gt;83&lt;BR /&gt;84 output;&lt;BR /&gt;85 run;&lt;BR /&gt;&lt;BR /&gt;NOTE: Variable comp_name is uninitialized.&lt;BR /&gt;NOTE: Variable n is uninitialized.&lt;BR /&gt;ERROR: Uninitialized object at line 45 column 4.&lt;BR /&gt;ERROR: DATA STEP Component Object failure. Aborted during the EXECUTION&lt;BR /&gt;phase.&lt;BR /&gt;NOTE: The SAS System stopped processing this step because of errors.&lt;BR /&gt;NOTE: There were 1 observations read from the data set WORK.HAVE.&lt;BR /&gt;WARNING: The data set WORK.WANT may be incomplete. 
When this step was&lt;BR /&gt;stopped there were 0 observations and 2 variables.&lt;BR /&gt;WARNING: Data set WORK.WANT was not replaced because this step was stopped.&lt;BR /&gt;NOTE: DATA statement used (Total process time):&lt;BR /&gt;real time 0.01 seconds&lt;BR /&gt;cpu time 0.02 seconds&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;86&lt;BR /&gt;87 GOPTIONS NOACCESSIBLE;&lt;BR /&gt;88 %LET _CLIENTTASKLABEL=;&lt;BR /&gt;89 %LET _CLIENTPROCESSFLOWNAME=;&lt;BR /&gt;90 %LET _CLIENTPROJECTPATH=;&lt;BR /&gt;91 %LET _CLIENTPROJECTNAME=;&lt;BR /&gt;92 %LET _SASPROGRAMFILE=;&lt;BR /&gt;93&lt;BR /&gt;94 ;*';*";*/;quit;run;&lt;BR /&gt;95 ODS _ALL_ CLOSE;&lt;BR /&gt;96&lt;BR /&gt;97&lt;BR /&gt;98 QUIT; RUN;&lt;BR /&gt;99&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;------------------------------------------------------&lt;BR /&gt;------------------------------------------------------&lt;BR /&gt;&lt;BR /&gt;data have;&lt;BR /&gt;input person_name:$20.;&lt;BR /&gt;datalines;&lt;BR /&gt;michael&lt;BR /&gt;michel&lt;BR /&gt;kurt&lt;BR /&gt;kirt&lt;BR /&gt;michaell&lt;BR /&gt;benjamin&lt;BR /&gt;mich&lt;BR /&gt;;&lt;BR /&gt;run;&lt;BR /&gt;&lt;BR /&gt;data want(keep=person_name group_name);&lt;BR /&gt;length person_name $20&lt;BR /&gt;comp_name $20 group_name 8;&lt;BR /&gt;&lt;BR /&gt;if n=1 then do;&lt;BR /&gt;declare hash h();&lt;BR /&gt;h.definekey('comp_name');&lt;BR /&gt;h.definedata('comp_name','group_name');&lt;BR /&gt;h.defineDone();&lt;BR /&gt;declare hiter hi('h');&lt;BR /&gt;&lt;BR /&gt;declare hash hh(multidata:'y');&lt;BR /&gt;hh.definekey('group_name');&lt;BR /&gt;h.definedata('group_name','comp_name');&lt;BR /&gt;hh.defineDone();&lt;BR /&gt;&lt;BR /&gt;_group_name=0;&lt;BR /&gt;end;&lt;BR /&gt;&lt;BR /&gt;set have;&lt;BR /&gt;&lt;BR /&gt;rc=h.find(key:person_name);&lt;BR /&gt;&lt;BR /&gt;if rc ne 0 then do;&lt;BR /&gt;rc=hi.first();&lt;BR /&gt;do while(rc=0);&lt;BR /&gt;&lt;BR /&gt;if complev(person_name,comp_name)le 2 then do;&lt;BR /&gt;rc=hh.find();&lt;BR /&gt;&lt;BR /&gt;do 
while(r ne 0);&lt;BR /&gt;dist=complev(person_name,comp_name);&lt;BR /&gt;hh.has_next(result:r);&lt;BR /&gt;&lt;BR /&gt;if r=0 &amp;amp; dist le 2 then do;&lt;BR /&gt;h.add(key:person_name, data:person_name, data:group_name);&lt;BR /&gt;hh.add();&lt;BR /&gt;output;&lt;BR /&gt;return;&lt;BR /&gt;end;&lt;BR /&gt;&lt;BR /&gt;else if r ne 0 &amp;amp; dist le 2 then&lt;BR /&gt;rc=hh.find_next();&lt;BR /&gt;&lt;BR /&gt;else if dist&amp;gt;2 then leave;&lt;BR /&gt;&lt;BR /&gt;end;&lt;BR /&gt;&lt;BR /&gt;end;&lt;BR /&gt;&lt;BR /&gt;rc=hi.next();&lt;BR /&gt;&lt;BR /&gt;end;&lt;BR /&gt;&lt;BR /&gt;_group_name+1;&lt;BR /&gt;group_name=_group_name;&lt;BR /&gt;h.add(key:person_name, data:person_name, data:group_name);&lt;BR /&gt;hh.add(key:group_name, data:group_name, data:person_name);&lt;BR /&gt;end;&lt;BR /&gt;&lt;BR /&gt;output;&lt;BR /&gt;run;</description>
      <pubDate>Fri, 29 Mar 2019 16:22:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fuzzy-logic-to-locate-duplicates-in-a-table/m-p/547254#M151611</guid>
      <dc:creator>Rohit_1990</dc:creator>
      <dc:date>2019-03-29T16:22:20Z</dc:date>
    </item>
    <item>
      <title>Re: Fuzzy logic to locate duplicates in a table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fuzzy-logic-to-locate-duplicates-in-a-table/m-p/547260#M151614</link>
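      <description>&lt;P&gt;In line 28 of your log, the statement is&amp;nbsp;&lt;STRONG&gt;if n=1 then do;&lt;/STRONG&gt;. In my code it is &lt;STRONG&gt;if _N_=1 then do;&lt;/STRONG&gt;. Since n is an uninitialized variable (see the NOTE in your log), the block that declares the hash objects never runs, which is why h.find() fails with "Uninitialized object". There may be other differences as well (line 37, for example, calls h.definedata where my code has hh.defineData); that is bound to happen if you retype the code like this.&lt;/P&gt;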
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Please run my code &lt;EM&gt;exactly&lt;/EM&gt; as it is written with the sample data &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 29 Mar 2019 16:52:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fuzzy-logic-to-locate-duplicates-in-a-table/m-p/547260#M151614</guid>
      <dc:creator>PeterClemmensen</dc:creator>
      <dc:date>2019-03-29T16:52:16Z</dc:date>
    </item>
    <item>
      <title>Re: Fuzzy logic to locate duplicates in a table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fuzzy-logic-to-locate-duplicates-in-a-table/m-p/547265#M151618</link>
      <description>Hi,&lt;BR /&gt;&lt;BR /&gt;Thanks, I will check it and copy the code exactly.&lt;BR /&gt;&lt;BR /&gt;For my understanding, can you explain your code, in particular what the key value in hash h is?&lt;BR /&gt;&lt;BR /&gt;Or can you suggest a good PDF to learn more about hash objects?&lt;BR /&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Rohit</description>
      <pubDate>Fri, 29 Mar 2019 17:01:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fuzzy-logic-to-locate-duplicates-in-a-table/m-p/547265#M151618</guid>
      <dc:creator>Rohit_1990</dc:creator>
      <dc:date>2019-03-29T17:01:20Z</dc:date>
    </item>
    <item>
      <title>Re: Fuzzy logic to locate duplicates in a table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fuzzy-logic-to-locate-duplicates-in-a-table/m-p/547268#M151620</link>
      <description>&lt;P&gt;Before I explain: Did it work for you? And does it work for your actual data? &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thank you for marking my response as an accepted solution. However, can you please mark the response with the actual code as the solution?&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 29 Mar 2019 17:05:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fuzzy-logic-to-locate-duplicates-in-a-table/m-p/547268#M151620</guid>
      <dc:creator>PeterClemmensen</dc:creator>
      <dc:date>2019-03-29T17:05:58Z</dc:date>
    </item>
    <item>
      <title>Re: Fuzzy logic to locate duplicates in a table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fuzzy-logic-to-locate-duplicates-in-a-table/m-p/547270#M151622</link>
      <description>I accept this as solution</description>
      <pubDate>Fri, 29 Mar 2019 17:08:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fuzzy-logic-to-locate-duplicates-in-a-table/m-p/547270#M151622</guid>
      <dc:creator>Rohit_1990</dc:creator>
      <dc:date>2019-03-29T17:08:57Z</dc:date>
    </item>
    <item>
      <title>Re: Fuzzy logic to locate duplicates in a table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fuzzy-logic-to-locate-duplicates-in-a-table/m-p/547275#M151625</link>
      <description>&lt;P&gt;Yes, I am aware that you have accepted this as a solution. However, it helps other users browsing the communities if you accept the reply that contains the actual code.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Also, you did not answer my question: Have you tried my code on your actual data?&lt;/P&gt;</description>
      <pubDate>Fri, 29 Mar 2019 17:23:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fuzzy-logic-to-locate-duplicates-in-a-table/m-p/547275#M151625</guid>
      <dc:creator>PeterClemmensen</dc:creator>
      <dc:date>2019-03-29T17:23:24Z</dc:date>
    </item>
  </channel>
</rss>

