<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: ID or Count each cluster /set of fuzzy duplicates in New SAS User</title>
    <link>https://communities.sas.com/t5/New-SAS-User/ID-or-Count-each-cluster-set-of-fuzzy-duplicates/m-p/535662#M6522</link>
    <description>&lt;P&gt;Thank you so much!!&lt;/P&gt;&lt;P&gt;That's a great solution. I have been trying to understand it, but I am struggling with some aspects.&lt;/P&gt;&lt;P&gt;I know what this part does but don't understand why:&lt;/P&gt;&lt;BLOCKQUOTE&gt;&lt;PRE class=" language-sas"&gt;&lt;CODE class="  language-sas"&gt;node=from; h.replace();
  from=to; to=node;
  output;
  node=from; h.replace();
  if last then h.output(dataset:'node');&lt;/CODE&gt;&lt;/PRE&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;Then how the WANT data set is built is a mystery. What is going on there? It definitely worked, even with my more complex and much larger dataset.&lt;/P&gt;</description>
    <pubDate>Thu, 14 Feb 2019 17:55:53 GMT</pubDate>
    <dc:creator>catnipper</dc:creator>
    <dc:date>2019-02-14T17:55:53Z</dc:date>
    <item>
      <title>ID or Count each cluster /set of fuzzy duplicates</title>
      <link>https://communities.sas.com/t5/New-SAS-User/ID-or-Count-each-cluster-set-of-fuzzy-duplicates/m-p/534627#M6355</link>
      <description>&lt;P&gt;I have several duplicates that don't look like duplicates to SAS, but I know they are (they were identified by fuzzy matching). The only way to tell which records belong together is by following their links: A is related to B, B to E, and E to A.&lt;/P&gt;&lt;P&gt;I need to ID each cluster so I can run code &lt;U&gt;by ClusterID&lt;/U&gt;. The next step will be to remove some records from each cluster based on additional requirements.&lt;/P&gt;&lt;P&gt;This is what I have:&lt;/P&gt;&lt;PRE&gt;data have;
infile cards expandtabs truncover;
input left $ right $;
cards;
A	B
B	E
C	D
D	C
E	A
F	G 
;
run;
proc print data=have; run;&lt;/PRE&gt;&lt;P&gt;I want to give each row an ID for its cluster. A count is fine, and it doesn't need to be consecutive.&lt;/P&gt;&lt;P&gt;I need the data to look like this:&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;Want:
left  right  ClusterID
A     B      1
B     E      1
C     D      2
D     C      2
E     A      1
F     G      3&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Any ideas?&lt;/P&gt;</description>
      <pubDate>Mon, 11 Feb 2019 21:33:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/ID-or-Count-each-cluster-set-of-fuzzy-duplicates/m-p/534627#M6355</guid>
      <dc:creator>catnipper</dc:creator>
      <dc:date>2019-02-11T21:33:52Z</dc:date>
    </item>
    <item>
      <title>Re: ID or Count each cluster /set of fuzzy duplicates</title>
      <link>https://communities.sas.com/t5/New-SAS-User/ID-or-Count-each-cluster-set-of-fuzzy-duplicates/m-p/534783#M6378</link>
      <description>&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have(rename=(left=from right=to));
infile cards expandtabs truncover;
input left $ right $;
cards;
A	B
B	E
C	D
D	C
E	A
F	G 
;
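run;

/* Step 1: make every link two-way. Each HAVE row is written twice, */
/* once as FROM-TO and once swapped as TO-FROM, so the search below */
/* can follow a link in either direction. Hash H collects each      */
/* distinct node, and data set NODE lists them all at the end.      */
data full;
  set have end=last;
  if _n_ eq 1 then do;
    declare hash h();
    h.definekey('node');
    h.definedata('node');
    h.definedone();
  end;
  output;                  /* original link                   */
  node=from; h.replace();  /* store FROM as a node            */
  from=to; to=node;        /* swap the two ends               */
  output;                  /* reversed link                   */
  node=from; h.replace();  /* store the original TO as a node */
  if last then h.output(dataset:'node');
  drop node;
run;
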
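/* Step 2: breadth-first search. Every node not yet reached starts a */
/* new HOUSEHOLD (cluster ID), and every node linked to it, directly */
/* or through other nodes, gets the same HOUSEHOLD. WANT has one row */
/* per node; match NODE to FROM in HAVE (the original LEFT) to put   */
/* the cluster ID on each original row.                              */
data want(keep=node household);
/* HA: queue of nodes found in the current household, in the order   */
/* they were found (key COUNT, ordered ascending)                    */
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
/* _HA: nodes already placed in the current household */
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();

/* FROM_TO: every TO linked to each FROM (multidata allows several) */
if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
 from_to.definekey('from');
 from_to.definedata('to');
 from_to.definedone();

/* NO: nodes not yet placed in any household */
if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
 no.definekey('node');
 no.definedata('node');
 no.definedone();

/* Each node still left in NO starts a new household */
do while(hi_no.next()=0);
 household+1; output;
 count=1;
 key=node;_ha.add();
 last=node;ha.add();
 rc=hi.first();
 /* Visit queued nodes until no newly found node is left */
 do while(rc=0);
   from=last;rx=from_to.find();
   /* Loop over every node linked to the current one */
   do while(rx=0);
     key=to;ry=_ha.check();
      if ry ne 0 then do;
       /* Not seen yet: output it, remove it from NO, queue it */
       node=to;output;rr=no.remove(key:node);
       key=to;_ha.add();
       count+1;
       last=to;ha.add();
      end;
      rx=from_to.find_next();
   end;
   rc=hi.next();
 end;
 /* Empty the queue and the seen list before the next household */
 ha.clear();_ha.clear();
end;
stop;
run;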

&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 12 Feb 2019 12:28:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/ID-or-Count-each-cluster-set-of-fuzzy-duplicates/m-p/534783#M6378</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2019-02-12T12:28:33Z</dc:date>
    </item>
    <item>
      <title>Re: ID or Count each cluster /set of fuzzy duplicates</title>
      <link>https://communities.sas.com/t5/New-SAS-User/ID-or-Count-each-cluster-set-of-fuzzy-duplicates/m-p/535662#M6522</link>
      <description>&lt;P&gt;Thank you so much!!&lt;/P&gt;&lt;P&gt;That's a great solution. I have been trying to understand it, but I am struggling with some aspects.&lt;/P&gt;&lt;P&gt;I know what this part does but don't understand why:&lt;/P&gt;&lt;BLOCKQUOTE&gt;&lt;PRE class=" language-sas"&gt;&lt;CODE class="  language-sas"&gt;node=from; h.replace();
  from=to; to=node;
  output;
  node=from; h.replace();
  if last then h.output(dataset:'node');&lt;/CODE&gt;&lt;/PRE&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;Then how the WANT data set is built is a mystery. What is going on there? It definitely worked, even with my more complex and much larger dataset.&lt;/P&gt;</description>
      <pubDate>Thu, 14 Feb 2019 17:55:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/ID-or-Count-each-cluster-set-of-fuzzy-duplicates/m-p/535662#M6522</guid>
      <dc:creator>catnipper</dc:creator>
      <dc:date>2019-02-14T17:55:53Z</dc:date>
    </item>
    <item>
      <title>Re: ID or Count each cluster /set of fuzzy duplicates</title>
      <link>https://communities.sas.com/t5/New-SAS-User/ID-or-Count-each-cluster-set-of-fuzzy-duplicates/m-p/535828#M6542</link>
      <description>&lt;P&gt;This code creates the full set of paths (every link in both directions), which the WANT step then searches. For example, the row:&lt;/P&gt;
&lt;PRE&gt;From  To
A     B&lt;/PRE&gt;
&lt;P&gt;is written out a second time with the two ends swapped, giving:&lt;/P&gt;
&lt;PRE&gt;From  To
A     B
B     A&lt;/PRE&gt;
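&lt;P&gt;As an illustration only (Python, not the SAS code above), the FULL step can be sketched as follows, assuming the six-row sample from the question; the function name make_full is hypothetical. Each row is kept once as written and once swapped, and every distinct node is collected, which is the job hash H and data set NODE do in SAS.&lt;/P&gt;

```python
# Illustrative sketch of the FULL step (not the SAS code itself):
# keep each link as written and swapped, and collect every distinct node.
def make_full(edges):
    full = []        # like data set FULL: one row per direction
    nodes = set()    # like hash H: each distinct node once
    for frm, to in edges:
        full.append((frm, to))   # original row
        full.append((to, frm))   # swapped row
        nodes.update((frm, to))
    return full, sorted(nodes)   # sorted for stable output; hash H has no fixed order

# Sample rows from the question (LEFT, RIGHT)
have = [("A", "B"), ("B", "E"), ("C", "D"), ("D", "C"), ("E", "A"), ("F", "G")]
full, node = make_full(have)
print(full[:4])
print(node)
```

&lt;P&gt;Because C-D and D-C are both already in HAVE, FULL holds that link twice in each direction; that is harmless, since the search skips any node it has already reached.&lt;/P&gt;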
&lt;P&gt;The code that generates the WANT table takes longer to explain.&lt;/P&gt;
&lt;P&gt;I don't have time to go through the details.&lt;/P&gt;
&lt;P&gt;If you are familiar with hash tables, I think you can work it out by reading the code carefully.&lt;/P&gt;
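&lt;P&gt;The WANT step amounts to a breadth-first search over those two-way links. Below is a minimal Python sketch of the same idea, an illustration rather than the SAS code itself; cluster_ids, neighbors and queue are hypothetical names. Roughly, hash FROM_TO corresponds to neighbors, hash HA (keyed by COUNT in ascending order) to the queue, _HA and NO to the record of nodes already reached, and HOUSEHOLD to the cluster ID. It assumes the six-row sample from the question and starts from nodes in sorted order; hash NO in SAS is not ordered, so household numbers there may be handed out in a different order.&lt;/P&gt;

```python
from collections import defaultdict, deque

def cluster_ids(edges):
    """Return a dict mapping each node to its cluster number."""
    # Two-way adjacency list, like hash FROM_TO built on FULL
    neighbors = defaultdict(list)
    for frm, to in edges:
        neighbors[frm].append(to)
        neighbors[to].append(frm)

    household = {}   # nodes already reached, with their cluster number
    count = 0
    for start in sorted(neighbors):    # like iterating over NO
        if start in household:         # reached earlier (SAS removes it from NO)
            continue
        count += 1                     # new household, like HOUSEHOLD+1
        household[start] = count
        queue = deque([start])         # like hash HA, in order of discovery
        while queue:
            current = queue.popleft()
            for nxt in neighbors[current]:
                if nxt not in household:   # like the _HA check
                    household[nxt] = count
                    queue.append(nxt)
    return household

have = [("A", "B"), ("B", "E"), ("C", "D"), ("D", "C"), ("E", "A"), ("F", "G")]
ids = cluster_ids(have)
# Put the cluster ID on every original row, keyed on LEFT
want = [(left, right, ids[left]) for left, right in have]
for row in want:
    print(*row)
```

&lt;P&gt;Joining the IDs back on the LEFT value gives the rows A B 1, B E 1, C D 2, D C 2, E A 1 and F G 3, which matches the Want table in the question.&lt;/P&gt;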
&lt;P&gt;Good luck.&lt;/P&gt;</description>
      <pubDate>Fri, 15 Feb 2019 11:32:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/ID-or-Count-each-cluster-set-of-fuzzy-duplicates/m-p/535828#M6542</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2019-02-15T11:32:13Z</dc:date>
    </item>
  </channel>
</rss>

