<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Create clusters from pairs in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Create-clusters-from-pairs/m-p/205908#M38283</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P style="margin-bottom: .0001pt;"&gt;&lt;SPAN lang="EN-US" style="font-size: 10pt; font-family: Arial, sans-serif;"&gt;I managed to do the clustering with the SubGraphs macro. I divided the data into sections of about 1 million records and iterated over these datasets. I used a script found here:&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="margin-bottom: .0001pt;"&gt;&lt;SPAN style="font-size: 10pt; font-family: Arial, sans-serif;"&gt;&lt;A href="http://snipplr.com/view/6212/" title="http://snipplr.com/view/6212/"&gt;http://snipplr.com/view/6212/&lt;/A&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The hash factor was set to 17. I do not know if that is optimal but the script was fast so that was probably a good level. &lt;/P&gt;&lt;P&gt;Many thanks for all the help I got on this issue! &lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Fri, 20 Mar 2015 16:08:54 GMT</pubDate>
    <dc:creator>peter_sjogarde</dc:creator>
    <dc:date>2015-03-20T16:08:54Z</dc:date>
    <item>
      <title>Create clusters from pairs</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Create-clusters-from-pairs/m-p/205896#M38271</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I have a large dataset consisting of pairs&lt;SPAN style="font-size: 13.3333330154419px;"&gt; (about 130 million rows)&lt;/SPAN&gt;. I would like to create clusters from all observations that are connected.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Example&lt;/P&gt;&lt;P&gt;1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 2&lt;/P&gt;&lt;P&gt;1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 3&lt;/P&gt;&lt;P&gt;4&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 5&lt;/P&gt;&lt;P&gt;5&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 2&lt;/P&gt;&lt;P&gt;6&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 7&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Two groups should be created from the example data; group 1: 1,2,3,4,5 and group 2: 6,7.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Any ideas on an approach for this?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thank you for any help!&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 19 Mar 2015 16:07:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Create-clusters-from-pairs/m-p/205896#M38271</guid>
      <dc:creator>peter_sjogarde</dc:creator>
      <dc:date>2015-03-19T16:07:42Z</dc:date>
    </item>
    <item>
      <title>Re: Create clusters from pairs</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Create-clusters-from-pairs/m-p/205897#M38272</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;What is the rule that makes values of 1,2,3,4,5 go to group 1 and 6 and 7 go to group 2? "Connected" is very much dependent on your knowledge.&lt;/P&gt;&lt;P&gt;Are there two different variables involved (not quite obvious from your example) or more than 2?&lt;/P&gt;&lt;P&gt;Are you looking to add a variable that says what "group" a record belongs in?&lt;/P&gt;&lt;P&gt;Suppose a few thousand records below your example we have 7 2. What group would this record be in?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 19 Mar 2015 16:42:30 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Create-clusters-from-pairs/m-p/205897#M38272</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2015-03-19T16:42:30Z</dc:date>
    </item>
    <item>
      <title>Re: Create clusters from pairs</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Create-clusters-from-pairs/m-p/205898#M38273</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Thank you for your question. I will try to make myself clearer. The numbers are IDs and the connections are binary. All IDs that are connected, either directly or indirectly, should be clustered in the same group. &lt;SPAN style="font-size: 13.3333330154419px;"&gt;6 and 7 do not have any connections with 1-5, neither directly nor indirectly, so they do not cluster with that group. If we add a pair of 7 2, this will connect groups 1 and 2 in the example and create one group.&lt;/SPAN&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 19 Mar 2015 16:52:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Create-clusters-from-pairs/m-p/205898#M38273</guid>
      <dc:creator>peter_sjogarde</dc:creator>
      <dc:date>2015-03-19T16:52:20Z</dc:date>
    </item>
    <item>
      <title>Re: Create clusters from pairs</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Create-clusters-from-pairs/m-p/205899#M38274</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Just a few questions to be able to think about this ... from most important to least important ...&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;How many different identifiers do you have (ballpark figure)?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;It looks like your individual identifiers (1 through 7 in the example) will be numeric.&amp;nbsp; Is that correct?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Do you have any preconceptions about the cluster sizes?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If you end up with just 1 cluster, would that be acceptable or would you want to apply a different set of rules?&lt;/P&gt;&lt;P&gt;Ballardw, if there was a record with 7 2, it would belong in both clusters.&amp;nbsp; That means that the two clusters would get collapsed into one, and there would only be one cluster.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 19 Mar 2015 16:55:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Create-clusters-from-pairs/m-p/205899#M38274</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2015-03-19T16:55:40Z</dc:date>
    </item>
    <item>
      <title>Re: Create clusters from pairs</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Create-clusters-from-pairs/m-p/205900#M38275</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;There are about 1.8 million unique identifiers and they are numeric. The cluster sizes will probably be from 2 to about 2000. If only connected identifiers are clustered I will not end up with one cluster, since I know that most observations are not linked. There will probably be about 5-10 observations in each cluster on average, but I expect a skewed distribution, so most clusters will only consist of a few observations.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 19 Mar 2015 17:07:46 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Create-clusters-from-pairs/m-p/205900#M38275</guid>
      <dc:creator>peter_sjogarde</dc:creator>
      <dc:date>2015-03-19T17:07:46Z</dc:date>
    </item>
    <item>
      <title>Re: Create clusters from pairs</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Create-clusters-from-pairs/m-p/205901#M38276</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Astounding&lt;/P&gt;&lt;P&gt;I wanted to make sure that the data did not rely on contiguous order. The example data showed a potential for that as an unstated rule.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 19 Mar 2015 17:26:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Create-clusters-from-pairs/m-p/205901#M38276</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2015-03-19T17:26:15Z</dc:date>
    </item>
    <item>
      <title>Re: Create clusters from pairs</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Create-clusters-from-pairs/m-p/205902#M38277</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I have restructured the input data so that the observations are divided into blocks; the clustering can then be done within each block, which will probably facilitate the task. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;block&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; id1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; id2&lt;/P&gt;&lt;P style="font-size: 13px; font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;a&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 2&lt;/P&gt;&lt;P style="font-size: 13px; font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;a&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 3&lt;/P&gt;&lt;P style="font-size: 13px; font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;a&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 4&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 5&lt;/P&gt;&lt;P style="font-size: 13px; font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;a&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 5&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 2&lt;/P&gt;&lt;P style="font-size: 13px; font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;a&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 6&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 7&lt;/P&gt;&lt;P style="font-size: 13px; font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;b&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 2&lt;/P&gt;&lt;P style="font-size: 13px; font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;b&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 
1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 3&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;ballardw - I was trying to put the data in contiguous order earlier on, but I could not figure out a way to do that. It would probably be a good solution. &lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 19 Mar 2015 18:11:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Create-clusters-from-pairs/m-p/205902#M38277</guid>
      <dc:creator>peter_sjogarde</dc:creator>
      <dc:date>2015-03-19T18:11:52Z</dc:date>
    </item>
    <item>
      <title>Re: Create clusters from pairs</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Create-clusters-from-pairs/m-p/205903#M38278</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Does order matter, can you switch around ID1/ID2 without any repercussions?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If so it might be worth sorting so that ID1&amp;lt;ID2.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If you have SAS/OR look into PROC BOM.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Otherwise I think the macro here will work for you, courtesy of &lt;A __default_attr="2746" __jive_macro_name="user" class="jive_macro jive_macro_user" data-objecttype="3" href="https://communities.sas.com/"&gt;&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A __default_attr="1268" __jive_macro_name="document" class="jive_macro jive_macro_document" href="https://communities.sas.com/"&gt;&lt;/A&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 19 Mar 2015 18:45:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Create-clusters-from-pairs/m-p/205903#M38278</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2015-03-19T18:45:57Z</dc:date>
    </item>
    <item>
      <title>Re: Create clusters from pairs</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Create-clusters-from-pairs/m-p/205904#M38279</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Good news / bad news ... I have a plan that will get you most of the way there.&amp;nbsp; However, it involves too much in the way of hashing skills and someone else would have to put the program together.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;By "most of the way there", here's the output I envision for your sample data.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;ID&amp;nbsp;&amp;nbsp; ClusterNum&lt;/P&gt;&lt;P&gt;------------------------&lt;/P&gt;&lt;P&gt;1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1&lt;/P&gt;&lt;P&gt;2&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1&lt;/P&gt;&lt;P&gt;3&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1&lt;/P&gt;&lt;P&gt;4&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1&lt;/P&gt;&lt;P&gt;5&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1&lt;/P&gt;&lt;P&gt;6&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 3&lt;/P&gt;&lt;P&gt;7&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 3&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Each ID would end up assigned to a cluster number.&amp;nbsp; And, as illustrated, the possibility exists that some cluster numbers would be unused (due to collapsing of clusters into one another as the processing proceeds).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;It would remain to use this output to actually assign each pair to a cluster, but that is relatively easy.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If that's an acceptable result, I can sketch out 
what the program needs to do.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 19 Mar 2015 21:26:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Create-clusters-from-pairs/m-p/205904#M38279</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2015-03-19T21:26:56Z</dc:date>
    </item>
    <item>
      <title>Re: Create clusters from pairs</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Create-clusters-from-pairs/m-p/205905#M38280</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;The OPTNET procedure in SAS/OR will find connected components for both directed and undirected graphs. Look at the CONCOMP statement.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If you don't have a licence for SAS/OR and your graph is undirected, then use my macro, as mentioned by &lt;A __default_attr="255172" __jive_macro_name="user" class="jive_macro jive_macro_user" data-objecttype="3" href="https://communities.sas.com/"&gt;&lt;/A&gt;, and set the hash size factor fairly high (15-18, I would guess).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Unfortunately, neither OPTNET nor the SubGraphs macro supports BY processing.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;PG&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 19 Mar 2015 22:07:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Create-clusters-from-pairs/m-p/205905#M38280</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2015-03-19T22:07:33Z</dc:date>
    </item>
    <item>
      <title>Re: Create clusters from pairs</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Create-clusters-from-pairs/m-p/205906#M38281</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P style="margin-bottom: .0001pt;"&gt;&lt;SPAN lang="EN-US" style="font-size: 10pt; font-family: Arial, sans-serif;"&gt;Thank you for all the answers helping me move forward in this process. I tried iterating over the data, creating a dataset for each block and running the SubGraphs macro for each dataset. This was (of course) too slow, but it seemed to work. I will take a look at the script to see if I could add a BY option to the SubGraphs code (any help in this matter would be greatly appreciated). &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN lang="EN-US" style="font-size: 10pt; font-family: Arial, sans-serif;"&gt; &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN lang="EN-US" style="font-size: 10pt; font-family: Arial, sans-serif;"&gt;Astounding – That is an acceptable result. Unfortunately I am new to hash, so I would probably not be able to put such a script together. I realize that I need to study the hash object. &lt;/SPAN&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 20 Mar 2015 08:08:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Create-clusters-from-pairs/m-p/205906#M38281</guid>
      <dc:creator>peter_sjogarde</dc:creator>
      <dc:date>2015-03-20T08:08:33Z</dc:date>
    </item>
    <item>
      <title>Re: Create clusters from pairs</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Create-clusters-from-pairs/m-p/205907#M38282</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Here is the code I wrote to find out who belongs to the same household. But your data is very big. &lt;A __default_attr="5582" __jive_macro_name="document" class="jive_macro jive_macro_document" href="https://communities.sas.com/"&gt;&lt;/A&gt;&lt;/P&gt;&lt;P&gt;I don't know if you have enough memory to handle this:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;PRE&gt;

data have;
infile cards ;
input from $&amp;nbsp; to $ ;
cards;
1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 2
1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 3
4&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 5
5&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 2
9&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 4
6&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 7
8&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 7
;
run;
/* Store every pair in both directions and collect the unique node IDs in dataset NODE. */
data full;
&amp;nbsp; set have end=last;
&amp;nbsp; if _n_ eq 1 then do;
&amp;nbsp;&amp;nbsp; declare hash h();
&amp;nbsp;&amp;nbsp;&amp;nbsp; h.definekey('node');
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; h.definedata('node');
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; h.definedone();
&amp;nbsp; end;
&amp;nbsp; output;
&amp;nbsp; node=from; h.replace();
&amp;nbsp; from=to; to=node; 
&amp;nbsp; output;
&amp;nbsp; node=from; h.replace();
&amp;nbsp; if last then h.output(dataset:'node');
&amp;nbsp; drop node;
run;


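/* Label connected components: each remaining start node opens a new household, */
/* then a queue-driven search adds every node reachable from it.                */
data want(keep=node household);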
/* ha: search queue for the current household, ordered by insertion counter */
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
/* _ha: IDs already visited in the current household */
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();

if 0 then set full;
/* from_to: adjacency list, one entry per directed edge */
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
 from_to.definekey('from');
 from_to.definedata('to');
 from_to.definedone();

if 0 then set node;
/* no: candidate start nodes; a node is removed once it is assigned as a neighbour */
declare hash no(dataset:'node');
declare hiter hi_no('no');
 no.definekey('node');
 no.definedata('node');
 no.definedone();
 

do while(hi_no.next()=0);
 household+1; output;
 count=1;
 key=node;_ha.add();
 last=node;ha.add();
 rc=hi.first();
 do while(rc=0);
&amp;nbsp;&amp;nbsp; from=last;rx=from_to.find();
&amp;nbsp;&amp;nbsp; do while(rx=0);
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; key=to;ry=_ha.check();
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if ry ne 0 then do;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; node=to;output;rr=no.remove(key:node);
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; key=to;_ha.add();
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; count+1;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; last=to;ha.add();
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; rx=from_to.find_next();
&amp;nbsp;&amp;nbsp; end;
&amp;nbsp;&amp;nbsp; rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;
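
/* Sketch, not part of the original post: attach the household number to each   */
/* original pair. It assumes the HAVE and WANT datasets built above;            */
/* PAIRS_CLUSTERED and the hash name W are hypothetical names for illustration. */
/* Both ends of a pair are in the same household, so a lookup on FROM suffices. */
data pairs_clustered;
  if _n_ = 1 then do;
    if 0 then set want;              /* puts NODE and HOUSEHOLD in the PDV */
    declare hash w(dataset:'want');
    w.definekey('node');
    w.definedata('household');
    w.definedone();
  end;
  set have;
  if w.find(key:from) = 0;           /* keep pairs whose FROM node was labelled */
  drop node;
run;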


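/* Alternative sketch, not part of the original post: with a SAS/OR licence,    */
/* PROC OPTNET and its CONCOMP statement (mentioned earlier in this thread)     */
/* may be able to label the components directly. It assumes the HAVE dataset    */
/* above; CC_NODES is a hypothetical output name. The component number of each  */
/* node is expected in the CONCOMP variable of CC_NODES.                        */
proc optnet data_links=have out_nodes=cc_nodes;
  data_links_var from=from to=to;
  concomp;
run;
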
&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Xia Keshan&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Message was edited by: xia keshan&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 20 Mar 2015 10:39:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Create-clusters-from-pairs/m-p/205907#M38282</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2015-03-20T10:39:00Z</dc:date>
    </item>
    <item>
      <title>Re: Create clusters from pairs</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Create-clusters-from-pairs/m-p/205908#M38283</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P style="margin-bottom: .0001pt;"&gt;&lt;SPAN lang="EN-US" style="font-size: 10pt; font-family: Arial, sans-serif;"&gt;I managed to do the clustering with the SubGraphs macro. I divided the data into sections of about 1 million records and iterated over these datasets. I used a script found here:&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="margin-bottom: .0001pt;"&gt;&lt;SPAN style="font-size: 10pt; font-family: Arial, sans-serif;"&gt;&lt;A href="http://snipplr.com/view/6212/" title="http://snipplr.com/view/6212/"&gt;http://snipplr.com/view/6212/&lt;/A&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The hash factor was set to 17. I do not know if that is optimal but the script was fast so that was probably a good level. &lt;/P&gt;&lt;P&gt;Many thanks for all the help I got on this issue! &lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 20 Mar 2015 16:08:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Create-clusters-from-pairs/m-p/205908#M38283</guid>
      <dc:creator>peter_sjogarde</dc:creator>
      <dc:date>2015-03-20T16:08:54Z</dc:date>
    </item>
    <item>
      <title>Re: Create clusters from pairs</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Create-clusters-from-pairs/m-p/205909#M38284</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;That code splits a table into many sub-tables according to a group variable. I can do that better with a hash table.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;"&lt;SPAN style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;The hash factor was set to 17. I do not know if that is optimal but the script was fast so that was probably a good level."&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;I think HASHEXP for the following hash should be set to the maximum value, 20, on account of your big table.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;if 0 then set full;&lt;/P&gt;&lt;P&gt;declare hash from_to(dataset:'full',hashexp:&lt;STRONG&gt;20&lt;/STRONG&gt;,multidata:'y');&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Sat, 21 Mar 2015 06:24:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Create-clusters-from-pairs/m-p/205909#M38284</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2015-03-21T06:24:22Z</dc:date>
    </item>
    <item>
      <title>Re: Create clusters from pairs</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Create-clusters-from-pairs/m-p/205910#M38285</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;PG,&lt;/P&gt;&lt;P&gt;"&lt;SPAN style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;Unfortunately, neither OPTNET or the SubGraphs macro supports BY processing."&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;I can do that .&amp;nbsp; &lt;img id="smileyhappy" class="emoticon emoticon-smileyhappy" src="https://communities.sas.com/i/smilies/16x16_smiley-happy.png" alt="Smiley Happy" title="Smiley Happy" /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Xia Keshan&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Sat, 21 Mar 2015 06:56:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Create-clusters-from-pairs/m-p/205910#M38285</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2015-03-21T06:56:21Z</dc:date>
    </item>
    <item>
      <title>Re: Create clusters from pairs</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Create-clusters-from-pairs/m-p/205911#M38286</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;Although the question is deemed answered, I want to put in 2 cents into the discussion.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;I do not have access to SAS/OR so I cannot evaluate the chosen solution.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;There are 2 offered solutions: 1) by xia keshan, and 2) SubGraphsMacro.sas (by PGStats)&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;In the following I propose a different solution.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;First the input dataset t_a (here I will characterize it as 100000 (150000)):&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;/***********************/&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;/**** input dataset ****/&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;/***********************/&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;data t_a;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp; do _N_ = 1 to 100000;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; a1 =&amp;nbsp; int(150000*ranuni(3));&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; a2 =&amp;nbsp; int(150000*ranuni(5));&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; output;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier 
new,courier;"&gt;&amp;nbsp; end;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;run; &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;Now some benchmarks for the 3 solutions: 1) xia 2) current 3) SubGraphsMacro&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; xia&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; current&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; SubGraphsMacro&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp; 100000&amp;nbsp;&amp;nbsp; (150000) -&amp;gt;&amp;nbsp; 54.00 sec ( 2.00 sec)&amp;nbsp;&amp;nbsp; (7:00 min)&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp; 200000&amp;nbsp;&amp;nbsp; (300000) -&amp;gt;&amp;nbsp;&amp;nbsp; 1:50 min ( 4.06 sec)&amp;nbsp; (25:57 min)&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp; 400000&amp;nbsp;&amp;nbsp; (600000) -&amp;gt;&amp;nbsp;&amp;nbsp; 3:35 min ( 9.00 sec)&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt; 1000000&amp;nbsp; (1500000) -&amp;gt;&amp;nbsp;&amp;nbsp; 9:15 min (13.50 sec)&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt; 2000000&amp;nbsp; (3000000) -&amp;gt;&amp;nbsp; 19:19 min (34:00 sec) &lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt; 4000000&amp;nbsp; (6000000) -&amp;gt;&amp;nbsp; 37:45 min ( 1:03 min)&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt; 8000000 (12000000) -&amp;gt; 106:58 min ( 2:25 min)&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: 
courier new,courier;"&gt;16000000 (24000000) -&amp;gt; (not done) (memory crash)&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;Observations:&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp; 1) SubGraphsMacro is by far the slowest, about 10 times slower than xia's solution.&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp; 2) The current solution is about 30 times faster than xia's solution.&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp; 3) If enough RAM is available, the current solution can find the clusters for a 130M-record dataset in less than 1 hour.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;The main ingredient of the proposed solution is the "symmetrization" of the dataset:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;/*****************************/&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;/*** t_b = symmetrized t_a ***/&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;/*****************************/&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;data t_b(keep=a1 a2);&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp; set t_a(rename=(a1=b1 a2=b2));&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp; if b1=b2 then do; a1=b1; a2=b2; output; end;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; else do; a1=b1; a2=b2; output;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier 
new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; a1=b2; a2=b1; output; end;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;run;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;/************************************/&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;/***** proposed hash solution 1 *****/&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;/************************************/&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;data _null_;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp; length yMbr Mbr SetNo a1 a2 SET_NO SetNo 8.;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp; if _N_=1 then do;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; declare hash ha(dataset:'t_b',multidata:'Y');&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ha.definekey ('a1');&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ha.definedata('a1','a2');&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; declare hiter aIter('ha');&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ha.definedone();&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN 
style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; call missing(a1,a2);&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; declare hash hx(multidata:'N');&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; hx.definekey ('xMbr');&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; hx.definedata('xMbr');&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; declare hiter xIter('hx');&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; hx.definedone();&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; declare hash hy(multidata:'N');&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; hy.definekey ('yMbr');&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; hy.definedata('yMbr');&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; declare hiter yIter('hy');&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; hy.definedone();&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: 
courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; declare hash hz(multidata:'N');&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; hz.definekey ('Mbr');&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; hz.definedata('Mbr','SetNo');&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; hz.definedone();&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp; end;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp; do until (aDone);&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; set t_b end=aDone;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; xMbr=a1; hx.ref();&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; xMbr=a2; hx.ref();&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp; end;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp; /***************************/&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp; /*** start of clustering ***/&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp; /***************************/&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp; Set_No=0;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier 
new,courier;"&gt;&amp;nbsp;&amp;nbsp; aSum=0;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp; k1=xIter.first();&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp; do while (k1=0);&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; aSum+1;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; a1=xMbr; Mbr=xMbr;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; k2=ha.find();&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; j2=hz.find();&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; z1=hy.num_items;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if (k2=0) and (not(j2=0)) and (z1=0) then do;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Set_No+1; zChange=1;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; yMbr=a1; hy.ref();&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; yMbr=a2; hy.ref();&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; do until (zChange=0);&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier 
new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; zChange=0;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; k3=yIter.first();&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; do while(k3=0);&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; a1=yMbr;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; k4=ha.find();&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if (k4=0) then do; d1=ha.remove(key:a1); end;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; do while (k4=0);&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; yMbr=a2; zAdd=hy.add();&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier 
new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; zChange+(zAdd=0);&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; k4=ha.find_next();&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if (k4=0) then do; d1=ha.remove(key:a1); end;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; k3=yIter.next();&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; k4=yIter.first();&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; do while (k4=0);&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: 
courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Mbr=yMbr; SETNO=SET_NO; hz.ref();&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; k4=yIter.next();&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; z1=hy.num_items;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; hy.clear();&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; z1=hy.num_items;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; k1=xIter.next();&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp; end;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp; hz.output(dataset:'t_c');&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;run;&lt;/SPAN&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 09 Apr 2015 19:16:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Create-clusters-from-pairs/m-p/205911#M38286</guid>
      <dc:creator>billfish</dc:creator>
      <dc:date>2015-04-09T19:16:41Z</dc:date>
    </item>
  </channel>
</rss>

