<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Cluster into two even groups in SAS Procedures</title>
    <link>https://communities.sas.com/t5/SAS-Procedures/Cluster-into-two-even-groups/m-p/233571#M54833</link>
    <description>PROC SURVEYSELECT is a better method.</description>
    <pubDate>Fri, 06 Nov 2015 21:29:20 GMT</pubDate>
    <dc:creator>Reeza</dc:creator>
    <dc:date>2015-11-06T21:29:20Z</dc:date>
    <item>
      <title>Cluster into two even groups</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Cluster-into-two-even-groups/m-p/233564#M54832</link>
      <description>&lt;P&gt;Hi everyone,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I need to cluster some points into two groups OF THE SAME SIZE, or +- 1 of course if the input data set has an odd number of observations. &amp;nbsp;I have no idea how to do this.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Let's take this example:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;data test;&lt;BR /&gt;input name $ height weight sex $ Hair $;&lt;BR /&gt;cards;&lt;BR /&gt;Bob 76 175 M Brown&lt;BR /&gt;Joe 45 160 M Brown&lt;BR /&gt;Jim 70 200 M Black&lt;BR /&gt;Paul 72 160 M Brown&lt;BR /&gt;Mel 56 130 F Blond&lt;BR /&gt;Jill 60 125 F Brown&lt;BR /&gt;;&lt;/P&gt;
&lt;P&gt;Proc fastclus data=test out=test_out maxclusters=2 noprint;&lt;BR /&gt; var height weight;&lt;BR /&gt;run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;proc print data = test_out;&lt;BR /&gt;run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;SAS Output&lt;/P&gt;
&lt;DIV class="branch"&gt;
&lt;DIV&gt;
&lt;DIV align="center"&gt;
&lt;TABLE class="table" summary="Procedure Print: Data Set WORK.TEST_OUT" frame="box" rules="all" cellspacing="0" cellpadding="5"&gt;
&lt;THEAD&gt;
&lt;TR&gt;
&lt;TH class="r header" scope="col"&gt;Obs&lt;/TH&gt;
&lt;TH class="l header" scope="col"&gt;name&lt;/TH&gt;
&lt;TH class="r header" scope="col"&gt;height&lt;/TH&gt;
&lt;TH class="r header" scope="col"&gt;weight&lt;/TH&gt;
&lt;TH class="l header" scope="col"&gt;sex&lt;/TH&gt;
&lt;TH class="l header" scope="col"&gt;Hair&lt;/TH&gt;
&lt;TH class="r header" scope="col"&gt;CLUSTER&lt;/TH&gt;
&lt;TH class="r header" scope="col"&gt;DISTANCE&lt;/TH&gt;
&lt;/TR&gt;
&lt;/THEAD&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TH class="r rowheader" scope="row"&gt;1&lt;/TH&gt;
&lt;TD class="l data"&gt;Bob&lt;/TD&gt;
&lt;TD class="r data"&gt;76&lt;/TD&gt;
&lt;TD class="r data"&gt;175&lt;/TD&gt;
&lt;TD class="l data"&gt;M&lt;/TD&gt;
&lt;TD class="l data"&gt;Brown&lt;/TD&gt;
&lt;TD class="r data"&gt;1&lt;/TD&gt;
&lt;TD class="r data"&gt;12.8550&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TH class="r rowheader" scope="row"&gt;2&lt;/TH&gt;
&lt;TD class="l data"&gt;Joe&lt;/TD&gt;
&lt;TD class="r data"&gt;45&lt;/TD&gt;
&lt;TD class="r data"&gt;160&lt;/TD&gt;
&lt;TD class="l data"&gt;M&lt;/TD&gt;
&lt;TD class="l data"&gt;Brown&lt;/TD&gt;
&lt;TD class="r data"&gt;2&lt;/TD&gt;
&lt;TD class="r data"&gt;20.9672&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TH class="r rowheader" scope="row"&gt;3&lt;/TH&gt;
&lt;TD class="l data"&gt;Jim&lt;/TD&gt;
&lt;TD class="r data"&gt;70&lt;/TD&gt;
&lt;TD class="r data"&gt;200&lt;/TD&gt;
&lt;TD class="l data"&gt;M&lt;/TD&gt;
&lt;TD class="l data"&gt;Black&lt;/TD&gt;
&lt;TD class="r data"&gt;1&lt;/TD&gt;
&lt;TD class="r data"&gt;12.8550&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TH class="r rowheader" scope="row"&gt;4&lt;/TH&gt;
&lt;TD class="l data"&gt;Paul&lt;/TD&gt;
&lt;TD class="r data"&gt;72&lt;/TD&gt;
&lt;TD class="r data"&gt;160&lt;/TD&gt;
&lt;TD class="l data"&gt;M&lt;/TD&gt;
&lt;TD class="l data"&gt;Brown&lt;/TD&gt;
&lt;TD class="r data"&gt;2&lt;/TD&gt;
&lt;TD class="r data"&gt;21.2867&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TH class="r rowheader" scope="row"&gt;5&lt;/TH&gt;
&lt;TD class="l data"&gt;Mel&lt;/TD&gt;
&lt;TD class="r data"&gt;56&lt;/TD&gt;
&lt;TD class="r data"&gt;130&lt;/TD&gt;
&lt;TD class="l data"&gt;F&lt;/TD&gt;
&lt;TD class="l data"&gt;Blond&lt;/TD&gt;
&lt;TD class="r data"&gt;2&lt;/TD&gt;
&lt;TD class="r data"&gt;13.9329&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TH class="r rowheader" scope="row"&gt;6&lt;/TH&gt;
&lt;TD class="l data"&gt;Jill&lt;/TD&gt;
&lt;TD class="r data"&gt;60&lt;/TD&gt;
&lt;TD class="r data"&gt;125&lt;/TD&gt;
&lt;TD class="l data"&gt;F&lt;/TD&gt;
&lt;TD class="l data"&gt;Brown&lt;/TD&gt;
&lt;TD class="r data"&gt;2&lt;/TD&gt;
&lt;TD class="r data"&gt;18.8315&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So I want to force 3 observations into cluster 1, and 3 into cluster 2. &amp;nbsp;I am really not picky about the clustering technique/method or even the PROC. &amp;nbsp;It just needs to split data points evenly.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks a bunch!&lt;/P&gt;
&lt;P&gt;Paul&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 06 Nov 2015 21:11:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Cluster-into-two-even-groups/m-p/233564#M54832</guid>
      <dc:creator>P_2</dc:creator>
      <dc:date>2015-11-06T21:11:39Z</dc:date>
    </item>
    <item>
      <title>Re: Cluster into two even groups</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Cluster-into-two-even-groups/m-p/233571#M54833</link>
      <description>PROC SURVEYSELECT is a better method.</description>
      <pubDate>Fri, 06 Nov 2015 21:29:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Cluster-into-two-even-groups/m-p/233571#M54833</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2015-11-06T21:29:20Z</dc:date>
    </item>
    <item>
      <title>Re: Cluster into two even groups</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Cluster-into-two-even-groups/m-p/233580#M54834</link>
      <description>&lt;P&gt;Do you want random assignment into two groups, in which case SURVEYSELECT would be your best bet as &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13879"&gt;@Reeza&lt;/a&gt; pointed out, or do you want to form two homogeneous groups, in which case some clustering procedure would be better?&lt;/P&gt;</description>
      <pubDate>Fri, 06 Nov 2015 21:56:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Cluster-into-two-even-groups/m-p/233580#M54834</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2015-11-06T21:56:32Z</dc:date>
    </item>
    <item>
      <title>Re: Cluster into two even groups</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Cluster-into-two-even-groups/m-p/233584#M54835</link>
      <description>&lt;P&gt;This will add a group variable assigning alternating records to one or the other group with values of 1 or 0:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data paired;
   set test;
   group = mod(_n_, 2);
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;This works if there are no criteria for grouping or differentiation. It is extensible to other group sizes as well.&lt;/P&gt;
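&lt;P&gt;A minimal sketch of extending this to more groups, assuming the same test data set; the output name grouped3 and the choice of three groups are illustrative and not from the original post. Using 3 as the divisor cycles the remainder through 1, 2, 0, so the group sizes differ by at most one:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data grouped3;
   set test;
   /* _n_ counts DATA step iterations; the remainder cycles 1, 2, 0 */
   /* the divisor 3 is an assumed, illustrative number of groups */
   group = mod(_n_, 3);
run;&lt;/CODE&gt;&lt;/PRE&gt;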
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 06 Nov 2015 22:30:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Cluster-into-two-even-groups/m-p/233584#M54835</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2015-11-06T22:30:01Z</dc:date>
    </item>
    <item>
      <title>Re: Cluster into two even groups</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Cluster-into-two-even-groups/m-p/233831#M54848</link>
      <description>&lt;P&gt;Thank you everyone for the responses. &amp;nbsp;I should have been a bit clearer about what I am trying to do. &amp;nbsp;I am not simply trying to split the data into two groups; I am trying to create two groups with the same number of observations that minimize the combined length of the optimal paths through the two groups. &amp;nbsp;So instead of height and weight, imagine latitude and longitude. &amp;nbsp;Later on in my program I am going to construct an optimum route calculator. &amp;nbsp;I want to split the points in such a way that when I perform a 'travelling salesman' analysis on the two groups separately, the total distance is minimized. &amp;nbsp;Clustering seems like a natural way to split the groups; I just can't figure out how to force the new groups to be the same size. &amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Mon, 09 Nov 2015 13:35:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Cluster-into-two-even-groups/m-p/233831#M54848</guid>
      <dc:creator>P_2</dc:creator>
      <dc:date>2015-11-09T13:35:06Z</dc:date>
    </item>
    <item>
      <title>Re: Cluster into two even groups</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Cluster-into-two-even-groups/m-p/233833#M54849</link>
      <description>&lt;P&gt;You might take a look at &lt;A href="http://support.sas.com/documentation/cdl/en/statug/68162/HTML/default/viewer.htm#statug_hpsplit_syntax01.htm" target="_self"&gt;the HPSPLIT procedure&lt;/A&gt; (part of SAS/STAT 14.1, which is in SAS 9.4m3). &amp;nbsp;It has several options, such as INTERVALBINS and MAXBRANCH, that allow you to apply limits to how the algorithm splits the data.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It also creates a pretty cool decision-tree style of output -- in addition to the regular tabular output. &amp;nbsp;You &lt;A href="http://support.sas.com/documentation/cdl/en/statug/68162/HTML/default/viewer.htm#statug_hpsplit_examples02.htm" target="_self"&gt;might even be able to adapt an example&lt;/A&gt; to your particular problem.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Even though the procedure is named "HPSPLIT", you don't need access to a high-performance environment to use it.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Chris&lt;/P&gt;</description>
      <pubDate>Mon, 09 Nov 2015 13:54:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Cluster-into-two-even-groups/m-p/233833#M54849</guid>
      <dc:creator>ChrisHemedinger</dc:creator>
      <dc:date>2015-11-09T13:54:16Z</dc:date>
    </item>
    <item>
      <title>Re: Cluster into two even groups</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Cluster-into-two-even-groups/m-p/233846#M54850</link>
      <description>&lt;P&gt;Thanks for the suggestion. &amp;nbsp;Unfortunately, I only have SAS 9.3 at work. &amp;nbsp;Do you know of any comparable techniques that would run in 9.3?&lt;/P&gt;</description>
      <pubDate>Mon, 09 Nov 2015 15:17:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Cluster-into-two-even-groups/m-p/233846#M54850</guid>
      <dc:creator>P_2</dc:creator>
      <dc:date>2015-11-09T15:17:33Z</dc:date>
    </item>
    <item>
      <title>Re: Cluster into two even groups</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Cluster-into-two-even-groups/m-p/234223#M54900</link>
      <description>&lt;P&gt;Given your explanation of what you are trying to solve (the TSP in two regions), you might want to rethink your requirement that you split the locations into two clusters with equal numbers of points. The following example shows two clusters, one with 4 points, the other with 6. It seems to me that if you are trying to minimize the total distance, you'd do much better with two unequally sized clusters and one route that connects the two clusters. If you try to put one of the points in the lower group into the top cluster, I think you'll increase the total distance.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data A;
input x y @@;
datalines;
-0.1 2   0 2   0.1 2  0 2.1
-0.1 -3  0 -3  0.1 -3  -0.1 -3.1  0 -3.1  0.1 -3.1  
;
proc sgplot data=A;
scatter x=x y=y;
xaxis grid min=-2 max=2;
yaxis grid;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 11 Nov 2015 16:22:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Cluster-into-two-even-groups/m-p/234223#M54900</guid>
      <dc:creator>Rick_SAS</dc:creator>
      <dc:date>2015-11-11T16:22:01Z</dc:date>
    </item>
  </channel>
</rss>

