<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Unexpected clusters from PROC CORR data using PROC CLUSTER in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/Unexpected-clusters-from-PROC-CORR-data-using-PROC-CLUSTER/m-p/297319#M15836</link>
    <description>&lt;P&gt;Your assumptions are correct: by default, PROC CLUSTER "&lt;SPAN&gt;treats each column sort of as a position in an 'n space' dimension&lt;/SPAN&gt;".&lt;/P&gt;
&lt;P&gt;And yes, you have to use type=distance to change this behavior.&lt;/P&gt;
&lt;P&gt;The trick is that when you use &lt;SPAN&gt;&lt;STRONG&gt;id LeftName;&lt;/STRONG&gt;, only the rows in the distance matrix are identified.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;The column names are ignored, so the columns of the distance matrix must be in the same order as the rows.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Code solution:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;proc sort data=lib.Corr1;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt; by leftName rightName;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;run;&lt;/STRONG&gt;&lt;BR /&gt;proc transpose data=lib.Corr1&lt;BR /&gt; out=lib.Corr1T;&lt;BR /&gt; by leftName;&lt;BR /&gt; id rightName;&lt;BR /&gt;run;&lt;/P&gt;
&lt;P&gt;/* Now all the zeros are on the diagonal of the distance matrix */&lt;BR /&gt;proc cluster data=lib.Corr1T(type=distance)&lt;BR /&gt;outtree=lib.ClusterTree&lt;BR /&gt;method=average nosquare;&lt;BR /&gt;id LeftName;&lt;BR /&gt;run;&lt;/P&gt;</description>
    <pubDate>Fri, 09 Sep 2016 07:47:00 GMT</pubDate>
    <dc:creator>gergely_batho</dc:creator>
    <dc:date>2016-09-09T07:47:00Z</dc:date>
    <item>
      <title>Unexpected clusters from PROC CORR data using PROC CLUSTER</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Unexpected-clusters-from-PROC-CORR-data-using-PROC-CLUSTER/m-p/296303#M15796</link>
      <description>&lt;P&gt;I would like to discover clusters of simple line plots. I ran CORR on the plots and subtracted the correlations from 1 to get "distances" between each pair of plots.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I was surprised to see that CLUSTER did not always produce low-level clusters of the closest plots with any of the methods that I tried. I suspect this is because CLUSTER treats each column sort of as a position in an 'n space' dimension, i.e. it does not use the distance calculated by CORR between 2 plots as the distance, and it doesn't know that column names match id variable values. I tried Type=DISTANCE as well with no success, though I can't claim to understand how distance is treated differently from coordinates.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The x-axis range for the plots varies, so the overlap between plots is inconsistent, which may be what allows 2 highly correlated plots to have more variability in correlations with less related plots.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I was hoping to find small clusters of the most correlated plots that then combine into larger clusters, and so on. Is there a way to do that? Or do I need to code it myself using the agglomerative paradigm? Or am I doing something dumb?&lt;/P&gt;&lt;P&gt;I'm no expert at clustering so I wouldn't be surprised to find I have a conceptual issue.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Note that CORR reports VADAX and MVCAX as the 2nd most correlated plot pair, but they do not form a low-level cluster.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;FWIW SAS 3.5 University Edition&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks, Duane&lt;/P&gt;</description>
      <pubDate>Sat, 03 Sep 2016 13:00:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Unexpected-clusters-from-PROC-CORR-data-using-PROC-CLUSTER/m-p/296303#M15796</guid>
      <dc:creator>DuaneTiemann</dc:creator>
      <dc:date>2016-09-03T13:00:45Z</dc:date>
    </item>
    <item>
      <title>Re: Unexpected clusters from PROC CORR data using PROC CLUSTER</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Unexpected-clusters-from-PROC-CORR-data-using-PROC-CLUSTER/m-p/297319#M15836</link>
      <description>&lt;P&gt;Your assumptions are correct: by default, PROC CLUSTER "&lt;SPAN&gt;treats each column sort of as a position in an 'n space' dimension&lt;/SPAN&gt;".&lt;/P&gt;
&lt;P&gt;And yes, you have to use type=distance to change this behavior.&lt;/P&gt;
&lt;P&gt;The trick is that when you use &lt;SPAN&gt;&lt;STRONG&gt;id LeftName;&lt;/STRONG&gt;, only the rows in the distance matrix are identified.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;The column names are ignored, so the columns of the distance matrix must be in the same order as the rows.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Code solution:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;proc sort data=lib.Corr1;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt; by leftName rightName;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;run;&lt;/STRONG&gt;&lt;BR /&gt;proc transpose data=lib.Corr1&lt;BR /&gt; out=lib.Corr1T;&lt;BR /&gt; by leftName;&lt;BR /&gt; id rightName;&lt;BR /&gt;run;&lt;/P&gt;
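&lt;P&gt;/* Optional sanity check (a sketch, not part of the tested solution): confirm the diagonal is zero before clustering. */&lt;BR /&gt;/* Assumes every numeric variable in lib.Corr1T is a distance column and the matrix is square; the 1e-8 tolerance is an arbitrary choice. */&lt;BR /&gt;data _null_;&lt;BR /&gt; set lib.Corr1T;&lt;BR /&gt; array d{*} _numeric_; /* distance columns, in data set order */&lt;BR /&gt; if _n_ le dim(d) then do;&lt;BR /&gt; if abs(d{_n_}) gt 1e-8 then put 'WARNING: nonzero diagonal in row ' _n_ leftName=;&lt;BR /&gt; end;&lt;BR /&gt;run;&lt;/P&gt;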
&lt;P&gt;/* Now all the zeros are on the diagonal of the distance matrix */&lt;BR /&gt;proc cluster data=lib.Corr1T(type=distance)&lt;BR /&gt;outtree=lib.ClusterTree&lt;BR /&gt;method=average nosquare;&lt;BR /&gt;id LeftName;&lt;BR /&gt;run;&lt;/P&gt;</description>
      <pubDate>Fri, 09 Sep 2016 07:47:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Unexpected-clusters-from-PROC-CORR-data-using-PROC-CLUSTER/m-p/297319#M15836</guid>
      <dc:creator>gergely_batho</dc:creator>
      <dc:date>2016-09-09T07:47:00Z</dc:date>
    </item>
    <item>
      <title>Re: Unexpected clusters from PROC CORR data using PROC CLUSTER</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Unexpected-clusters-from-PROC-CORR-data-using-PROC-CLUSTER/m-p/297457#M15837</link>
      <description>&lt;P&gt;Thanks a lot.&amp;nbsp; That's very helpful.&amp;nbsp; Duane&lt;/P&gt;</description>
      <pubDate>Fri, 09 Sep 2016 16:55:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Unexpected-clusters-from-PROC-CORR-data-using-PROC-CLUSTER/m-p/297457#M15837</guid>
      <dc:creator>DuaneTiemann</dc:creator>
      <dc:date>2016-09-09T16:55:16Z</dc:date>
    </item>
  </channel>
</rss>

