<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: k means clustering in SAS in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/k-means-clustering-in-SAS/m-p/512636#M7497</link>
    <pubDate>Tue, 13 Nov 2018 16:46:46 GMT</pubDate>
    <dc:creator>RalphAbbey</dc:creator>
    <dc:date>2018-11-13T16:46:46Z</dc:date>
    <item>
      <title>k means clustering in SAS</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/k-means-clustering-in-SAS/m-p/468173#M7092</link>
      <description>&lt;P&gt;After running k-means clustering with PROC FASTCLUS several times (k = 1 to 5), I found that k = 3 gives the number of clusters I want. My question: to plot the clusters in two dimensions, I need a dimension-reduction method, but which method should I use? What is the difference between PCA and CDA in this case? Someone please help! (I have attached the outdata3 file.)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Canonical discriminant analysis&lt;/STRONG&gt;&lt;BR /&gt;
/* keep the first two canonical variables, Can1 and Can2 */&lt;BR /&gt;
proc candisc data=outdata3 out=clustcan ncan=2;&lt;BR /&gt;
class cluster;&lt;BR /&gt;
var alcevr1 marever1 alcprobs1 deviant1 viol1 dep1 esteem1 schconn1&lt;BR /&gt;
&amp;nbsp; &amp;nbsp; parpres paractv famconct;&lt;BR /&gt;
run;&lt;BR /&gt;
&lt;BR /&gt;
&lt;STRONG&gt;Principal component analysis&lt;/STRONG&gt;&lt;BR /&gt;
/* keep the first two principal components, Prin1 and Prin2 */&lt;BR /&gt;
proc princomp data=outdata3 out=clustprin n=2;&lt;BR /&gt;
var alcevr1 marever1 alcprobs1 deviant1 viol1 dep1 esteem1 schconn1&lt;BR /&gt;
&amp;nbsp; &amp;nbsp; parpres paractv famconct;&lt;BR /&gt;
run;&lt;BR /&gt;
&lt;BR /&gt;
/* scatter plot of the canonical scores, grouped by cluster */&lt;BR /&gt;
proc sgplot data=clustcan;&lt;BR /&gt;
scatter y=can2 x=can1 / group=cluster;&lt;BR /&gt;
run;&lt;BR /&gt;
quit;&lt;BR /&gt;
&lt;BR /&gt;
/* scatter plot of the principal component scores, grouped by cluster */&lt;BR /&gt;
proc sgplot data=clustprin;&lt;BR /&gt;
scatter y=prin2 x=prin1 / group=cluster;&lt;BR /&gt;
run;&lt;BR /&gt;
quit;&lt;/P&gt;</description>
      <pubDate>Wed, 06 Jun 2018 19:54:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/k-means-clustering-in-SAS/m-p/468173#M7092</guid>
      <dc:creator>jeremyyjm</dc:creator>
      <dc:date>2018-06-06T19:54:24Z</dc:date>
    </item>
    <item>
      <title>Re: k means clustering in SAS</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/k-means-clustering-in-SAS/m-p/512636#M7497</link>
      <description>&lt;P&gt;When it comes to dimension reduction for visualization, there isn't necessarily a correct or incorrect answer. You have identified two good techniques, but they do slightly different things, so the plots they produce need to be interpreted differently.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Canonical Discriminant Analysis (CDA) uses the cluster variable and creates a projection based on the cluster labels that you have assigned. This means that CDA tries to find the linear combination of inputs that has the highest multiple correlation with the cluster labels. You can think of this as the "best" projection of the data (given the criterion that CDA uses) for seeing which linear combination best separates the clusters.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Principal Component Analysis (PCA) does not consider the cluster labels. This can be more useful if you want to see how the clustering looks in a lower dimension without letting the cluster information bias the projection. The projection does not depend on how you clustered; instead, it is the "best" with respect to the variance of the data, meaning the first components capture as much of the variance as possible. You can therefore look at the data first and then see how the cluster labels are distributed across the projected space.&lt;/P&gt;
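&lt;P&gt;For reference, the criterion behind each projection can be sketched in standard textbook notation. The symbols are illustrative and are not names from PROC CANDISC or PROC PRINCOMP output: w is a vector of coefficients on the input variables, S is the correlation matrix of the inputs (PROC PRINCOMP's default; its COV option uses the covariance matrix instead), B and W are the between-cluster and within-cluster SSCP matrices computed from the cluster labels, and r_1 is the first canonical correlation.&lt;/P&gt;

```latex
\begin{gathered}
\text{PCA:}\quad \max_{w}\; w^{\top} S\, w \quad \text{subject to } w^{\top} w = 1 \\
\text{CDA:}\quad \max_{w}\; \frac{w^{\top} B\, w}{w^{\top} W\, w} = \lambda_{1},
\qquad \lambda_{1} = \frac{r_{1}^{2}}{1 - r_{1}^{2}}
\end{gathered}
```

&lt;P&gt;Only the CDA criterion involves B and W, which both depend on the cluster assignments, so changing the clusters changes the CDA projection but leaves the PCA projection unchanged. The identity on the right also shows why maximizing the correlation with the cluster labels and maximizing the between-to-within separation pick out the same first canonical variable.&lt;/P&gt;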
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Ultimately, the two methods answer slightly different questions, and what you're trying to do with the dimension reduction and plotting should determine which route you take.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I hope this helped!&lt;/P&gt;</description>
      <pubDate>Tue, 13 Nov 2018 16:46:46 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/k-means-clustering-in-SAS/m-p/512636#M7497</guid>
      <dc:creator>RalphAbbey</dc:creator>
      <dc:date>2018-11-13T16:46:46Z</dc:date>
    </item>
  </channel>
</rss>

