<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Optimal number of clusters in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/Optimal-number-of-clusters/m-p/27223#M148</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;&lt;SPAN lang="EN-GB" style="font-family: Arial; font-size: 10pt;"&gt;I think there are no strict rules for the optimal number of clusters; as in all cluster analysis, there is a lot of room for variation and interpretation.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Maybe someone can give more specific criteria, but these are the ones I would consider:&lt;/P&gt;&lt;P&gt;&lt;SPAN lang="EN-GB" style="font-family: Arial; font-size: 10pt;"&gt;* Use &lt;/SPAN&gt;graphical analysis to see whether your clusters are well separated; maybe some are very close and can be joined. I think a tree diagram (PROC TREE) is also a very useful tool: in it you can see how many groups (clearly separated branches) you have.&lt;/P&gt;&lt;P&gt;* Most likely you wouldn’t want clusters with just one or a few observations.&lt;/P&gt;&lt;P&gt;* In some cases your data or task can hint at the number of clusters (e.g., you may want to separate items with high, middle and low levels of something).&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Tue, 11 Oct 2011 14:05:59 GMT</pubDate>
    <dc:creator>ieva</dc:creator>
    <dc:date>2011-10-11T14:05:59Z</dc:date>
    <item>
      <title>Optimal number of clusters</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Optimal-number-of-clusters/m-p/27222#M147</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;How can we determine the optimal number of clusters in cluster analysis?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Nikhil&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 11 Oct 2011 07:44:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Optimal-number-of-clusters/m-p/27222#M147</guid>
      <dc:creator>Nikhil</dc:creator>
      <dc:date>2011-10-11T07:44:57Z</dc:date>
    </item>
    <item>
      <title>Optimal number of clusters</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Optimal-number-of-clusters/m-p/27223#M148</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;&lt;SPAN lang="EN-GB" style="font-family: Arial; font-size: 10pt;"&gt;I think there are no strict rules for the optimal number of clusters; as in all cluster analysis, there is a lot of room for variation and interpretation.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Maybe someone can give more specific criteria, but these are the ones I would consider:&lt;/P&gt;&lt;P&gt;&lt;SPAN lang="EN-GB" style="font-family: Arial; font-size: 10pt;"&gt;* Use &lt;/SPAN&gt;graphical analysis to see whether your clusters are well separated; maybe some are very close and can be joined. I think a tree diagram (PROC TREE) is also a very useful tool: in it you can see how many groups (clearly separated branches) you have.&lt;/P&gt;&lt;P&gt;* Most likely you wouldn’t want clusters with just one or a few observations.&lt;/P&gt;&lt;P&gt;* In some cases your data or task can hint at the number of clusters (e.g., you may want to separate items with high, middle and low levels of something).&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 11 Oct 2011 14:05:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Optimal-number-of-clusters/m-p/27223#M148</guid>
      <dc:creator>ieva</dc:creator>
      <dc:date>2011-10-11T14:05:59Z</dc:date>
    </item>
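The point about clusters with only one or a few observations is easy to check once every observation carries a cluster label. A minimal Python sketch follows; the function name `small_clusters`, the toy labels and the threshold of three members are hypothetical choices for illustration, not a SAS rule.

```python
# Hypothetical sketch: report clusters that hold only one or a few observations.
# The labels and the min_size threshold below are illustrative assumptions.
from collections import Counter


def small_clusters(labels, min_size=3):
    """Return {cluster_label: size} for clusters with fewer than min_size members."""
    sizes = Counter(labels)
    return {label: n for label, n in sizes.items() if min_size > n}


# Usage: one cluster label per observation (e.g. exported from a clustering run).
labels = ["a", "a", "a", "b", "c", "c", "c", "c"]
print(small_clusters(labels))  # {'b': 1}
```

A flagged cluster is a hint to try fewer clusters, not proof that the solution is wrong.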
    <item>
      <title>Optimal number of clusters</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Optimal-number-of-clusters/m-p/27224#M149</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;&lt;SPAN class="comment-body"&gt;For hierarchical clustering try the Sarle's Cubic Clustering Criterion in PROC CLUSTER: &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN class="comment-body"&gt;plot _CCC_ versus the number of clusters and look for peaks where _ccc_ &amp;gt; 3 or look for &lt;STRONG&gt;local peaks of pseudo-F &lt;/STRONG&gt;statistic (_PSF_) &lt;STRONG&gt;combined with a small value of the pseudo-t^2&lt;/STRONG&gt; statistic (_PST2_) and a &lt;STRONG&gt;larger pseudo t^2 for the next cluster &lt;/STRONG&gt;fusion &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN class="comment-body"&gt;(see &lt;/SPAN&gt;&lt;EM&gt;&lt;A href="http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_introclus_sect010.htm"&gt;http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_introclus_sect010.htm&lt;/A&gt;&lt;/EM&gt; &lt;SPAN class="comment-body"&gt;). &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN class="comment-body"&gt;For K-Means clustering use this approach on a sample of your data to determine the max limit for k and assign it to the maxc= option in PROC FASTCLUS on the complete data.&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 18 Jan 2012 12:29:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Optimal-number-of-clusters/m-p/27224#M149</guid>
      <dc:creator>Alfredo</dc:creator>
      <dc:date>2012-01-18T12:29:21Z</dc:date>
    </item>
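The rule of thumb in the reply above (a local peak of _PSF_, a small _PST2_, and a larger _PST2_ at the next fusion) can be tried on toy data with a short Python sketch. It is not PROC CLUSTER: it assumes Ward's minimum-variance method and the SAS/STAT formulas pseudo-F = ((T - P_G)/(G - 1)) / (P_G/(n - G)) and pseudo-t² = B_KL / ((W_K + W_L)/(N_K + N_L - 2)), where T is the total sum of squares, P_G the pooled within-cluster sum of squares for G clusters, and B_KL the increase in within-cluster sum of squares when clusters K and L are joined. Sarle's CCC is not computed. The function names and the three-square data set are hypothetical.

```python
# Sketch only (not PROC CLUSTER): Ward-style agglomerative clustering that
# reports SAS-style pseudo-F (_PSF_) and pseudo-t^2 (_PST2_) after each fusion.
# Function names and the toy data are hypothetical illustrations.


def centroid(points):
    """Mean vector of a list of equal-length tuples."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def within_ss(points):
    """Sum of squared Euclidean distances from each point to the centroid."""
    c = centroid(points)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in points)


def cluster_history(data):
    """Return {number_of_clusters: (pseudo_f, pseudo_t2)} for every fusion level."""
    n = len(data)
    total_ss = within_ss(data)              # T: total sum of squares
    clusters = [[p] for p in data]          # start from singletons
    history = {}
    while len(clusters) > 1:
        # Ward's criterion: join the pair with the smallest B_KL = W_M - W_K - W_L.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                b_kl = (within_ss(clusters[i] + clusters[j])
                        - within_ss(clusters[i]) - within_ss(clusters[j]))
                if best is None or best[0] > b_kl:
                    best = (b_kl, i, j)
        b_kl, i, j = best
        pooled = within_ss(clusters[i]) + within_ss(clusters[j])   # W_K + W_L
        n_kl = len(clusters[i]) + len(clusters[j])                 # N_K + N_L
        # pseudo-t^2 = B_KL / ((W_K + W_L) / (N_K + N_L - 2)); missing if W_K + W_L == 0
        pst2 = b_kl * (n_kl - 2) / pooled if pooled > 0 else None
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        g = len(clusters)
        # pseudo-F = ((T - P_G) / (G - 1)) / (P_G / (n - G)); P_G = pooled within SS
        p_g = sum(within_ss(c) for c in clusters)
        psf = ((total_ss - p_g) / (g - 1)) / (p_g / (n - g)) if g > 1 and p_g > 0 else None
        history[g] = (psf, pst2)
    return history


def fmt(value):
    """Show missing values as '.', the way SAS prints them."""
    return "." if value is None else f"{value:.1f}"


# Toy data: three well-separated unit squares (hypothetical example).
data = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 10), (10, 11), (11, 10), (11, 11),
        (20, 0), (20, 1), (21, 0), (21, 1)]

history = cluster_history(data)
for g in sorted(history, reverse=True):
    psf, pst2 = history[g]
    print(f"NCL={g:2d}  _PSF_={fmt(psf)}  _PST2_={fmt(pst2)}")
```

On this toy data the pseudo-F is largest at three clusters, and the pseudo-t² is small (2.0) for the fusion that yields three clusters but jumps (to 600.0) for the next fusion, which is the pattern described in the reply. Ties between equally cheap fusions are broken by list order here, which may differ from SAS.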
  </channel>
</rss>

