<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Number of clusters from Proc Fastclus in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/Number-of-clusters-from-Proc-Fastclus/m-p/36111#M1516</link>
    <description>Hello prooney2,&lt;BR /&gt;
&lt;BR /&gt;
I am working on a similar problem and am a newbie to Cluster Analysis.  I too have been told to calculate Between/Within cluster variance measures and use those to choose the best number of clusters.  So, although your question sounds legitimate to me, I don't have an answer.  I'm seeking help myself!&lt;BR /&gt;
&lt;BR /&gt;
I am wondering if FASTCLUS makes the most sense for my application.  I am doing a very simple clustering of one dependent variable, nonzero values, ranging from 221 to 595, n=900 observations.  I'm looking for disjoint clusters in that each observation should belong to only one cluster in the end.&lt;BR /&gt;
&lt;BR /&gt;
For this most simple application, does FASTCLUS sound like the correct procedure to use?  If not, why not, and what other procedures would you recommend?</description>
    <pubDate>Mon, 08 Nov 2010 14:59:24 GMT</pubDate>
    <dc:creator>mjbstats</dc:creator>
    <dc:date>2010-11-08T14:59:24Z</dc:date>
    <item>
      <title>Number of clusters from Proc Fastclus</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Number-of-clusters-from-Proc-Fastclus/m-p/36110#M1515</link>
      <description>I have developed 8-cluster and 6-cluster solutions from proc fastclus.  I have a manager who claims that the ratio of the average between-cluster distance to the average within-cluster distance might be a measure of the "best number of clusters" to consider:&lt;BR /&gt;
&lt;BR /&gt;
Ratio = Mean Between-cluster distance / Mean within-cluster distance&lt;BR /&gt;
&lt;BR /&gt;
Using proc fastclus and proc distance I can calculate the distance from each object to each cluster centroid, and the distance from each cluster centroid to the other cluster centroids, but does this measure even make sense?  My intuition says that an 8-cluster solution and a 6-cluster solution are inherently incomparable, because the number of clusters by itself makes the variability of one cluster solution different from another.&lt;BR /&gt;
&lt;BR /&gt;
Wouldn't I be better off with hierarchical clustering, using the pseudo-F statistic and the other measures found in the SAS documentation for identifying the number of clusters?</description>
      <pubDate>Wed, 16 Jun 2010 17:23:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Number-of-clusters-from-Proc-Fastclus/m-p/36110#M1515</guid>
      <dc:creator>prooney2</dc:creator>
      <dc:date>2010-06-16T17:23:19Z</dc:date>
    </item>
    <item>
      <title>Re: Number of clusters from Proc Fastclus</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Number-of-clusters-from-Proc-Fastclus/m-p/36111#M1516</link>
      <description>Hello prooney2,&lt;BR /&gt;
&lt;BR /&gt;
I am working on a similar problem and am a newbie to Cluster Analysis.  I too have been told to calculate Between/Within cluster variance measures and use those to choose the best number of clusters.  So, although your question sounds legitimate to me, I don't have an answer.  I'm seeking help myself!&lt;BR /&gt;
&lt;BR /&gt;
I am wondering if FASTCLUS makes the most sense for my application.  I am doing a very simple clustering of one dependent variable, nonzero values, ranging from 221 to 595, n=900 observations.  I'm looking for disjoint clusters in that each observation should belong to only one cluster in the end.&lt;BR /&gt;
&lt;BR /&gt;
For this most simple application, does FASTCLUS sound like the correct procedure to use?  If not, why not, and what other procedures would you recommend?</description>
      <pubDate>Mon, 08 Nov 2010 14:59:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Number-of-clusters-from-Proc-Fastclus/m-p/36111#M1516</guid>
      <dc:creator>mjbstats</dc:creator>
      <dc:date>2010-11-08T14:59:24Z</dc:date>
    </item>
    <item>
      <title>Re: Number of clusters from Proc Fastclus</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Number-of-clusters-from-Proc-Fastclus/m-p/36112#M1517</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hello prooney2,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I am by no means a statistician or a mathematician, but I am aware of a sample program that ships with IML Studio called FishClusters.sx. This code attempts to find the best number of clusters using different criteria. Maybe it can help you.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Eyal&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 24 May 2013 14:31:47 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Number-of-clusters-from-Proc-Fastclus/m-p/36112#M1517</guid>
      <dc:creator>EyalGonen</dc:creator>
      <dc:date>2013-05-24T14:31:47Z</dc:date>
    </item>
  </channel>
</rss>

