Number of clusters from Proc Fastclus

06-16-2010 01:23 PM

I have developed 8-cluster and 6-cluster solutions with proc fastclus. My manager claims that the ratio of the mean between-cluster distance to the mean within-cluster distance might be a measure of the "best number of clusters" to consider:

Ratio = Mean Between-cluster distance / Mean within-cluster distance

Using proc fastclus and proc distance I can calculate the distance from each object to each cluster centroid, and the distance from each cluster centroid to every other centroid, but does this measure even make sense? My intuition says that an 8-cluster solution and a 6-cluster solution are inherently incomparable, because the number of clusters by itself makes the variability of one solution differ from the other.

Wouldn't I be better off with hierarchical clustering, using the pseudo-F statistic and the other measures found in the SAS documentation for identifying the number of clusters?
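
To make the comparison concrete, here is a minimal sketch (in Python rather than SAS, purely for illustration) that computes both the ratio above and a pseudo-F statistic for a given cluster assignment. The function name `cluster_summaries` and the use of Euclidean distance to centroids are assumptions made for this example; none of it is output from proc fastclus or proc distance. The pseudo-F is taken here as the between-cluster mean square divided by the within-cluster mean square.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def cluster_summaries(points, labels):
    """Return (distance ratio, pseudo-F) for one cluster solution.

    Hypothetical helper for illustration. Requires at least two
    clusters and more observations than clusters.
    """
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    centroids = {lab: centroid(members) for lab, members in groups.items()}
    n, k = len(points), len(groups)
    grand = centroid(points)

    # Mean distance from each observation to its own cluster centroid.
    mean_within = sum(dist(p, centroids[lab]) for p, lab in zip(points, labels)) / n
    # Mean distance over all pairs of cluster centroids.
    pairs = list(combinations(centroids.values(), 2))
    mean_between = sum(dist(a, b) for a, b in pairs) / len(pairs)

    # Within- and between-cluster sums of squares.
    wss = sum(dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    bss = sum(len(m) * dist(centroids[lab], grand) ** 2 for lab, m in groups.items())
    # Pseudo-F divides each sum of squares by its degrees of freedom.
    pseudo_f = (bss / (k - 1)) / (wss / (n - k))

    return mean_between / mean_within, pseudo_f
```

The pseudo-F divides by k-1 and n-k, so it makes some adjustment for the number of clusters, while the raw distance ratio makes none. That difference is the reason I doubt the ratio can be compared between the 8-cluster and 6-cluster solutions.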

Posted in reply to prooney2

11-08-2010 09:59 AM

Hello prooney2,

I am working on a similar problem and am a newbie to Cluster Analysis. I too have been told to calculate Between/Within cluster variance measures and use those to choose the best number of clusters. So, although your question sounds legitimate to me, I don't have an answer. I'm seeking help myself!

I am wondering if FASTCLUS makes the most sense for my application. I am doing a very simple clustering of one dependent variable whose values are all nonzero, ranging from 221 to 595, with n=900 observations. I'm looking for disjoint clusters, meaning each observation should belong to exactly one cluster in the end.

For such a simple application, does FASTCLUS sound like the correct procedure to use? If not, why not, and what other procedures would you recommend?
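
To show what I mean by disjoint clusters on a single variable, here is a rough sketch of a k-means style assignment in one dimension. It is written in Python for illustration only and is not FASTCLUS's actual algorithm: the function name `kmeans_1d`, the seeding rule (evenly spaced order statistics), and the stopping rule are all assumptions of mine.

```python
def kmeans_1d(values, k, max_iter=100):
    """Assign each value to exactly one of k clusters (hypothetical helper).

    Returns (labels, centers), where labels[i] is the cluster index
    of values[i]. Assumes 1 <= k <= len(values).
    """
    xs = sorted(values)
    n = len(xs)
    # Assumed seeding: pick k evenly spaced order statistics as initial centers.
    centers = [float(xs[(2 * j + 1) * n // (2 * k)]) for j in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each value goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: abs(x - centers[j])) for x in values]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [x for x, lab in zip(values, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels, centers


# Made-up values inside the 221 to 595 range, not my real data.
toy = [221, 225, 230, 400, 405, 410, 590, 595]
toy_labels, toy_centers = kmeans_1d(toy, k=3)
print(toy_labels, toy_centers)
```

Because every value receives a single label, the resulting clusters are disjoint by construction, which is the property I need.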

Posted in reply to prooney2

05-24-2013 10:31 AM

Hello prooney2,

I am by no means a statistician or a mathematician, but I am aware of sample code that ships with IML Studio called FishClusters.sx. It attempts to find the best number of clusters using several different criteria. Maybe it can help you.

Eyal