Re: Number of clusters from Proc Fastclus in Statistical Procedures

Number of clusters from Proc Fastclus

prooney2 — Wed, 16 Jun 2010 17:23:19 GMT

I have developed 8- and 6-cluster solutions from proc fastclus. My manager claims that the ratio of the average between-cluster distance to the average within-cluster distance might be a measure of the "best number of clusters" to consider:

Ratio = Mean Between-cluster distance / Mean within-cluster distance

Using proc fastclus and proc distance, I can calculate the distance from each object to each cluster centroid and the distance from each cluster centroid to the other centroids. But does this measure even make sense? My intuition says that 8-cluster and 6-cluster solutions are inherently incomparable: the number of clusters by itself makes the variability of one solution different from that of another.
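As a rough illustration of the ratio being asked about, here is a minimal plain-Python sketch (not SAS code). It assumes "within-cluster distance" means the Euclidean distance from each observation to its own cluster centroid, and "between-cluster distance" means the Euclidean distance between each pair of centroids. The function names `centroids` and `distance_ratio` are hypothetical.

```python
import math
from itertools import combinations


def centroids(points, labels):
    """Mean of the points in each cluster, keyed by cluster label."""
    sums, counts = {}, {}
    for p, c in zip(points, labels):
        if c not in sums:
            sums[c] = [0.0] * len(p)
            counts[c] = 0
        sums[c] = [s + x for s, x in zip(sums[c], p)]
        counts[c] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def distance_ratio(points, labels):
    """Mean centroid-to-centroid distance / mean point-to-own-centroid distance.

    Assumes Euclidean distance and at least two clusters.
    """
    cents = centroids(points, labels)
    # Within: each observation's distance to the centroid of its own cluster.
    within = [math.dist(p, cents[c]) for p, c in zip(points, labels)]
    # Between: distance for every pair of distinct cluster centroids.
    between = [math.dist(a, b) for a, b in combinations(cents.values(), 2)]
    return (sum(between) / len(between)) / (sum(within) / len(within))
```

For two tight clusters centered at (0, 1) and (10, 1), `distance_ratio([(0, 0), (0, 2), (10, 0), (10, 2)], [0, 0, 1, 1])` returns 10.0. Nothing in the ratio accounts for the number of clusters k, which is exactly the comparability concern raised above.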

Wouldn't I be better off using hierarchical clustering, along with the pseudo-F statistic and the other measures described in the SAS documentation for identifying the number of clusters?
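The pseudo-F statistic (Calinski and Harabasz) differs from the raw distance ratio in that it divides the between-cluster sum of squares B by k - 1 and the within-cluster sum of squares W by n - k. That degrees-of-freedom adjustment is what makes solutions with different k roughly comparable. The statistic is defined for any disjoint partition, so it can be computed for a FASTCLUS solution as well as for a cut of a hierarchical tree. A minimal sketch, assuming Euclidean distance; `pseudo_f` is a hypothetical helper, not a SAS routine:

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pseudo_f(points, labels):
    """Calinski-Harabasz pseudo-F: (B / (k - 1)) / (W / (n - k))."""
    n = len(points)
    groups = {}
    for p, c in zip(points, labels):
        groups.setdefault(c, []).append(p)
    k = len(groups)
    if not 1 < k < n:
        raise ValueError("pseudo-F needs 1 < k < n clusters")
    dim = len(points[0])
    grand = [sum(p[j] for p in points) / n for j in range(dim)]
    within = 0.0   # W: squared distances of points to their own centroid
    between = 0.0  # B: cluster size times squared centroid-to-grand-mean distance
    for members in groups.values():
        cent = [sum(p[j] for p in members) / len(members) for j in range(dim)]
        within += sum(sq_dist(p, cent) for p in members)
        between += len(members) * sq_dist(cent, grand)
    return (between / (k - 1)) / (within / (n - k))
```

For the four points (0, 0), (0, 2), (10, 0), (10, 2) split into two clusters, B = 100 and W = 4, so pseudo-F = (100 / 1) / (4 / 2) = 50. One common heuristic is to compute it for each candidate k and look for a local peak.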

Re: Number of clusters from Proc Fastclus

mjbstats — Mon, 08 Nov 2010 14:59:24 GMT

Hello prooney2,

I am working on a similar problem and am new to cluster analysis. I too have been told to calculate between/within-cluster variance measures and use them to choose the best number of clusters. So although your question sounds legitimate to me, I don't have an answer; I'm seeking help myself!

I am wondering whether FASTCLUS makes the most sense for my application. I am doing a very simple clustering of a single variable: nonzero values ranging from 221 to 595, with n = 900 observations. I'm looking for disjoint clusters, meaning each observation should end up in exactly one cluster.

For such a simple application, does FASTCLUS sound like the right procedure to use? If not, why not, and what other procedures would you recommend?
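For a single variable, a basic k-means loop is easy to sketch. The following plain-Python version is Lloyd-style: it assigns each value to its nearest center, moves each center to the mean of its members, and repeats. It is similar in spirit to the nearest-centroid approach FASTCLUS takes but is not its exact algorithm. The seeding rule, the helper names `nearest` and `kmeans_1d`, and the sample values are assumptions for illustration only, not the poster's data.

```python
def nearest(v, centers):
    """Index of the center closest to value v (ties go to the lower index)."""
    return min(range(len(centers)), key=lambda j: abs(v - centers[j]))


def kmeans_1d(values, k, iters=100):
    """Basic Lloyd-style k-means on one variable; returns (centers, labels)."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and the number of values")
    xs = sorted(values)
    # Seed with evenly spaced order statistics (an arbitrary choice for this sketch).
    centers = [float(xs[(2 * i + 1) * len(xs) // (2 * k)]) for i in range(k)]
    for _ in range(iters):
        # Assignment step: each value joins its nearest center.
        labels = [nearest(v, centers) for v in values]
        # Update step: each center moves to the mean of its members
        # (an empty cluster keeps its previous center).
        new_centers = []
        for j in range(k):
            members = [v for v, c in zip(values, labels) if c == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break  # converged: the assignments no longer move any center
        centers = new_centers
    # Recompute labels so they always match the returned centers.
    return centers, [nearest(v, centers) for v in values]
```

Because each value goes to its nearest center along a single axis, every cluster covers a contiguous range of values and each observation belongs to exactly one cluster. For example, `kmeans_1d([221, 225, 230, 400, 410, 590, 595], 3)` returns labels `[0, 0, 0, 1, 1, 2, 2]`.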

Re: Number of clusters from Proc Fastclus

EyalGonen — Fri, 24 May 2013 14:31:47 GMT

Hello prooney2,

I am by no means a statistician or a mathematician, but I am aware of a sample program shipped with IML Studio called FishClusters.sx. It attempts to find the best number of clusters using several different criteria. Maybe it can help you.

Eyal