I have developed 8-cluster and 6-cluster solutions with proc fastclus. My manager claims that the ratio of the average between-cluster distance to the average within-cluster distance might be a measure of the "best number of clusters" to consider:
Ratio = mean between-cluster distance / mean within-cluster distance
Using proc fastclus and proc distance I can calculate the distance from each object to each cluster centroid, and the distance from each cluster centroid to every other cluster centroid, but does this measure even make sense? My intuition says that an 8-cluster solution and a 6-cluster solution are inherently incomparable, because the number of clusters by itself makes the variability of one solution different from that of another.
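To make the quantity concrete, here is a minimal sketch of the ratio in Python (not SAS), using only the standard library. It rests on my own assumptions about what the manager means: "between" is the mean Euclidean distance over all pairs of cluster centroids, and "within" is the mean Euclidean distance from each object to its own cluster's centroid. The function names `centroid` and `distance_ratio` are hypothetical, and the data are made up for illustration.

```python
import math
from itertools import combinations


def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def distance_ratio(clusters):
    """Mean between-centroid distance divided by mean object-to-own-centroid distance.

    clusters: list of clusters, each a list of points (tuples of floats).
    Assumes at least two clusters and at least one object that is not
    exactly at its centroid (otherwise the denominator is zero).
    """
    centroids = [centroid(c) for c in clusters]
    # Distances between every pair of cluster centroids.
    between = [math.dist(a, b) for a, b in combinations(centroids, 2)]
    # Distance from each object to the centroid of its own cluster.
    within = [math.dist(p, c) for pts, c in zip(clusters, centroids) for p in pts]
    return (sum(between) / len(between)) / (sum(within) / len(within))


# Toy data: two tight clusters, 10 units apart.
toy = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
print(distance_ratio(toy))
```

Nothing in this ratio adjusts for the number of clusters. Splitting the data into more clusters tends to shrink the within-cluster distances on its own, which is the source of my doubt about comparing the 8-cluster and 6-cluster values directly.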
Wouldn't I be better off using hierarchical clustering with the pseudo-F statistic and the other measures described in the SAS documentation for identifying the number of clusters?
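For comparison, here is a similar sketch of the pseudo-F statistic as I understand it: the between-cluster sum of squares divided by (k - 1), over the within-cluster sum of squares divided by (n - k), where n is the number of objects and k the number of clusters. This is plain Python rather than SAS output, the function name `pseudo_f` is hypothetical, and the data are the same made-up toy points.

```python
def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pseudo_f(clusters):
    """Pseudo-F: (BSS / (k - 1)) / (WSS / (n - k)).

    clusters: list of clusters, each a list of points (tuples of floats).
    Assumes 1 < k < n and a nonzero within-cluster sum of squares.
    """
    all_points = [p for c in clusters for p in c]
    n, k = len(all_points), len(clusters)
    grand = centroid(all_points)
    # Between-cluster sum of squares, weighted by cluster size.
    bss = sum(len(c) * sq_dist(centroid(c), grand) for c in clusters)
    # Within-cluster sum of squares around each cluster's own centroid.
    wss = sum(sq_dist(p, centroid(c)) for c in clusters for p in c)
    return (bss / (k - 1)) / (wss / (n - k))


toy = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
print(pseudo_f(toy))
```

Unlike the raw distance ratio, this statistic divides by degrees of freedom that depend on k, so it at least attempts to account for the number of clusters when solutions of different sizes are compared.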