Hi all,

I have been doing some research into different SAS procedures for segmenting data. I've looked at PROC FASTCLUS and PROC CLUSTER, and both produce the Cubic Clustering Criterion (CCC) and the Pseudo F Statistic (PSF).

After learning how to pick an optimal number of clusters, I wanted to compare different techniques across different numbers of clusters. For FASTCLUS and CLUSTER, I can use the statistics that are produced; for other techniques, I cannot.

So my question is: has anyone attempted, and succeeded in, reproducing the CCC and/or PSF from one of these procedures with another piece of code, or with something written from scratch? At the moment I am using PROC GLM (a linear regression model, which is not ideal) to produce the PSF, but the value comes out about 10% smaller each time. I've also tried building the statistic from scratch (following the formula in the Wikipedia article on the F-test), but with little success. I appreciate that different METHOD values will change how this is calculated, but I can't seem to get off the mark properly.

If I have posted this in the wrong place, please let me know and I shall move it.


I've managed to recreate the PSF value in open code - it was easier than I first thought.
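
For anyone who finds this thread later, here is a rough sketch of the usual pseudo F calculation: between-cluster sum of squares divided by (k - 1), over within-cluster sum of squares divided by (n - k). It is written in Python rather than SAS so it can be run anywhere. It is not necessarily the exact code I ended up with, and the function name pseudo_f and the toy data are made up for illustration. It also assumes plain, unweighted observations with Euclidean distances, so it may not match every METHOD option.

```python
def pseudo_f(points, labels):
    """Pseudo F = (between SS / (k - 1)) / (within SS / (n - k)).

    points: list of equal-length numeric tuples (one per observation)
    labels: cluster assignment for each observation
    Illustrative helper only; not part of SAS or any library.
    """
    n = len(points)
    dims = len(points[0])

    # Group the observations by their cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    k = len(clusters)
    if k < 2 or n <= k:
        raise ValueError("need at least 2 clusters and more observations than clusters")

    # Total sum of squares around the grand mean.
    grand_mean = [sum(p[d] for p in points) / n for d in range(dims)]
    total_ss = sum((p[d] - grand_mean[d]) ** 2 for p in points for d in range(dims))

    # Within-cluster sum of squares around each cluster centroid.
    within_ss = 0.0
    for members in clusters.values():
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dims)]
        within_ss += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dims))

    between_ss = total_ss - within_ss
    if within_ss == 0:
        return float("inf")  # perfectly tight clusters
    return (between_ss / (k - 1)) / (within_ss / (n - k))


# Toy example: two well-separated clusters of two points each.
toy_points = [(0, 0), (0, 2), (10, 0), (10, 2)]
toy_labels = [1, 1, 2, 2]
print(pseudo_f(toy_points, toy_labels))  # 50.0
```

In the toy example the total sum of squares is 104 and the within-cluster sum of squares is 4, so the statistic is (100 / 1) / (4 / 2) = 50. The same ratio can be written in terms of R-squared as (R² / (k - 1)) / ((1 - R²) / (n - k)), which may be handy when a procedure reports R-squared but not the PSF.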

