thebob
Calcite | Level 5

Hi all,

I have been doing some research into different SAS procedures for segmenting data. I've looked at PROC FASTCLUS and PROC CLUSTER, and both produce the Cubic Clustering Criterion (CCC) and the Pseudo F Statistic (PSF).

After learning how to pick an optimal number of clusters, I wanted to compare different techniques across different numbers of clusters. For FASTCLUS and CLUSTER, I can use the statistics the procedures produce; for other techniques, I cannot.

So my question is: has anyone attempted, and succeeded, in reproducing the CCC and/or PSF from one of these procedures, either with another piece of code or with something written from scratch? At the moment, I am using PROC GLM (a linear regression model, which is not ideal) to produce the PSF, but the value is about 10% smaller each time. I've also tried computing the statistic from scratch (following the formula in the Wikipedia article on the F-test), but with little success. I appreciate that different METHOD values will change how this is calculated, but I can't seem to get off the mark.

If I have posted this in the wrong place, please let me know and I shall move it.

Thanks.

1 REPLY
thebob
Calcite | Level 5

I've managed to recreate the PSF value in open code - it was easier than I first thought.
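
For anyone following along, here is a minimal sketch of the calculation. It assumes the pseudo F is the usual Calinski-Harabasz ratio, PSF = (B / (g - 1)) / (W / (n - g)), where B and W are the between-cluster and within-cluster sums of squares pooled over all variables, g is the number of clusters and n the number of observations. It is an illustration in Python rather than the code used above, and the function name `pseudo_f` and the toy data are hypothetical. Because the sums of squares are pooled over every variable, a per-variable F from a regression would not generally match this value.

```python
from collections import defaultdict


def pseudo_f(points, labels):
    """Pseudo F (Calinski-Harabasz) for a given cluster assignment.

    points: list of equal-length numeric tuples, one per observation.
    labels: list of cluster labels, one per observation.
    Assumption: sums of squares are pooled over all variables.
    """
    n = len(points)
    dims = len(points[0])

    # Group the observations by cluster label.
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    g = len(groups)
    if g < 2 or n <= g:
        raise ValueError("need at least 2 clusters and more observations than clusters")

    # Total sum of squares around the grand mean.
    grand_mean = [sum(p[d] for p in points) / n for d in range(dims)]
    total_ss = sum((p[d] - grand_mean[d]) ** 2 for p in points for d in range(dims))

    # Within-cluster sum of squares around each cluster centroid.
    within_ss = 0.0
    for members in groups.values():
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dims)]
        within_ss += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dims))

    if within_ss == 0:
        return float("inf")  # clusters are perfectly tight

    between_ss = total_ss - within_ss
    return (between_ss / (g - 1)) / (within_ss / (n - g))


# Toy example: two tight clusters far apart on the first variable.
toy_points = [(0, 0), (0, 2), (10, 0), (10, 2)]
toy_labels = [1, 1, 2, 2]
print(pseudo_f(toy_points, toy_labels))
```

In the toy example the total sum of squares is 104 and the within-cluster sum of squares is 4, so the between-cluster sum of squares is 100 and the ratio is (100 / 1) / (4 / 2) = 50. The same quantity can be written in terms of the overall R-squared as (R² / (g - 1)) / ((1 - R²) / (n - g)).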
