Cluster or FastClus

New Contributor
Posts: 2

Cluster or FastClus

I want to use either PROC CLUSTER or PROC FASTCLUS to determine whether my data can be grouped and, if so, what the best grouping is. A colleague ran this for me in a different stats package using dynamic k-means for 10, 8, 6, 4, 3, and 2 groups, and so on. He took the output and plotted the number of groups against the RMSE for each run; the point where the line inflected represented the optimal grouping. When I run FASTCLUS or CLUSTER, I don't see an RMSE to do a similar check. What do I use in the SAS output for these PROCs to determine when the number of clusters is the best it can be? Is there a metric to gauge this with?
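To make the check described above concrete, here is a minimal, language-neutral sketch in Python (not SAS) of the "number of groups vs. RMSE" curve. It is only an illustration under assumptions: the data are made up, the k-means routine is a plain Lloyd's algorithm with a simple deterministic start, and RMSE is taken to mean the square root of the within-cluster sum of squares divided by the number of observations, which may differ from what the other package reported. In FASTCLUS output, the pooled within-cluster standard deviation in the statistics-for-variables table appears to play a similar role.

```python
import math

def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means; returns (centers, labels)."""
    # Deterministic start (an assumption of this sketch): take k points
    # spread evenly through the sorted data as the initial centers.
    ordered = sorted(points)
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = None
    for _ in range(iters):
        # Assign every point to its nearest center.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centers, labels

def within_rmse(points, centers, labels):
    """Root mean squared distance from each point to its cluster center."""
    sse = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return math.sqrt(sse / len(points))

def rmse_by_k(points, ks):
    """Run k-means once per k and collect the within-cluster RMSE."""
    curve = {}
    for k in ks:
        centers, labels = kmeans(points, k)
        curve[k] = within_rmse(points, centers, labels)
    return curve

# Made-up data: three loose groups in two dimensions.
data = [(1.0, 1.1), (1.2, 0.9), (0.8, 1.0), (0.9, 1.3),
        (5.0, 5.2), (5.3, 4.9), (4.8, 5.1), (5.1, 4.7),
        (9.0, 1.0), (9.2, 1.2), (8.9, 0.8), (9.1, 1.1)]

curve = rmse_by_k(data, [1, 2, 3, 4, 5, 6])
for k, rmse in curve.items():
    print(f"k={k}  RMSE={rmse:.3f}")
```

Plotting k against the printed RMSE and looking for the bend in the line is the "elbow" check; with these made-up points the drop is steep up to three groups and small afterwards.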

Thanks.
Occasional Contributor
Posts: 10

Re: Cluster or FastClus

Hello GSRodney,

Are you still uncertain about these procedures? So am I. I am new to clustering and am also trying to "match" results from another software program. In the example I am attempting to match, several scenarios were run and the between/within cluster variance ratio was calculated for each. Where those ratios hit a point of diminishing returns (that is, where additional clusters no longer separate the groups much relative to the within-cluster variance), an optimal number of clusters begins to appear. Choosing the actual number of clusters is a somewhat subjective process.

BTW, the between/within ratios seem to have been calculated offline with Excel--my application involves fewer than 1,000 clustered values and only 1 dependent variable.
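For a single variable, the between/within calculation mentioned above is short enough to sketch. The following Python snippet is an assumption about how such a ratio might be computed, not the actual Excel workbook: it takes the values and their cluster labels, and returns the between-cluster sum of squares, the within-cluster sum of squares, and the ratio of the two after dividing each by its degrees of freedom (commonly called the pseudo F statistic). The data and the function name `between_within` are made up for illustration.

```python
def between_within(values, labels):
    """Return (between_ss, within_ss, pseudo_f) for one variable.

    Assumes at least two clusters and a nonzero within-cluster sum of squares.
    """
    n = len(values)
    grand_mean = sum(values) / n
    # Collect the values belonging to each cluster label.
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    k = len(groups)
    between = 0.0
    within = 0.0
    for members in groups.values():
        m = sum(members) / len(members)
        between += len(members) * (m - grand_mean) ** 2
        within += sum((v - m) ** 2 for v in members)
    # Ratio of mean squares: between over (k - 1), within over (n - k).
    pseudo_f = (between / (k - 1)) / (within / (n - k))
    return between, within, pseudo_f

# Made-up example: six values already assigned to two clusters.
values = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
labels = [0, 0, 0, 1, 1, 1]
b, w, f = between_within(values, labels)
print(f"between={b}  within={w}  pseudo F={f}")
```

Repeating this for each candidate number of clusters and watching where the ratio stops improving is the diminishing-returns check.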

Anyway, if you have any additional insight on cluster analysis, measures for choosing the number of clusters, and SAS procs, please share!

Thanks.
Super User
Posts: 9,676

Re: Cluster or FastClus

Hi. I remember there is a statistical estimator (but I forgot which one) for deciding how many clusters to use.
Before using PROC CLUSTER or PROC FASTCLUS, I recommend using PROC PRINCOMP and PROC GPLOT to plot Prin1 against Prin2 to help decide how many clusters you want.
Also, there is no single best criterion for deciding the number of clusters; different methods will yield different clusters.


Ksharp
Occasional Contributor
Posts: 10

Re: Cluster or FastClus

OK, now to show my ignorance (if I haven't already). I have no experience with PRINCOMP. Why would I run it, and what do the "1" and "2" you referenced estimate?
Super User
Posts: 9,676

Re: Cluster or FastClus

Hi.
Don't say that. I am also a beginner with SAS statistical methods.
PROC PRINCOMP performs principal component analysis, one of the oldest multivariate techniques. It lets two components (Prin1 and Prin2) stand in for the multivariate data, based on the correlation matrix by default or on the covariance matrix if you specify the COV option.
Then use these two components as the x-axis and y-axis and plot the observations in that coordinate system.
You will find that some observations lie very close together and some very far apart.
I recommend looking up PROC PRINCOMP in the SAS documentation.

P.S. Prin1 and Prin2 are the two components that explain the most variance in the data.
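To show what Prin1 and Prin2 are, here is a rough Python sketch (not SAS, and not how PROC PRINCOMP is implemented). It assumes the covariance matrix is analyzed, finds the two leading eigenvectors with simple power iteration, and projects each observation onto them. The data and all function names are made up for illustration.

```python
import math

def covariance_matrix(rows):
    """Center the data and return (centered_rows, sample covariance matrix)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    return centered, cov

def top_eigenpair(mat, iters=500):
    """Power iteration: approximate largest eigenvalue and unit eigenvector."""
    p = len(mat)
    # Uneven start vector so it is unlikely to be orthogonal to the answer
    # (a sketch-level safeguard, not a guarantee).
    vec = [float(j + 1) for j in range(p)]
    norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    for _ in range(iters):
        nxt = [sum(mat[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    # Rayleigh quotient gives the eigenvalue estimate for the unit vector.
    value = sum(vec[i] * mat[i][j] * vec[j] for i in range(p) for j in range(p))
    return value, vec

def first_two_components(rows):
    """Return ((eigenvalue1, eigenvalue2), [(prin1, prin2), ...])."""
    centered, cov = covariance_matrix(rows)
    p = len(cov)
    val1, vec1 = top_eigenpair(cov)
    # Deflate: subtract the first component, then find the next largest.
    deflated = [[cov[i][j] - val1 * vec1[i] * vec1[j] for j in range(p)]
                for i in range(p)]
    val2, vec2 = top_eigenpair(deflated)
    scores = [(sum(c[j] * vec1[j] for j in range(p)),
               sum(c[j] * vec2[j] for j in range(p))) for c in centered]
    return (val1, val2), scores

# Made-up data: six observations on three variables, two visible groups.
rows = [(1.0, 2.0, 1.0), (1.5, 1.8, 1.2), (0.8, 2.3, 0.9),
        (8.0, 9.0, 2.0), (8.4, 8.7, 2.2), (7.9, 9.2, 1.8)]

eigenvalues, scores = first_two_components(rows)
print("variance explained by Prin1, Prin2:", eigenvalues)
for prin1, prin2 in scores:
    print(f"{prin1:8.3f} {prin2:8.3f}")
```

Plotting the first column against the second gives the scatter plot described above; observations that sit close together on that plot are candidates for the same cluster.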


Ksharp
Occasional Contributor
Posts: 10

Re: Cluster or FastClus

Thank you very much for your insights, KSharp. I will look at the SAS doc'n for PRINCOMP.
Contributor
Posts: 24

Re: Cluster or FastClus

Hi,

The statistic you want is the CCC, which stands for cubic clustering criterion. PROC CLUSTER measures the distances between the points and produces the CCC and pseudo R-squares. FASTCLUS basically implements the k-means algorithm.
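As a rough guide to how these statistics relate, the usual textbook definitions of R-square and the pseudo F statistic for n observations in k clusters can be written as below. The symbols T (total sum of squares) and W (pooled within-cluster sum of squares) are notation introduced here for illustration. The CCC itself is more involved: it compares the observed R-square with the R-square expected if the data had no cluster structure, so larger positive values suggest a better-supported number of clusters.

```latex
T = \sum_{i=1}^{n} \lVert x_i - \bar{x} \rVert^2, \qquad
W = \sum_{c=1}^{k} \sum_{i \in c} \lVert x_i - \bar{x}_c \rVert^2

R^2 = 1 - \frac{W}{T}, \qquad
\text{pseudo } F = \frac{(T - W)/(k - 1)}{W/(n - k)}
                 = \frac{R^2/(k - 1)}{(1 - R^2)/(n - k)}
```

Because W can only shrink as k grows, R-square alone always favors more clusters; the pseudo F and the CCC adjust for that, which is why their peaks are used to pick k.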

Regards,
Murphy
Occasional Contributor
Posts: 10

Re: Cluster or FastClus

Hello,

Can you elaborate on the CCC, and what it means? Also the Pseudo R-Square...

I thought K-means was OK for my application, but admit to some fogginess re: hierarchical vs. disjoint clustering methods.

(I chose FASTCLUS because I thought I wanted disjoint and the ease of specifying number of clusters--but better understanding doesn't mean best procedure for my simple data.)

Thank you!
Occasional Contributor
Posts: 10

Re: Cluster or FastClus

BTW, I have found SAS Technical Report A-108, Cubic Clustering Criterion, and Usage Note 22540: "How can I tell how many clusters...?" to be very useful.