Hi!
This is my first encounter with the CCC (cubic clustering criterion). I'm trying to figure out the outflow model. I am a beginner and came across this clustering assessment. Can you explain in simple terms how best to interpret this estimate?
I'm not very good with specialized literature in English. I found SAS TR A-108, but can't understand the main point.
Thank you!
Accepted Solutions
Hello DavidBesaev -
Here is a link to the technical report that you mentioned.
A-108 Cubic Clustering Criterion
http://support.sas.com/kb/22/addl/fusion_22540_1_a108_5903.pdf
The best place to look for information about how to interpret the CCC is the Conclusion section, on printed page 49. Here is a brief summary:
- Peaks in the plot of the cubic clustering criterion with values greater than 2 or 3 indicate good clusters.
- Peaks with values between 0 and 2 indicate possible clusters.
- Large negative values of the CCC can indicate outliers.
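The rules of thumb above can be sketched in a few lines of Python. This is only an illustration, not code from the technical report: the function name `interpret_ccc_peaks`, the `good_threshold` parameter, and the choice to count only interior points (with a lower CCC on both sides) as peaks are all assumptions, and the sample values are made up.

```python
def interpret_ccc_peaks(ccc_by_k, good_threshold=2.0):
    """Label local peaks in a CCC-versus-number-of-clusters series.

    ccc_by_k maps a candidate number of clusters to its CCC value.
    Assumption: a peak is an interior point that is higher than both
    of its neighbours; the first and last points are never peaks.
    Peaks below 0 are skipped because the summary above gives no label
    for them.
    """
    ks = sorted(ccc_by_k)
    peaks = []
    for i in range(1, len(ks) - 1):
        value = ccc_by_k[ks[i]]
        is_peak = value > ccc_by_k[ks[i - 1]] and value > ccc_by_k[ks[i + 1]]
        if not is_peak or value < 0:
            continue
        label = "good" if value > good_threshold else "possible"
        peaks.append((ks[i], value, label))
    return peaks


# Made-up CCC values, for illustration only.
example = {2: -1.5, 3: 4.2, 4: 1.0, 5: 1.8, 6: 0.5, 7: -3.0}
for k, value, label in interpret_ccc_peaks(example):
    print(f"{k} clusters: CCC = {value} ({label} clusters)")
```

A series that only rises or only falls has no interior peak, so the function returns an empty list; in that case the thresholds alone do not point to a particular number of clusters.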
Pages 40-48 give some examples of interpretations.
Other good places to look for interpretation examples are the Getting Started and Examples sections of the chapter The CLUSTER Procedure.
SAS/STAT User's Guide - Procedures
https://support.sas.com/documentation/onlinedoc/stat/indexproc.html
Have a great day.
Thank you so much for your help, MikeStockstill!
I will try to get to know these sources more closely.
So, looking at my CCC plot, does good performance mean more than 4 clusters (more than 2000 points)?
Thank you!
So, looking at my CCC plot, does good performance mean more than 4 clusters (more than 2000 points)?
It is best to review the report, since the CCC is just one way to evaluate a candidate number of clusters, and there are situations where the CCC might not be the best statistic to use. The goal of clustering is typically to provide interpretable and/or usable results for your analysis needs. Think of the CCC plot as recommending a range of cluster solutions that might be useful; you can then compare the competing solutions to see which one best meets those needs.
When I see the CCC increasing slowly as the number of clusters grows, I expect the additional splits to be pulling off clusters with small numbers of observations. That can be useful if you are trying to isolate unusual, potentially fraudulent cases, but it is not helpful in marketing, where small clusters are not large enough to warrant special treatment. Check multiple cluster solutions and choose what is best for your scenario.
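One way to compare competing solutions is to look at the cluster sizes each one produces. The sketch below assumes you have already exported the cluster assignment of every observation for each candidate number of clusters. The names `summarize_solutions` and `min_size` are hypothetical, and the right minimum size depends entirely on your use case.

```python
from collections import Counter


def summarize_solutions(solutions, min_size):
    """Report cluster sizes for each candidate solution.

    solutions maps a number of clusters to a list of cluster labels,
    one label per observation. Clusters with fewer than min_size
    observations are listed separately under "small".
    """
    summary = {}
    for k, labels in solutions.items():
        sizes = Counter(labels)
        small = {cluster: n for cluster, n in sizes.items() if n < min_size}
        summary[k] = {"sizes": dict(sizes), "small": small}
    return summary


# Toy assignments for 10 observations, for illustration only.
solutions = {
    2: [0] * 6 + [1] * 4,
    3: [0] * 6 + [1] * 3 + [2] * 1,
}
for k, info in summarize_solutions(solutions, min_size=2).items():
    print(k, "clusters:", info["sizes"], "| small:", info["small"])
```

For fraud detection, the small clusters may be exactly the ones worth inspecting; for marketing, a solution with many small clusters is usually a sign to step back to fewer clusters.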
Hope this helps!
Doug