Hi!
This is my first encounter with the CCC. I'm trying to figure out the outflow model. I am a beginner and came across this clustering criterion. Can you explain in simple terms how best to interpret it?
I'm not very good with specialized literature in English. I found SAS TR A-108, but I can't understand the main point.
Thank you!
Hello DavidBesaev -
Here is a link to the technical report that you mentioned.
http://support.sas.com/kb/22/addl/fusion_22540_1_a108_5903.pdf
The best place to look for information about how to interpret the CCC is the Conclusion section, printed page 49. Here is a brief summarization:
Pages 40-48 give some examples of interpretations.
Another good place to look for interpretation examples is the Getting Started section, and the Examples section, of the chapter The CLUSTER Procedure.
SAS/STAT User's Guide - Procedures
https://support.sas.com/documentation/onlinedoc/stat/indexproc.html
Have a great day.
Thank you so much for your help, MikeStockstill!
I will study these sources more closely.
So, looking at my CCC plot, would a good solution be one with more than 4 clusters (more than 2000 points)?
Thank you!
It is best to review the report, since the CCC is just one way to evaluate a candidate number of clusters, and there are situations where the CCC might not be the best statistic to use. The goal of clustering is typically to provide interpretable and/or usable results for your analysis needs. Think of the CCC plot as recommending a range of cluster solutions that might be useful; you can then compare the competing solutions to see which one best meets those needs.
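To make "a range of cluster solutions" concrete, here is a minimal sketch in Python (not SAS) that scans a table of CCC values for interior local peaks. The function name, the CCC values, and the cutoff of 2 are all assumptions for illustration: the cutoff follows the commonly quoted rule of thumb that peaks above roughly 2 or 3 suggest good clusterings and peaks between 0 and 2 suggest possible ones, so check it against the report before relying on it.

```python
def ccc_peaks(ccc_by_k, good=2.0):
    """Return (k, ccc, label) for each interior local peak of CCC over k.

    ccc_by_k: dict mapping number of clusters -> CCC value.
    good:     assumed rule-of-thumb cutoff, not an official constant.
    """
    ks = sorted(ccc_by_k)
    peaks = []
    # Skip the first and last k: an endpoint cannot be confirmed as a peak.
    for i in range(1, len(ks) - 1):
        value = ccc_by_k[ks[i]]
        if value > ccc_by_k[ks[i - 1]] and value > ccc_by_k[ks[i + 1]]:
            if value > good:
                label = "good"
            elif value >= 0:
                label = "possible"
            else:
                label = "weak"
            peaks.append((ks[i], value, label))
    return peaks


# Hypothetical CCC values, as you might read them off a CCC plot.
example_ccc = {2: -1.5, 3: 1.2, 4: 0.9, 5: 3.4, 6: 2.8}
candidates = ccc_peaks(example_ccc)
print(candidates)
```

If the CCC only keeps rising over the range you tried, this returns an empty list, which is a hint that the plot alone is not singling out a number of clusters.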
When I see the CCC increasing slowly over the larger numbers of clusters, I would expect the additional splits to be pulling off clusters with small numbers of observations. That can be useful if you are trying to isolate unusual, potentially fraudulent cases, but it is not helpful if you are doing marketing, where small clusters are not large enough to warrant special treatment. Check multiple cluster solutions and choose what is best for your scenario.
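One way to check whether the extra splits are only peeling off tiny clusters is to compare cluster sizes across the candidate solutions. The sketch below is Python (not SAS) and assumes you have exported the cluster assignment of each observation for every solution you want to compare. The function name, the example labels, and the minimum size of 10 are hypothetical; pick a threshold that makes sense for your own use case.

```python
from collections import Counter


def small_cluster_report(labels_by_k, min_size):
    """For each candidate solution, list cluster sizes and flag small clusters.

    labels_by_k: dict mapping number of clusters -> list of cluster labels,
                 one label per observation.
    min_size:    smallest cluster size you consider worth acting on.
    """
    report = {}
    for k, labels in sorted(labels_by_k.items()):
        sizes = Counter(labels)
        too_small = sorted(c for c, n in sizes.items() if n < min_size)
        report[k] = {"sizes": dict(sizes), "too_small": too_small}
    return report


# Hypothetical assignments: going from 3 to 4 clusters splits off 3 observations.
example_labels = {
    3: [1] * 50 + [2] * 30 + [3] * 20,
    4: [1] * 50 + [2] * 30 + [3] * 17 + [4] * 3,
}
report = small_cluster_report(example_labels, min_size=10)
for k, info in report.items():
    print(k, info["sizes"], "too small:", info["too_small"])
```

In this made-up example the 4-cluster solution adds a cluster of only 3 observations, which would matter for fraud detection but probably not for marketing.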
Hope this helps!
Doug