Interpreting negative CCC values in a Cluster Anal...

06-03-2012 07:54 PM

I understand that the idea of the CCC is to compare the R² you get for a given set of clusters with the R² you would get by clustering a uniformly distributed set of points in a p-dimensional space. However, what if I get negative values in the CCC plot, but the peaks in the plot still indicate a number of clusters that explains a good deal of the variation (as evidenced by the corresponding R² value for that number of clusters in the Cluster History table)? Please advise. Thanks!
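For reference, the comparison described above is usually written along the following lines. This is the form recalled from the SAS documentation rather than quoted from it, so the constants should be checked against Technical Report A-108 before being relied on. Here E(R²) is the R² expected under the uniform reference distribution, n is the number of observations, and p* is an estimate of the dimensionality of the between-cluster variation.

```latex
\mathrm{CCC} \;=\; \ln\!\left[\frac{1 - E(R^2)}{1 - R^2}\right]
\cdot \frac{\sqrt{n\,p^{*}/2}}{\bigl(0.001 + E(R^2)\bigr)^{1.2}}
```

In this form the second factor is always positive, so the sign comes entirely from the logarithm: the CCC is negative whenever the observed R² is smaller than E(R²). A solution can therefore have a high R² and still show a negative CCC if the uniform reference would be expected to produce an even higher R² for that number of clusters.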

06-18-2012 06:45 AM

The CCC is a statistic created by Warren Kuhfeld of SAS nearly 30 years ago. It is documented in Technical Report A-108. On page 48 he writes, "If all values of the CCC are negative and decreasing for two or more clusters, the distribution is probably unimodal or long-tailed." He goes on to say that very negative values may be due to outliers, which he recommends removing (not something I would recommend as a best practice).

In my experience, the CCC is a heuristic that needs to be triangulated with the approximate R² as well as the distribution of the cluster frequencies. For the CCC and R², you want to look at how they vary across a set of solutions (e.g., wrap FASTCLUS in a macro and run solutions from 3 to 30 clusters) and examine the solutions where those statistics peak, even when the CCC is negative. Solutions whose cluster frequencies are highly irregular, or that have one or two large clusters alongside several small ones, are not appropriate and do not lead to good results.

In addition, it's important to note that FASTCLUS is a k-means algorithm, meaning that it tends to produce clusters that are compact and roughly spherical in shape. If the shape of your clusters is irregular, you may want to consider a different algorithm, e.g., a nonparametric approach.
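The scan over a range of solutions can be sketched outside SAS as well. The following is a minimal Python illustration, not FASTCLUS itself: it uses a plain Lloyd's k-means written from scratch, reports only R² and the cluster frequencies (it does not compute the CCC), and the names `kmeans`, `summarize`, `scan`, and the 5% `min_share` cutoff for flagging lopsided solutions are all assumptions made for this example.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(pts):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(pts)
    return tuple(sum(col) / n for col in zip(*pts))

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means (a stand-in, not FASTCLUS); returns one label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Recompute centers; keep the old center if a cluster went empty.
        new_centers = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            new_centers.append(mean(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def summarize(points, labels, k):
    """Return (R^2, cluster sizes), with R^2 = 1 - SS_within / SS_total."""
    grand = mean(points)
    sst = sum(sq_dist(p, grand) for p in points)
    ssw = 0.0
    sizes = []
    for c in range(k):
        members = [p for p, l in zip(points, labels) if l == c]
        sizes.append(len(members))
        if members:
            m = mean(members)
            ssw += sum(sq_dist(p, m) for p in members)
    return 1.0 - ssw / sst, sizes

def scan(points, k_values, min_share=0.05):
    """Run one solution per k and flag those with a very small cluster.

    min_share is a hypothetical cutoff chosen for this sketch, not a SAS default.
    """
    rows = []
    for k in k_values:
        labels = kmeans(points, k)
        r2, sizes = summarize(points, labels, k)
        lopsided = min(sizes) < min_share * len(points)
        rows.append({"k": k, "r2": r2, "sizes": sizes, "lopsided": lopsided})
    return rows

# Synthetic demo data: three Gaussian blobs of 40 points each.
rng = random.Random(1)
points = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
          for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(40)]
rows = scan(points, range(2, 7))
for row in rows:
    print(row["k"], round(row["r2"], 3), row["sizes"], row["lopsided"])
```

Reading the output side by side for each k is the point: a solution is worth a closer look when R² is high relative to its neighbors and the sizes are reasonably balanced. In SAS the same table would also carry the CCC for each solution.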

06-18-2012 10:06 AM

Small correction: The CCC statistic is based on research by Warren Sarle, not Warren Kuhfeld.

04-19-2013 02:11 PM

I always confuse those two myself.