I created n clusters using the HP Cluster node (k-means) in SAS Enterprise Miner, but every time I try to replicate them I get different clusters. Even in EG, runs with different initializations give me different clusters. My questions are:
1- Is there a way to fix these clusters and make my work replicable?
2- If I can't fix my clusters, is there a way to test their stability, for example by using an overlap rate and saying that above 75% the clusters are stable?
3- I couldn't find any straightforward answer about cluster stability and how important it is. Can we speak about the stability of a clustering in this situation? Is it very important to test stability before using the clusters? Which measures can do that? Are there any nodes in SAS Enterprise Miner that can do that?
I'm a little bit lost with this question of stability. Thank you for your understanding!
K-means clustering doesn't have a single unique solution. Rather, there is a set of possible solutions (the algorithm converges to a local optimum that depends on where it starts), and the task is to pick the one that makes the most sense for your use case. In particular, if you change the initialization parameters, the clusters can come out different.
If your clusters are unstable, it may mean they are not distinct enough, and you should consider reducing the number of clusters to get a more stable solution. How did you pick the number of clusters?
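To illustrate the point about initialization, here is a minimal sketch of plain k-means in standard-library Python. It is not the HP Cluster node's implementation; the function name `kmeans`, the `seed` parameter, and the toy data are assumptions made up for this example. It only shows that when every random choice comes from one seeded generator and the data order is unchanged, two runs return the same labels.

```python
import random

def kmeans(points, k, seed, max_iter=100):
    """Plain Lloyd's k-means on a list of equal-length tuples; returns one label per point."""
    rng = random.Random(seed)          # all randomness comes from this seeded generator
    centers = rng.sample(points, k)    # random initial centers drawn from the data
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        if new_labels == labels:       # converged: assignments stopped changing
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:                # keep the old center if a cluster went empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels

# Toy data (made up for illustration): three loose groups in 2-D.
data_rng = random.Random(0)
points = [(cx + data_rng.gauss(0, 1), cy + data_rng.gauss(0, 1))
          for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(30)]

run_a = kmeans(points, k=3, seed=42)
run_b = kmeans(points, k=3, seed=42)
print(run_a == run_b)   # True: same seed and same data order give the same labels
```

A different seed may or may not change the result. On well-separated data the partitions usually coincide up to a renumbering of the clusters, which is why comparing runs needs a measure that ignores the cluster numbers.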
For the number of clusters, did you look at the graphs and use the elbow method to determine the optimal number?
And just as an FYI, stability isn't always achievable in a clustering model, and you'll almost never get 100% stability with real data.
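One way to put a number on the "overlap rate" idea from the question is the Rand index: the share of observation pairs that two runs treat the same way (both put the pair in one cluster, or both split it). The sketch below is a standard-library Python illustration, not a SAS node; the function name `rand_index` and the label vectors are made up for the example. It ignores how the clusters are numbered, so two runs that find the same partition under different cluster IDs score 1.0.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Share of point pairs on which two clusterings agree (same-cluster vs. different-cluster)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same observations")
    if len(labels_a) < 2:
        raise ValueError("need at least two observations to form a pair")
    agree = total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]   # pair together in run A?
        same_b = labels_b[i] == labels_b[j]   # pair together in run B?
        agree += same_a == same_b             # count pairs where both runs say the same thing
        total += 1
    return agree / total

# Hypothetical cluster labels for the same 8 observations from three runs.
run_1 = [0, 0, 0, 1, 1, 2, 2, 2]
run_2 = [2, 2, 2, 0, 0, 1, 1, 1]   # same partition, cluster numbers permuted
run_3 = [0, 0, 1, 1, 1, 2, 2, 0]   # two observations moved to other clusters

print(rand_index(run_1, run_2))    # 1.0
print(rand_index(run_1, run_3))    # 20 of 28 pairs agree, about 0.714
```

The raw Rand index tends to be high even for unrelated partitions when there are many clusters, so a chance-corrected variant (the adjusted Rand index) is often preferred. Any cutoff such as 75% is a judgment call rather than a standard threshold.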
Here's a picture of the selection of the number of clusters using ABC (aligned box criterion) selection.
Estimation Criterion — specifies the estimation criterion used in the aligned box criterion method. Global Peak Value uses the maximum peak value across all peak values in the gap statistics.
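As a rough reading of the "Global Peak Value" description above, the sketch below finds every local peak in a series of gap values and returns the number of clusters at the highest peak. This is only an interpretation in standard-library Python, not SAS's implementation: the function name `global_peak_k` is hypothetical, the gap values are invented for illustration, and treating only interior points as peaks is an assumption.

```python
def global_peak_k(gap_by_k):
    """Return the k whose gap value is the largest among all local peaks, or None if no peak exists."""
    ks = sorted(gap_by_k)
    # A local peak is an interior k whose gap exceeds both of its neighbours (assumption).
    peaks = [
        k for prev, k, nxt in zip(ks, ks[1:], ks[2:])
        if gap_by_k[k] > gap_by_k[prev] and gap_by_k[k] > gap_by_k[nxt]
    ]
    if not peaks:
        return None
    return max(peaks, key=lambda k: gap_by_k[k])

# Invented gap values per candidate number of clusters, for illustration only.
gaps = {2: 0.10, 3: 0.35, 4: 0.20, 5: 0.42, 6: 0.30, 7: 0.28}
print(global_peak_k(gaps))   # 5: peaks occur at k=3 and k=5, and k=5 is the higher one
```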