Lobbie
Obsidian | Level 7

Hi,

 

I built four clustering models with the Cluster node in SAS Enterprise Miner: three with a manually specified number of clusters, stepping down from K=15 to K=6 to K=4, and one using automatic selection. The cluster statistics for the four models are:

 

2017-04-27_7-48-51.png

 

The results are exactly the same for Clustering K4 and Clustering Auto. I have concluded that a 4-cluster model is optimal.

  1. Are these the correct metrics for evaluating clusters and determining the optimal K? I also used cluster distance plots to check visually.
  2. Pseudo_F: Is higher better?
  3. RSQ and RSQ_Ratio: Is lower better?
  4. If these 4 metrics are not the best ones for determining the optimal number of clusters, which of the metrics generated by the Cluster node in SAS EM are appropriate?

Thanks,

Lobbie

 

 

1 REPLY 1
trees1
SAS Employee

Hi Lobbie, see below for some comments around these.

 

 

  1. I think these are fine as a guide, but I suggest a little trial and error here. You also want the clusters to fit the purpose, not just be the best in a statistical sense. So you can also experiment with which variables to use, and profile the clusters to get a sense of the solution (the Segment Profile node is useful here). This gives more detail on approaches to selecting the number of clusters: https://v8doc.sas.com/sashtml/stat/chap8/sect10.htm
  2. Yes, it measures the separation of the clusters, so higher is better.
  3. Higher is better for both. RSQ is the proportion of variance in the data accounted for by the clusters, and RSQ_Ratio is similar but compares between-cluster to within-cluster variance, RSQ / (1 - RSQ). Both keep increasing as clusters are added, up to the point where the number of clusters equals the number of cases, so you're not looking for the highest value but for an inflection point beyond which the rate of increase is small.
  4. Also try looking at the CCC (cubic clustering criterion) plot to see whether it levels off at some number of clusters.
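
To make the relationship between these statistics concrete, here is a minimal sketch in plain Python, not SAS EM's own implementation. It uses the textbook definitions (RSQ = between-cluster sum of squares / total sum of squares, RSQ_Ratio = RSQ / (1 - RSQ), pseudo-F = (SSB / (k - 1)) / (SSW / (n - k))). The function name `cluster_fit_stats` and the toy data are made up for illustration, and values from the Cluster node may differ, for example if inputs are standardized.

```python
def cluster_fit_stats(points, labels):
    """Return (rsq, rsq_ratio, pseudo_f) for one cluster assignment.

    Hypothetical helper for illustration only; assumes at least 2 clusters,
    fewer clusters than points, and some within-cluster variation.
    """
    n = len(points)
    dims = len(points[0])

    def mean(rows):
        return [sum(r[d] for r in rows) / len(rows) for d in range(dims)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Group the points by cluster label.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)

    # Total sum of squares around the grand mean.
    grand = mean(points)
    sst = sum(sq_dist(p, grand) for p in points)

    # Within-cluster sum of squares around each cluster mean.
    ssw = 0.0
    for rows in clusters.values():
        centre = mean(rows)
        ssw += sum(sq_dist(p, centre) for p in rows)

    ssb = sst - ssw                    # between-cluster sum of squares
    if k < 2 or k >= n or ssw == 0:
        raise ValueError("statistics undefined for this assignment")

    rsq = ssb / sst                    # proportion of variance explained
    rsq_ratio = ssb / ssw              # same as rsq / (1 - rsq)
    pseudo_f = (ssb / (k - 1)) / (ssw / (n - k))
    return rsq, rsq_ratio, pseudo_f


# Toy data: two well-separated squares of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]

stats2 = cluster_fit_stats(points, [0, 0, 0, 0, 1, 1, 1, 1])  # natural split
stats4 = cluster_fit_stats(points, [0, 0, 1, 1, 2, 2, 3, 3])  # over-split

print("k=2:", stats2)
print("k=4:", stats4)
```

On this toy data, RSQ still rises when going from two to four clusters even though there are only two real groups, while pseudo-F falls. That is why RSQ on its own should be read for where it flattens out rather than for its maximum.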

Cheers,

Troy
