I have a data set for cluster analysis; the code I used is listed below. My question: assuming the data set contains 100 data points, how can I set a minimum number of data points per cluster?
Right now, some of the clusters produced by this method contain only 1 data point. That is why I am wondering whether I can set a minimum, for example requiring each cluster to hold more than 10% or 20% of the data points.
Thank you for the help.
PROC FASTCLUS DATA=model_data3
MAXC=3
MAXITER=100
REPLACE=FULL
OUT=WORK.CLKMKMeansData
;
VAR volume;
RUN;
It might be a better idea to increase MAXCLUSTERS and to treat observations that end up alone in a cluster as outliers.
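A rough sketch of that idea, not tested against your data. It assumes the OUT= data set carries the CLUSTER variable that PROC FASTCLUS writes by default. MAXCLUSTERS=10, the 10% cutoff, and the data set names clusOut, clusSize and flagged are arbitrary choices for illustration:

```sas
/* Ask for more clusters than you really expect (10 is an arbitrary choice) */
proc fastclus data=model_data3 maxclusters=10 maxiter=100 replace=full
              out=clusOut;
   var volume;
run;

/* Count the observations in each cluster */
proc freq data=clusOut noprint;
   tables cluster / out=clusSize;   /* columns: CLUSTER, COUNT, PERCENT */
run;

/* Flag observations that sit in clusters holding less than 10% of the data */
proc sql;
   create table flagged as
   select a.*,
          case
             when b.percent is missing then .   /* no cluster assigned */
             when b.percent < 10 then 1         /* candidate outlier   */
             else 0
          end as outlier
   from clusOut as a
        left join clusSize as b
        on a.cluster = b.cluster;
quit;
```

Observations with outlier=1 could then be reviewed or set aside before rerunning the clustering with the number of clusters you actually want.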
Since you have a single clustering variable, you could also try to fit a finite mixture model:
PROC FMM DATA=model_data3;
model volume / kmax=3;
output out=CLKMKMeansData class=volumeGroup;
run;
(untested)
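To see how many observations land in each mixture component, something like the following could be run afterwards. It is also untested, and it assumes the PROC FMM step completed and wrote the component membership to a variable named volumeGroup in CLKMKMeansData:

```sas
/* Size and range of volume within each fitted component */
proc means data=CLKMKMeansData n min mean max;
   class volumeGroup;
   var volume;
run;
```

The N column gives the component sizes, so a component with very few members is easy to spot.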