wutao9999
Obsidian | Level 7

I have a data set for cluster analysis; the code I used is listed below. My question: assuming there are 100 data points in the data set, how can I set a minimum number of data points per cluster?

Right now, some of the clusters produced by this method contain only 1 data point. That is why I am wondering whether I can set a minimum size, for example requiring each cluster to contain at least 10% or 20% of the data points.

 

Thank you for the help.

 

PROC FASTCLUS DATA=model_data3
    MAXC=3
    MAXITER=100
    REPLACE=FULL
    OUT=WORK.CLKMKMeansData
    ;
    VAR volume;
RUN;

2 REPLIES
PGStats
Opal | Level 21

It might be a better idea to increase MAXCLUSTERS and to treat observations that end up alone in a cluster as outliers.
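
A rough, untested sketch of that idea follows. It assumes the same input data set and variable as in the question. The MAXCLUSTERS=6 value, the 10% cutoff, and the names work.clusters, work.cluster_sizes, work.flagged and possible_outlier are placeholders chosen for illustration, not recommendations.

```sas
/* Sketch only: ask for more clusters than are ultimately wanted,    */
/* so that isolated points can fall into their own small clusters.   */
/* MAXCLUSTERS=6 is an arbitrary, assumed value.                     */
proc fastclus data=model_data3 maxclusters=6 maxiter=100 replace=full
              out=work.clusters noprint;
   var volume;
run;

/* Count the members of each cluster. The OUT= data set from         */
/* PROC FREQ holds COUNT and PERCENT for every value of CLUSTER.     */
proc freq data=work.clusters noprint;
   tables cluster / out=work.cluster_sizes;
run;

/* Flag observations that belong to a cluster holding less than      */
/* 10% of the data (assumed cutoff) as possible outliers.            */
proc sql;
   create table work.flagged as
   select a.*,
          (b.percent < 10) as possible_outlier
   from work.clusters as a
        left join work.cluster_sizes as b
        on a.cluster = b.cluster;
quit;
```

The flagged observations could then be reviewed, or set aside before rerunning the clustering with the number of clusters actually wanted.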

PG
PGStats
Opal | Level 21

Since you have a single clustering variable, you could also try to fit a finite mixture model:

 

PROC FMM DATA=model_data3;
model volume / kmax=3;
output out=CLKMKMeansData class=volumeGroup;
run;

(untested)

 

 

PG

