wutao9999
Obsidian | Level 7

I have a data set for cluster analysis; the code I used is listed below. My question: assuming there are 100 data points in the data set, how can I set a minimum number of data points that each cluster must contain?

 

Right now, some clusters produced by the current method contain only 1 data point. This is why I am wondering whether I can set a minimum, for example requiring each cluster to contain more than 10% or 20% of the data points.

 

Thank you for the help.

 

PROC FASTCLUS DATA=model_data3
    MAXC=3
    MAXITER=100
    REPLACE=FULL
    OUT=WORK.CLKMKMeansData
    ;
    VAR volume;
RUN;

2 REPLIES
PGStats
Opal | Level 21

It might be a better idea to increase MAXCLUSTERS and to treat observations that end up alone in a cluster as outliers.
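
A minimal, untested sketch of that idea: rerun PROC FASTCLUS with a larger MAXCLUSTERS, count the observations in each cluster, and flag members of small clusters as outlier candidates. The value MAXCLUSTERS=6, the 10% cutoff, and the data set names work.clusters, work.cluster_sizes, and work.flagged are illustrative assumptions, not anything taken from the original post.

```sas
/* Sketch only: MAXCLUSTERS=6 and the 10% cutoff are illustrative guesses. */
proc fastclus data=model_data3 maxclusters=6 maxiter=100 replace=full
              out=work.clusters noprint;
   var volume;
run;

/* Count observations per cluster. CLUSTER is the variable that         */
/* PROC FASTCLUS adds to the OUT= data set.                             */
proc freq data=work.clusters noprint;
   tables cluster / out=work.cluster_sizes;  /* writes COUNT and PERCENT */
run;

/* Attach the cluster size to every observation and flag members of     */
/* clusters holding less than 10% of the data as outlier candidates.    */
proc sql;
   create table work.flagged as
   select c.*,
          s.count as cluster_size,
          (s.percent < 10) as outlier_candidate
   from work.clusters as c
        inner join work.cluster_sizes as s
        on c.cluster = s.cluster;
quit;
```

Observations with outlier_candidate = 1 could then be inspected or set aside before clustering the rest again. PROC FASTCLUS also documents a DELETE=n option that drops cluster seeds with n or fewer assigned observations during the iterations; it may be worth checking whether that fits this case, keeping in mind that it does not strictly guarantee a minimum size for the final clusters.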

PG
PGStats
Opal | Level 21

Since you have a single clustering variable, you could also try to fit a finite mixture model:

 

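PROC FMM DATA=model_data3;
    /* fit mixtures with up to 3 components */
    model volume / kmax=3;
    /* CLASS= names the variable holding the most likely component for each observation */
    output out=CLKMKMeansData class=volumeGroup;
run;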

(untested)

 

 

PG
