Cluster Analysis

I have a data set for cluster analysis, and the code I used is listed below. My question: assuming there are 100 data points in the data set, how can I set a minimum number of data points per cluster?

Right now, some of the clusters produced by this method contain only 1 data point. That is why I am wondering whether I can set a minimum, for example requiring each cluster to contain at least 10% or 20% of the data points.

Thank you for the help.

 

PROC FASTCLUS DATA=model_data3
MAXC=3
MAXITER=100
REPLACE=FULL
OUT=WORK.CLKMKMeansData
;
VAR volume;
RUN;

Re: Cluster Analysis

Posted in reply to wutao9999

It might be better to increase MAXCLUSTERS= and to treat observations that end up alone in a cluster as outliers.
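
A minimal sketch of this idea, assuming the same model_data3 data set and volume variable as in the question. MAXC=10 and the threshold of 10 observations are arbitrary illustrative values, and the data set names ClusOut, ClusSize and Flagged and the variable small_cluster are hypothetical. FASTCLUS writes the cluster number to the variable CLUSTER in the OUT= data set, so a frequency table shows which clusters are too small:

/* Ask for more clusters than are expected to be meaningful (10 is an assumption) */
PROC FASTCLUS DATA=model_data3 MAXC=10 MAXITER=100 OUT=WORK.ClusOut;
VAR volume;
RUN;

/* Count the observations assigned to each cluster */
PROC FREQ DATA=WORK.ClusOut NOPRINT;
TABLES cluster / OUT=WORK.ClusSize;
RUN;

/* Flag observations whose cluster has fewer than 10 members (threshold is an assumption) */
PROC SQL;
CREATE TABLE WORK.Flagged AS
SELECT a.*, (b.COUNT < 10) AS small_cluster
FROM WORK.ClusOut AS a
LEFT JOIN WORK.ClusSize AS b
ON a.cluster = b.cluster;
QUIT;

Observations with small_cluster = 1 are the candidates to review as outliers. FASTCLUS also has a DELETE=n option that drops cluster seeds with n or fewer assigned observations during the iterations, although it does not guarantee a minimum size for the final clusters.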

PG
Re: Cluster Analysis

Posted in reply to wutao9999

Since you have a single clustering variable, you could also try to fit a finite mixture model:

 

PROC FMM DATA=model_data3;
model volume = / kmax=3;
output out=CLKMKMeansData class=volumeGroup;
run;

(untested)
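
A quick way to see how many observations land in each fitted component is to tabulate the membership variable. This is only a sketch: it assumes the OUTPUT data set CLKMKMeansData was created and contains volumeGroup as intended, and it is likewise untested.

/* Count the observations assigned to each mixture component */
PROC FREQ DATA=CLKMKMeansData;
TABLES volumeGroup;
RUN;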

 

 

PG