11-02-2016 11:39 AM

**I have a data set for cluster analysis. The code I used for the analysis is listed below. My question is: assuming there are 100 data points in the data set, how can I set a minimum number of data points for each cluster?**

**Right now, the current method outputs some clusters that contain only 1 data point. This is why I am wondering whether I can set a minimum, for example requiring each cluster to contain at least 10% or 20% of the data points.**

**Thank you for the help.**

```
PROC FASTCLUS DATA=model_data3
   MAXC=3
   MAXITER=100
   REPLACE=FULL
   OUT=WORK.CLKMKMeansData
   ;
   VAR volume;
RUN;
```
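A minimal sketch for checking how many observations land in each cluster, assuming a hypothetical demo data set (`demo_volume`: 98 simulated values of `volume` plus two extreme values) in place of `model_data3`. It also tries FASTCLUS's `DELETE=` option, which, as I understand the documentation, deletes cluster seeds that have n or fewer observations assigned after each iteration. It does not strictly guarantee a minimum size in the final assignment, so the PROC FREQ step checks the actual counts. `DELETE=9` (at least 10 of 100 observations) is only an illustrative threshold.

```sas
/* Hypothetical demo data standing in for model_data3:   */
/* 98 typical values of volume plus two extreme values.  */
data demo_volume;
   call streaminit(2016);
   do i = 1 to 98;
      volume = rand('normal', 50, 10);
      output;
   end;
   volume = 500; output;
   volume = 900; output;
   drop i;
run;

/* DELETE=9 asks FASTCLUS to drop seeds with 9 or fewer members */
/* during the iterations (illustrative threshold: 10% of 100).  */
proc fastclus data=demo_volume maxc=3 maxiter=100 replace=full
              delete=9 out=work.clusters;
   var volume;
run;

/* Count how many observations ended up in each cluster;  */
/* CLUSTER is the membership variable written by OUT=.    */
proc freq data=work.clusters;
   tables cluster / out=work.cluster_sizes;
run;
```

If a cluster in `work.cluster_sizes` still has a COUNT below your threshold, that is a hint that those observations sit far from the rest of the data.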

Posted in reply to wutao9999

11-02-2016 03:37 PM - edited 11-02-2016 03:37 PM

It might be a better idea to increase MAXCLUSTERS and to treat observations that end up alone in a cluster as outliers.
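That suggestion can be sketched as follows, again assuming a hypothetical `demo_volume` data set in place of `model_data3`. `MAXC=5` and the 10% cutoff stored in the macro variable `min_pct` are illustrative values, not recommendations. Every observation in a cluster holding less than `min_pct` percent of the data gets `outlier = 1`.

```sas
/* Hypothetical demo data standing in for model_data3 */
data demo_volume;
   call streaminit(2016);
   do i = 1 to 98;
      volume = rand('normal', 50, 10);
      output;
   end;
   volume = 500; output;
   volume = 900; output;
   drop i;
run;

/* Illustrative cutoff: clusters under 10% of obs count as outliers */
%let min_pct = 10;

/* Allow more clusters than needed so isolated points form their own */
proc fastclus data=demo_volume maxc=5 maxiter=100 replace=full
              out=work.clusters noprint;
   var volume;
run;

/* Cluster sizes; PERCENT is the share of all observations (0-100) */
proc freq data=work.clusters noprint;
   tables cluster / out=work.cluster_sizes(keep=cluster count percent);
run;

/* Attach each cluster's size to its observations and flag small ones */
proc sort data=work.clusters;
   by cluster;
run;

data work.flagged;
   merge work.clusters work.cluster_sizes;
   by cluster;
   outlier = (percent < &min_pct);
run;
```

Because `outlier` is 1 or 0, a later step can use `where outlier = 0;` to keep only observations from the larger clusters, for example before rerunning FASTCLUS with a smaller MAXC.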

PG

Posted in reply to wutao9999

11-02-2016 04:02 PM

Since you have a single clustering variable, you could also try to fit a finite mixture model:

```
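PROC FMM DATA=model_data3;
   /* fit mixtures with up to 3 components; FMM keeps the one its selection criterion prefers */
   model volume / kmax=3;
   /* output keywords go before the slash; GROUP= stores each observation's most likely component */
   output out=CLKMKMeansData group=volumeGroup;
run;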
```

(untested)

PG