- How to preserve Cluster numbers while using pro fa...

06-10-2016 08:44 AM

Hello,

I am facing an issue with PROC FASTCLUST. I want 5 clusters, and I can get them without trouble. The problem is this: if I run the same procedure on two different datasets that contain essentially the same data, the clusters get different numbers even though their behaviour is the same.

For example:

A cluster that was numbered 2 in the first run is numbered 3 in the second run. Profiling shows that cluster 2 in the first run is equivalent to cluster 3 in the second run.

Does anyone have an idea how I can preserve the cluster numbers?

Thanks in advance

Ehsan

Accepted Solutions

Solution

06-11-2016
11:00 AM

06-10-2016 10:36 AM

If the clustering is really the same, then you can do the following:

1. From the first run, use the OUTSTAT= option to output the centers. Call the centers CA_1, CA_2, ..., CA_k.

2. From the second run, use the OUTSTAT= option to output the centers. Call the centers CB_1, CB_2, ..., CB_k.

3. Concatenate the centers into a single data set and use PROC DISTANCE to compute the distances between the centers.

4. In the resulting 2k x 2k distance matrix, the block formed by the first k columns and the last k rows holds the distances between the centers of Run A (columns) and Run B (rows). The smallest element in each column tells you which center in Run B matches that center in Run A.
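The matching logic in steps 3 and 4 can be sketched outside SAS as well. The snippet below is only an illustration in Python, not SAS code: the center coordinates are made up, and `match_clusters` is a hypothetical helper name. It assumes the centers from both runs have already been exported (for example from the OUTSTAT= data sets) and are on the same scale.

```python
import math

def match_clusters(centers_a, centers_b):
    """For each Run A center (1-based), return the number of the nearest Run B center."""
    mapping = {}
    for i, ca in enumerate(centers_a, start=1):
        # Euclidean distance from this Run A center to every Run B center
        distances = [math.dist(ca, cb) for cb in centers_b]
        mapping[i] = distances.index(min(distances)) + 1
    return mapping

# Hypothetical centers for k = 3 clusters in two variables (illustrative values only)
centers_a = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
centers_b = [(5.1, 4.9), (9.8, 0.2), (0.1, -0.1)]

a_to_b = match_clusters(centers_a, centers_b)

# Invert the mapping so Run B cluster numbers can be recoded to Run A numbers
b_to_a = {b: a for a, b in a_to_b.items()}

# If two Run A centers map to the same Run B center, the clusterings are not
# really the same and a simple recode is not safe
is_one_to_one = len(b_to_a) == len(a_to_b)

print(a_to_b)
print(is_one_to_one)
```

With `b_to_a` in hand, the cluster numbers of the second run can be recoded so that they agree with the first run. The one-to-one check matters because nearest-center matching only gives a clean relabeling when the two clusterings really are the same.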

All Replies

06-10-2016 09:00 AM

Do the datasets have the same order?

06-10-2016 11:20 AM

Actually, order does not matter here. What really matters is the "time period": I have one dataset from May 2016 and a second dataset from June 2016, and the (customer) data comes from the same source. Experimentation also shows that I can always find the same clusters, just with different numbers.

06-11-2016 11:01 AM

Thanks, it seems to work :-)