09-24-2013 03:32 PM

Hi, I have a simple question: I want to find the distance between clusters.

I have a cluster data set:

| date | cluster_id | number |
|---|---|---|
| 11/30/2000 | 1 | 7 |
| 12/31/2000 | 1 | 8 |
| 11/30/2000 | 2 | 6 |
| 12/31/2000 | 2 | 5 |

etc

There are potentially 100 cluster_ids.

I want to compute the Euclidean distance between each pair of clusters

**for all dates and all cluster_ids**

dist_i_j = sum over dates of (number_i - number_j)^2

The final output should look like:

| cluster_id | with_cluster_id | dist_i_j |
|---|---|---|
| 1 | 2 | 10 |
| 2 | 1 | 10 |

I get the 10 by summing the squared differences across all dates (two in this example):

10 = (7-6)^2 + (8-5)^2 = 1+9 = 10
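To make the arithmetic concrete, here is a minimal sketch in Python (not SAS) that reproduces the 10 from the sample rows. The function name `pairwise_sq_dist` is hypothetical, and it assumes that only dates present in both clusters contribute to the sum; the post does not say how missing dates should be handled.

```python
from collections import defaultdict
from itertools import permutations

# Sample rows from the post: (date, cluster_id, number)
rows = [
    ("11/30/2000", 1, 7),
    ("12/31/2000", 1, 8),
    ("11/30/2000", 2, 6),
    ("12/31/2000", 2, 5),
]

def pairwise_sq_dist(rows):
    """Return {(i, j): sum over shared dates of (number_i - number_j)^2}."""
    # Index the long-format rows as cluster_id -> {date: number}
    by_cluster = defaultdict(dict)
    for date, cluster_id, number in rows:
        by_cluster[cluster_id][date] = number

    dist = {}
    # permutations yields both (i, j) and (j, i), matching the desired output
    for i, j in permutations(sorted(by_cluster), 2):
        # Assumption: dates missing from either cluster are skipped
        shared = by_cluster[i].keys() & by_cluster[j].keys()
        dist[(i, j)] = sum(
            (by_cluster[i][d] - by_cluster[j][d]) ** 2 for d in shared
        )
    return dist

result = pairwise_sq_dist(rows)
for (i, j), d in result.items():
    print(i, j, d)
```

With 100 cluster_ids this produces 100 × 99 = 9,900 ordered pairs, which is still small enough for a plain loop.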

Thanks so much for your help!

09-24-2013 05:49 PM

Your distance measure is the squared Euclidean distance between clusters, matched by date.

I think proc distance, proc corr, or proc fastclus could be used. There's always SQL.
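As a rough illustration of the SQL route, the sketch below self-joins the table on date and sums the squared differences per pair of clusters. It runs against an in-memory SQLite database from Python rather than PROC SQL, so the table name `clusters` and the setup code are assumptions; the SELECT itself is standard SQL, but it has not been tested in SAS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table mirroring the columns in the question
conn.execute(
    "CREATE TABLE clusters (date TEXT, cluster_id INTEGER, number INTEGER)"
)
conn.executemany(
    "INSERT INTO clusters VALUES (?, ?, ?)",
    [
        ("11/30/2000", 1, 7),
        ("12/31/2000", 1, 8),
        ("11/30/2000", 2, 6),
        ("12/31/2000", 2, 5),
    ],
)

# Self-join on date: every pair of different clusters observed on the same
# date contributes one squared difference to that pair's total.
query = """
SELECT a.cluster_id AS cluster_id,
       b.cluster_id AS with_cluster_id,
       SUM((a.number - b.number) * (a.number - b.number)) AS dist_i_j
FROM clusters AS a
JOIN clusters AS b
  ON a.date = b.date
 AND a.cluster_id <> b.cluster_id
GROUP BY a.cluster_id, b.cluster_id
ORDER BY a.cluster_id, b.cluster_id
"""
pairs = conn.execute(query).fetchall()
for row in pairs:
    print(*row)
```

Because the join is an inner join on date, a date that appears for only one of the two clusters drops out of that pair's sum.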

proc distance will work with a few extra steps.

How many clusters are you likely to have? Will you know that ahead of time, or is it dynamic?

09-25-2013 09:45 AM

Thanks, Reeza. Yes, the number of clusters is known beforehand from the cluster data set.