distance between two clusters

Hi, I have a simple question: I want to find the distance between clusters.

I have a cluster data set:

date        cluster_id  number
11/30/2000  1           7
12/31/2000  1           8
11/30/2000  2           6
12/31/2000  2           5


There are potentially 100 cluster_ids.

I want to compute the Euclidean distance between each pair of clusters, across all dates and all cluster_ids:

dist_i_j = sum over all dates of (number_i - number_j)^2

The final output should look like:

cluster_id  with_cluster_id  dist_i_j
1           2                10
2           1                10

I get the 10 by summing the squared differences across all dates (two in this example):

10 = (7-6)^2 + (8-5)^2 = 1 + 9 = 10
Thanks so much for your help!
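A minimal Python sketch of the calculation described above (Python is chosen only for illustration; the thread itself is about SAS). It indexes each cluster's numbers by date, pairs every cluster with every other cluster, and sums the squared differences over the dates the two clusters share. The helper name `pairwise_sq_distances` is hypothetical, and skipping dates that only one of the two clusters has is an assumption, since the question doesn't say how missing dates should be handled.

```python
from collections import defaultdict
from itertools import permutations

# Example rows from the question: (date, cluster_id, number)
rows = [
    ("11/30/2000", 1, 7),
    ("12/31/2000", 1, 8),
    ("11/30/2000", 2, 6),
    ("12/31/2000", 2, 5),
]

def pairwise_sq_distances(rows):
    """Return {(i, j): dist_i_j} for every ordered pair of distinct clusters.

    dist_i_j is the sum over shared dates of (number_i - number_j)^2.
    Assumption: dates present for only one of the two clusters are skipped.
    """
    # Index the values as cluster_id -> {date: number}
    by_cluster = defaultdict(dict)
    for date, cluster_id, number in rows:
        by_cluster[cluster_id][date] = number

    result = {}
    # Ordered pairs, so both (1, 2) and (2, 1) appear, as in the desired output
    for i, j in permutations(sorted(by_cluster), 2):
        shared_dates = by_cluster[i].keys() & by_cluster[j].keys()
        result[(i, j)] = sum(
            (by_cluster[i][d] - by_cluster[j][d]) ** 2 for d in shared_dates
        )
    return result

distances = pairwise_sq_distances(rows)
print("cluster_id  with_cluster_id  dist_i_j")
for (i, j), d in sorted(distances.items()):
    print(f"{i:<11} {j:<16} {d}")
```

With the example data this prints the two rows of the desired output, each with dist_i_j = 10. With 100 clusters it produces 100 × 99 = 9,900 ordered pairs, which is still small.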

Re: distance between two clusters

Your distance measure is the squared Euclidean distance between clusters, matching observations by date.

I think proc distance, proc corr, or proc fastclus could be used. There's always SQL, too.
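As a sketch of the SQL route, the query below self-joins the data on date, pairs each cluster with every other cluster, and sums the squared differences per pair. It runs against an in-memory SQLite database through Python's `sqlite3` module rather than SAS, and the table name `clusters` is a hypothetical choice. The squaring is written as a multiplication instead of a power function for portability. I'd expect the same self-join idea to carry over to PROC SQL, but that is an assumption I haven't verified here.

```python
import sqlite3

# Hypothetical table "clusters" with the columns from the question
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clusters (date TEXT, cluster_id INTEGER, number REAL)")
conn.executemany(
    "INSERT INTO clusters VALUES (?, ?, ?)",
    [("11/30/2000", 1, 7), ("12/31/2000", 1, 8),
     ("11/30/2000", 2, 6), ("12/31/2000", 2, 5)],
)

# Self-join on date, keep only pairs of different clusters,
# then sum the squared differences for each (cluster, other cluster) pair.
query = """
SELECT a.cluster_id AS cluster_id,
       b.cluster_id AS with_cluster_id,
       SUM((a.number - b.number) * (a.number - b.number)) AS dist_i_j
FROM clusters AS a
JOIN clusters AS b
  ON a.date = b.date AND a.cluster_id <> b.cluster_id
GROUP BY a.cluster_id, b.cluster_id
ORDER BY a.cluster_id, b.cluster_id
"""
results = conn.execute(query).fetchall()
for row in results:
    print(row)
conn.close()
```

Because the join condition requires matching dates, a date that only one cluster has simply drops out of that pair's sum.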

proc distance will work with a few extra steps.
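The "few extra steps" aren't spelled out here. My assumption is that they mainly involve reshaping the data so that each cluster becomes one row with one column per date (for example with a transpose step), because a distance procedure compares rows. The Python sketch below mimics that two-step idea: pivot to wide format, then compute the squared Euclidean distance between every pair of rows. `to_wide` and `sq_euclidean` are hypothetical helper names, and the sketch assumes every cluster has a value on every date.

```python
from itertools import permutations

rows = [
    ("11/30/2000", 1, 7),
    ("12/31/2000", 1, 8),
    ("11/30/2000", 2, 6),
    ("12/31/2000", 2, 5),
]

def to_wide(rows):
    """Pivot long rows into {cluster_id: [number for each date]}.

    The date columns are sorted as strings; the column order does not
    affect the distance, only that it is the same for every cluster.
    Assumes a complete panel: a missing (cluster, date) raises KeyError.
    """
    dates = sorted({date for date, _, _ in rows})
    values = {(cid, date): num for date, cid, num in rows}
    cluster_ids = sorted({cid for _, cid, _ in rows})
    return dates, {cid: [values[(cid, d)] for d in dates] for cid in cluster_ids}

def sq_euclidean(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

dates, wide = to_wide(rows)
# One row per cluster, one column per date: {1: [7, 8], 2: [6, 5]}
pairs = [(i, j, sq_euclidean(wide[i], wide[j])) for i, j in permutations(wide, 2)]
for pair in pairs:
    print(pair)
```

For the example data, `pairs` holds (1, 2, 10) and (2, 1, 10), matching the desired output.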

How many clusters are you likely to have? Will you know that ahead of time, or is it dynamic?

Re: distance between two clusters

Thanks, Reeza. Yes, the number of clusters is known beforehand from the cluster data set.

