## Mistake with the distance procedure

Hello all,

I have been trying to produce a Bray-Curtis dissimilarity matrix for cluster analysis, but the output from proc distance does not appear to be giving me accurate values; the dissimilarities are either 0.5 or very close to that.  I have calculated the matrix 'by hand' using the formula:

...and the results I get from the hand calculations are the same as those produced by the R package vegan.

Can anyone see a mistake with my SAS code or lend any other insight to help me out?

proc distance data=rawdat out=dist method=braycurtis;
  var anominal (var1--var22);
  id location;
run;

The data in rawdat are log10-transformed counts, ranging from 0 to 3.4 after transformation.
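For reference, here is the abundance-based Bray-Curtis dissimilarity as computed by vegan's `vegdist(method = "bray")`. The formula used for the hand calculation is not reproduced in this post, so treating it as this form is an assumption. The symbols are chosen for illustration: $x_{ij}$ is the (transformed) count for variable $i$ at location $j$.

```latex
d_{jk} = \frac{\sum_{i} \lvert x_{ij} - x_{ik} \rvert}{\sum_{i} \left( x_{ij} + x_{ik} \right)}
```

This form depends on the magnitudes of the values, not on whether two values are exactly equal.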

## Re: Mistake with the distance procedure

Sounds like a question for Tech Support to me.

I'm guessing, and assuming you've already checked this, but your formula appears to differ from the one in the documentation; it looks more like the Sørensen (Dice) coefficient, which is 1 - BRAYCURTIS.
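As a sketch of that relationship for presence/absence data, assume the usual 2x2 notation (not taken from the SAS documentation), where $a$ counts variables present in both observations and $b$ and $c$ count variables present in only one of them:

```latex
\text{Dice} = \frac{2a}{2a + b + c},
\qquad
\text{Bray-Curtis}_{\text{binary}} = \frac{b + c}{2a + b + c} = 1 - \text{Dice}
```

In this binary form the two coefficients carry the same information; one is a similarity and the other a dissimilarity.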

## Re: Mistake with the distance procedure

I suspect the problem resides with the log transformation (the BC distances might be calculated with integer-rounded values). Try calculating the distances with untransformed counts.

## Re: Mistake with the distance procedure

But wait! B-C requires matches... So it might be the other way around: what you are missing IS the rounding. With VARi = floor(log10(COUNTi)), you would be matching the counts that have the same order of magnitude and not the exact same value. That might be it!

A simple test:

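/* Simulate five locations with Poisson counts, stored two ways */
data rawdat;
  array mu(3) (100 1000 10000);   /* Poisson means, one per variable */
  array lcount(3);                /* creates lcount1-lcount3 */
  length transform $12;
  call streaminit(9876);
  do id = 1 to 5;
    location = put(id,2.);
    transform = "Log10";          /* exact log10 of each count */
    do i = 1 to 3;
      lcount(i) = log10(rand("Poisson",mu(i)));
    end;
    output;
    transform = "Floor(log10)";   /* order of magnitude only */
    do i = 1 to 3;
      lcount(i) = floor(lcount(i));
    end;
    output;
  end;
run;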

proc sort data=rawdat; by transform location; run;

proc distance data=rawdat out=dist method=braycurtis;
by transform;
var anominal (lcount1-lcount3);
id location;
run;

proc print data=dist; run;
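As a cross-check on the PROC DISTANCE output, the abundance-based Bray-Curtis dissimilarity can also be computed directly from the test data. The sketch below assumes the `rawdat` data set created by the DATA step above, uses only the `Log10` rows, and writes to a hypothetical data set named `bc_check`. It uses the sum-of-absolute-differences form of the coefficient, which may not be what METHOD=BRAYCURTIS computes for ANOMINAL variables.

```sas
/* Sketch: abundance-based Bray-Curtis for each pair of locations.   */
/* Assumes rawdat with lcount1-lcount3, location and transform.      */
/* bc_check is a hypothetical output name.                           */
proc sql;
  create table bc_check as
  select a.location as loc_j,
         b.location as loc_k,
         /* numerator: sum of absolute differences */
         ( abs(a.lcount1 - b.lcount1)
         + abs(a.lcount2 - b.lcount2)
         + abs(a.lcount3 - b.lcount3) )
         /
         /* denominator: sum of both observations' totals */
         ( a.lcount1 + b.lcount1
         + a.lcount2 + b.lcount2
         + a.lcount3 + b.lcount3 ) as braycurtis
  from rawdat as a, rawdat as b
  where a.transform = "Log10"
    and b.transform = "Log10"
    and a.location > b.location   /* lower triangle only */
  order by loc_j, loc_k;
quit;

proc print data=bc_check; run;
```

If these values differ from the `Log10` block of the PROC DISTANCE output, that would suggest the procedure is applying a match-based definition rather than the abundance-based one. With 22 variables the explicit sums become tedious, so an array-based DATA step or SAS/IML may be a better fit for the real data.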

## Re: Mistake with the distance procedure

Thanks for the insights, PG.  I ran the test program with both raw and log10-transformed data and ended up with the same result:

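| Obs | transform | location | _1 | _2 | _3 | _4 | _5 |
|---|---|---|---|---|---|---|---|
| 1 | Floor(log10) | 1 | 0.00000 | . | . | . | . |
| 2 | Floor(log10) | 2 | 0.16667 | 0.00000 | . | . | . |
| 3 | Floor(log10) | 3 | 0.16667 | 0.33333 | 0.00000 | . | . |
| 4 | Floor(log10) | 4 | 0.00000 | 0.16667 | 0.16667 | 0.00000 | . |
| 5 | Floor(log10) | 5 | 0.16667 | 0.00000 | 0.33333 | 0.16667 | 0 |
| 6 | Log10 | 1 | 0.00000 | . | . | . | . |
| 7 | Log10 | 2 | 0.50000 | 0.00000 | . | . | . |
| 8 | Log10 | 3 | 0.50000 | 0.50000 | 0.00000 | . | . |
| 9 | Log10 | 4 | 0.50000 | 0.50000 | 0.50000 | 0.00000 | . |
| 10 | Log10 | 5 | 0.50000 | 0.50000 | 0.50000 | 0.50000 | 0 |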

This may very well be a question for tech support.  I have used the distance procedure for other calculations and it appears to be fine.  However, with any method I try on 'anominal' type data, I run into a similar problem, with or without transforming the data before running the procedure.

Cheers,

Ely

