peatjohnston
Calcite | Level 5

Hello all,

I have been trying to produce a Bray-Curtis dissimilarity matrix for cluster analysis, but the output from proc distance does not appear to be giving me accurate values; the dissimilarities are either 0.5 or very close to that.  I have calculated the matrix 'by hand' using the formula:

 BC_{ij} = \frac{2C_{ij}}{S_i + S_j}

...and the results I get from the hand calculations are the same as produced in the R package vegan.

Can anyone see a mistake with my SAS code or lend any other insight to help me out?

proc distance data=rawdat out=dist method=braycurtis;
     var anominal (var1--var22);
     id location;
run;

The rawdat are log10 transformed count data, ranging from 0 to 3.4 after transformation.

Thanks in advance.

5 REPLIES
Reeza
Super User

Sounds like a question for Tech Support to me.

I'm guessing, and I assume you've already checked this, but your formula appears different from the one in the documentation. It looks more like the Sorensen (Dice) coefficient, which is 1-BRAYCURTIS.
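To make the relationship between the two forms concrete, here is a short derivation. It assumes the usual definitions, which the thread does not spell out: x_{ik} is the value of variable k at site i, C_{ij} is the sum over variables of the smaller of the two values, and S_i is the total for site i. Under those assumptions, the quantity in the question is a similarity, and one minus it is the abundance-based dissimilarity.

```latex
% Assumed definitions (not stated in the thread):
%   C_{ij} = \sum_k \min(x_{ik}, x_{jk}), \qquad S_i = \sum_k x_{ik}
% Using |a - b| = a + b - 2\min(a, b):
\sum_k \lvert x_{ik} - x_{jk} \rvert = S_i + S_j - 2C_{ij}
% Dividing by S_i + S_j gives the dissimilarity form:
1 - \frac{2C_{ij}}{S_i + S_j} = \frac{\sum_k \lvert x_{ik} - x_{jk} \rvert}{S_i + S_j}
```

So if the hand calculation uses 2C_{ij}/(S_i + S_j) directly, it yields a similarity, and the matching dissimilarity is its complement.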

PGStats
Opal | Level 21

I suspect the problem resides with the log transformation (the BC distances might be calculated with integer-rounded values). Try calculating the distances with untransformed counts.

PG
PGStats
Opal | Level 21

But wait! B-C requires matches... So it might be the other way around: what you are missing IS the rounding. With VARi = floor(log10(COUNTi)), you would be matching the counts that have the same order of magnitude and not the exact same value. That might be it!

PG

A simple test:

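data rawdat;
    array mu(3) (100 1000 10000);   /* Poisson means, one per variable */
    array lcount(3);
    length transform $12;
    call streaminit(9876);          /* fixed seed for reproducibility */
    do id = 1 to 5;
        location = put(id,2.);
        /* First copy: plain log10 of the simulated counts */
        transform = "Log10";
        do i = 1 to 3;
            lcount(i) = log10(rand("Poisson",mu(i)));
        end;
        output;
        /* Second copy: same values rounded down to whole orders of magnitude */
        transform = "Floor(log10)";
        do i = 1 to 3;
            lcount(i) = floor(lcount(i));
        end;
        output;
    end;
run;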

proc sort data=rawdat; by transform location; run;

proc distance data=rawdat out=dist method=braycurtis;
     by transform;
     var anominal (lcount:);
     id location;
run;

proc print data=dist; run;

PG

Message was edited to include test.
peatjohnston
Calcite | Level 5

Thanks for the insights, PG. I ran the test program with both raw and log10-transformed data and ended up with the same result:

Obs    transform       location    _1         _2         _3         _4         _5
  1    Floor(log10)    1           0.00000     .          .          .          .
  2    Floor(log10)    2           0.16667    0.00000     .          .          .
  3    Floor(log10)    3           0.16667    0.33333    0.00000     .          .
  4    Floor(log10)    4           0.00000    0.16667    0.16667    0.00000     .
  5    Floor(log10)    5           0.16667    0.00000    0.33333    0.16667    0
  6    Log10           1           0.00000     .          .          .          .
  7    Log10           2           0.50000    0.00000     .          .          .
  8    Log10           3           0.50000    0.50000    0.00000     .          .
  9    Log10           4           0.50000    0.50000    0.50000    0.00000     .
 10    Log10           5           0.50000    0.50000    0.50000    0.50000    0

This may very well be a question for tech support.  I have used the distance procedure for other calculations and it appears to be fine.  However, with any method I try on 'anominal' type data, I run into a problem similar to this one, with or without transformation prior to running the procedure.

Cheers,

Ely

PGStats
Opal | Level 21

I don't think there is a bug. The distances are fine with the rounded values (the upper matrix in your output). With anominal variables, the BC distance is a measure of match-mismatch: values either agree exactly or they don't. If your counts are relatively large, they almost never match exactly, hence the 0.5 distance everywhere. The rounded log10 counts will match when the counts are of the same order of magnitude, which makes more sense.
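The difference between match counting and an abundance-based calculation can be sketched in a short DATA step. The two site vectors are made-up illustrative values, not data from this thread. The mismatch-based expression is only inferred from the printed output above (with three variables, every distance is a multiple of 1/6) and is not taken from the SAS documentation. All variable names are hypothetical.

```sas
/* Illustrative values only; not taken from the thread's data. */
data bc_check;
   array a(3) _temporary_ (2.0 3.0 4.0);   /* site A, log10 counts */
   array b(3) _temporary_ (2.1 2.9 4.0);   /* site B, log10 counts */
   abs_diff = 0;
   total = 0;
   mismatches = 0;
   do j = 1 to dim(a);
      abs_diff   = abs_diff + abs(a(j) - b(j));      /* quantitative numerator   */
      total      = total + a(j) + b(j);              /* quantitative denominator */
      mismatches = mismatches + (a(j) ne b(j));      /* 1 if values differ at all */
   end;
   /* Abundance-based dissimilarity: small when values are close */
   bc_quantitative = abs_diff / total;
   /* Assumption inferred from the printed output: mismatches / (2 * variables) */
   mismatch_share = mismatches / (2 * dim(a));
   put bc_quantitative= mismatch_share=;
   drop j;
run;
```

For these two sites the abundance-based value is about 0.011, while the mismatch-based value is 2/6, because two of the three values differ even though they differ only slightly. With unrounded log10 counts nearly every value differs, which is how every pair ends up at 3/6 = 0.5.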

PG
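If the goal is the abundance-based index that vegan reports, one option that may be worth checking is METHOD=NONMETRIC with RATIO-level variables. As far as I recall from the PROC DISTANCE documentation, this is the Lance-Williams nonmetric coefficient (sum of absolute differences divided by the sum of the totals), which is the same expression as quantitative Bray-Curtis. Treat both the method name and its formula as assumptions to verify against the documentation for your SAS release. The output name dist_q is hypothetical.

```sas
/* Sketch under the assumption that METHOD=NONMETRIC computes          */
/* sum|x - y| / sum(x + y) for RATIO-level variables; verify first.    */
proc distance data=rawdat out=dist_q method=nonmetric;
   var ratio(var1--var22);   /* treat values as quantities, not categories */
   id location;
run;
```

Comparing a few cells of dist_q against the hand calculation would confirm whether this matches the vegan result.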
