For example, I have a data set like this:
M1 M2 M3 M4 M5 M6 M7 M8 M9 M10 M11 M12 M13 M14
F1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
F2 1 1 0 0 0 0 0 0 0 0 0 0 0 0
F3 0 0 1 0 0 0 0 0 0 0 0 0 1 0
F4 0 0 1 1 0 0 0 0 0 0 0 0 1 0
F5 0 0 0 0 1 1 1 0 0 0 0 0 0 0
F6 0 0 0 0 0 1 1 1 0 0 0 0 0 0
F7 0 0 0 0 0 0 0 0 1 1 1 0 0 0
F8 0 0 0 0 0 0 0 0 0 0 1 1 0 0
F9 0 0 0 0 0 0 0 0 0 0 1 1 0 0
F10 0 0 0 0 0 0 0 0 0 0 0 0 1 1
How can I transform this data matrix into a dissimilarity matrix using the Jaccard index?
How do I then calculate the distance between each pair of F1-F10, that is, the distance matrix?
Based on these, I want to do a cluster analysis of F1-F10.
I'm a beginner, and I would really like to know how to program it.
Thank you very much!
The Jaccard index is a similarity measure. For clustering, you need a dissimilarity measure (a distance) such as DJACCARD or Bray-Curtis. You can check the definitions in the SAS doc at :
or in the reference :
Legendre, Pierre & Louis Legendre. 1998. Numerical Ecology. 2nd English edition. Elsevier Science BV, Amsterdam. xv + 853 pages.
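For orientation, here is a sketch of the usual textbook definition for presence/absence data. It assumes DJACCARD follows the standard Jaccard coefficient; the symbols a, b and c are introduced here only for illustration. For two rows i and j, a is the number of M columns equal to 1 in both rows, b the number equal to 1 only in row i, and c the number equal to 1 only in row j. Columns that are 0 in both rows are ignored.

```latex
S_J(i,j) = \frac{a}{a + b + c},
\qquad
D_J(i,j) = 1 - S_J(i,j) = \frac{b + c}{a + b + c}

% Worked example with rows F3 and F4 of the data above:
% F3 has M3 and M13; F4 has M3, M4 and M13, so a = 2, b = 0, c = 1.
D_J(\mathrm{F3},\mathrm{F4}) = \frac{0 + 1}{2 + 0 + 1} = \frac{1}{3}
```

By the same rule, F1 and F2 are identical (distance 0), while F1 and F3 share no M column (distance 1).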
Here is how to do it in SAS:
data test;
input id $ M1 M2 M3 M4 M5 M6 M7 M8 M9 M10 M11 M12 M13 M14;
datalines;
F1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
F2 1 1 0 0 0 0 0 0 0 0 0 0 0 0
F3 0 0 1 0 0 0 0 0 0 0 0 0 1 0
F4 0 0 1 1 0 0 0 0 0 0 0 0 1 0
F5 0 0 0 0 1 1 1 0 0 0 0 0 0 0
F6 0 0 0 0 0 1 1 1 0 0 0 0 0 0
F7 0 0 0 0 0 0 0 0 1 1 1 0 0 0
F8 0 0 0 0 0 0 0 0 0 0 1 1 0 0
F9 0 0 0 0 0 0 0 0 0 0 1 1 0 0
F10 0 0 0 0 0 0 0 0 0 0 0 0 1 1
;
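/* METHOD=DJACCARD requests the Jaccard dissimilarity
   (BRAYCURTIS is the other method mentioned above) */
proc distance data=test method=DJACCARD out=testDist;
   var anominal(M: / absent=0); /* M: means all variable names starting with M; 0 codes absence */
   id id;
run;

/* Average-linkage hierarchical clustering on the distance matrix */
proc cluster method=AVERAGE data=testDist outtree=testTree print=0;
   id id;
run;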
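To look at the distance matrix itself, something like the following should work. This is a minimal sketch that only assumes the testDist data set created by PROC DISTANCE above.

```sas
/* Print the dissimilarity matrix written by PROC DISTANCE.
   The distance columns are named after the ID values (F1-F10). */
proc print data=testDist noobs;
run;
```

By default PROC DISTANCE stores only the lower triangle, so the upper half prints as missing values. Adding SHAPE=SQUARE to the PROC DISTANCE statement should give the full symmetric matrix if you prefer that layout.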
The CLUSTER procedure will give you a dendrogram by default and you can use the testTree dataset as input to PROC TREE for further manipulation.
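As a rough sketch of that PROC TREE step, the code below cuts the tree into a fixed number of clusters and lists which cluster each F belongs to. The choice of 4 clusters and the data set name testClusters are assumptions made for illustration, not part of the original answer.

```sas
/* Cut the tree from PROC CLUSTER into 4 clusters (4 is an assumed value). */
proc tree data=testTree nclusters=4 out=testClusters noprint;
run;

/* List the cluster membership; _NAME_ holds the ID value (F1-F10). */
proc sort data=testClusters;
   by cluster;
run;

proc print data=testClusters noobs;
   var _name_ cluster;
run;
```

Change NCLUSTERS= to whatever number of groups the dendrogram suggests for your data.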
PG
Thanks for your help! You have given me the motivation to learn it. Best wishes!