Hi everyone.

I'm doing a cluster analysis as part of an assignment, and I want the height axis of the output tree to show the proportion of variance explained instead of the standard "number of clusters". I'm using the code I found here, but I can't get it to work: I keep getting only the number-of-clusters tree.

I also want to run several types of hierarchical clustering and compare the results, so I added some code to plot R-squared against the number of clusters, with one curve per method. I can't get that to work either.

What am I getting wrong? My code is as follows:

DATA sports;
INPUT SPORT$ END STR PWR SPD AGI FLX NER DUR HAN ANA;
DATALINES;
Soccer 2.88 4.5 3.13 1.13 1.63 2.63 2.75 2.13 6.63 3.25
Curling 5.88 3.5 2.63 1.63 2.75 1.75 9.88 4.38 8 7.5
(...)
;
PROC PRINT DATA=sports;
PROC UNIVARIATE;
PROC CLUSTER DATA=sports SIMPLE NOEIGEN METHOD=SINGLE RMSSTD RSQUARE NONORM OUTTREE=singleout;
id SPORT;
var END STR PWR SPD AGI FLX NER DUR HAN ANA;
proc sort; by _ncl_;
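/* DATA=sports is explicit: without it, PROC CLUSTER reads the most recently created data set, which is the sorted OUTTREE= output */
PROC CLUSTER DATA=sports NOEIGEN METHOD=COMPLETE RMSSTD RSQUARE NONORM OUTTREE=completeout;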
id SPORT;
var END STR PWR SPD AGI FLX NER DUR HAN ANA;
proc sort; by _ncl_;
PROC CLUSTER DATA=sports NOEIGEN METHOD=CENTROID RMSSTD RSQUARE NONORM OUTTREE=centroidout;
id SPORT;
var END STR PWR SPD AGI FLX NER DUR HAN ANA;
proc sort; by _ncl_;
PROC CLUSTER DATA=sports NOEIGEN METHOD=WARD RMSSTD RSQUARE NONORM OUTTREE=wardout;
id SPORT;
var END STR PWR SPD AGI FLX NER DUR HAN ANA;
proc sort; by _ncl_;
PROC TREE DATA=singleout; height _propor_;
PROC TREE DATA=completeout; height _propor_;
PROC TREE DATA=centroidout; height _propor_;
PROC TREE DATA=wardout; height _propor_;
DATA SINGLEOUT; SET SINGLEOUT; SINGLE=_RSQ_;
DATA COMPLETEOUT; SET COMPLETEOUT; COMPLETE=_RSQ_;
DATA CENTROIDOUT; SET CENTROIDOUT; CENTROID=_RSQ_;
DATA WARDOUT; SET WARDOUT; WARD=_RSQ_;
DATA OUTPUTS; MERGE SINGLEOUT COMPLETEOUT CENTROIDOUT WARDOUT; BY _NCL_;
DATA OUTPUTS; SET OUTPUTS; IF _NCL_<11;
SYMBOL1 I=JOIN V=S L=15 C=BLACK;
SYMBOL2 I=JOIN V=P L=10 C=RED;
SYMBOL3 I=JOIN V=C L=2 C=BLUE;
SYMBOL4 I=JOIN V=W L=1 C=GREEN;
PROC GPLOT DATA=OUTPUTS;
PLOT SINGLE*_NCL_=1 COMPLETE*_NCL_=2 CENTROID*_NCL_=3 WARD*_NCL_=4 /OVERLAY LEGEND;