LunaMinerva
Fluorite | Level 6

Hi everyone.

I'm doing a cluster analysis as part of an assignment, and I want the height axis of the output tree to show the proportion of variance explained rather than the default number of clusters. I'm using the code I found here, but I can't get it to work: I keep getting only the number-of-clusters tree.

I also want to run several types of hierarchical clustering and compare the results, so I added code to plot R-squared against the number of clusters, with one curve per method. I can't get that to work either.

What am I getting wrong? My code is as follows:

DATA sports;
   INPUT SPORT$ END STR PWR SPD AGI FLX NER DUR HAN ANA;
   DATALINES;
Soccer 2.88 4.5 3.13 1.13 1.63 2.63 2.75 2.13 6.63 3.25
Curling 5.88 3.5 2.63 1.63 2.75 1.75 9.88 4.38 8 7.5
(...)
;

PROC PRINT DATA=sports;
PROC UNIVARIATE;

PROC CLUSTER SIMPLE NOEIGEN METHOD=SIMPLE RMSSTD RSQUARE NONORM OUTTREE=singleout;
   id SPORT;
   var END STR PWR SPD AGI FLX NER DUR HAN ANA;
proc sort; by _ncl_;

PROC CLUSTER COMPLETE NOEIGEN METHOD=COMPLETE RMSSTD RSQUARE NONORM OUTTREE=completeout;
   id SPORT;
   var END STR PWR SPD AGI FLX NER DUR HAN ANA;
proc sort; by _ncl_;

PROC CLUSTER CENTROID NOEIGEN METHOD=CENTROID RMSSTD RSQUARE NONORM OUTTREE=centroidout;
   id SPORT;
   var END STR PWR SPD AGI FLX NER DUR HAN ANA;
proc sort; by _ncl_;

PROC CLUSTER WARD NOEIGEN METHOD=WARD RMSSTD RSQUARE NONORM OUTTREE=wardout;
   id SPORT;
   var END STR PWR SPD AGI FLX NER DUR HAN ANA;
proc sort; by _ncl_;

PROC TREE DATA=singleout; height _propor_;
PROC TREE DATA=completeout; height _propor_;
PROC TREE DATA=centroidout; height _propor_;
PROC TREE DATA=wardout; height _propor_;

DATA SINGLEOUT; SET SINGLEOUT; SINGLE=_RSQ_;
DATA COMPLETEOUT; SET COMPLETEOUT; COMPLETE=_RSQ_;
DATA CENTROIDOUT; SET CENTROIDOUT; CENTROID=_RSQ_;
DATA WARDOUT; SET WARDOUT; WARD=_RSQ_;
DATA OUTPUTS; MERGE SINGLEOUT COMPLETEOUT CENTROIDOUT WARDOUT; BY _NCL_;
DATA OUTPUTS; SET OUTPUTS; IF _NCL_<11;

SYMBOL1 I=JOIN V=S L=15 C=BLACK;
SYMBOL2 I=JOIN V=P L=10 C=RED;
SYMBOL3 I=JOIN V=C L=2 C=BLUE;
SYMBOL4 I=JOIN V=W L=1 C=GREEN;
PROC GPLOT DATA=OUTPUTS;
   PLOT SINGLE*_NCL_=1 COMPLETE*_NCL_=2 CENTROID*_NCL_=3 WARD*_NCL_=4 /OVERLAY LEGEND;

4 REPLIES
WarrenKuhfeld
Rhodochrosite | Level 12

I can't answer your question off the top of my head, but I can say you are using old technology. If you enable ODS Graphics, PROC CLUSTER will create dendrograms. Also, PROC SGPLOT is more modern than PROC GPLOT.
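
For illustration, the ODS Graphics route might look roughly like the sketch below, shown for Ward's method only. It assumes the `sports` data set from the original post already exists, and it has not been run against that data. The `HEIGHT=RSQ` dendrogram suboption is, as far as I know, only available in more recent SAS/STAT releases, so treat it as an assumption to check against your version. The data set name `ward_rsq` is made up for this sketch.

```sas
/* Sketch only: assumes the SPORTS data set from the original post exists. */
ods graphics on;

/* With ODS Graphics enabled, PROC CLUSTER can draw the dendrogram itself. */
/* HEIGHT=RSQ requests R-squared on the height axis (version dependent).   */
proc cluster data=sports method=ward rsquare nonorm noeigen
             outtree=wardout plots=dendrogram(height=rsq);
   id sport;
   var end str pwr spd agi flx ner dur han ana;
run;

/* Keep the 1- to 10-cluster solutions, ordered by number of clusters,     */
/* so that the series plot connects the points from left to right.         */
proc sort data=wardout(where=(1 <= _ncl_ <= 10) keep=_ncl_ _rsq_)
          out=ward_rsq;
   by _ncl_;
run;

/* R-squared against the number of clusters. */
proc sgplot data=ward_rsq;
   series x=_ncl_ y=_rsq_ / markers;
   xaxis label="Number of clusters" integer;
   yaxis label="R-squared";
run;
```

The same pattern could be repeated with `METHOD=SINGLE`, `METHOD=COMPLETE`, and `METHOD=CENTROID` to compare linkage methods.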

LunaMinerva
Fluorite | Level 6

Can I do those things with the SAS Studio I'm using with a virtual machine? To be honest I'm just following the code examples we were given by the professor...

WarrenKuhfeld
Rhodochrosite | Level 12

Yes.  Your professor is not up on the latest developments in SAS software.

LunaMinerva
Fluorite | Level 6

So can anyone weigh in on the actual code and help me sort out any errors? I really don't care if my procedures are outdated. It's a university course and these are the procedures the professor taught us. For the time being I just want to learn how to use SAS in general terms; I will have time to care about up-to-date procedures if I end up using SAS in my work in the future.
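
Staying with PROC TREE and PROC GPLOT, a few things in the posted code look like likely culprits, and the sketch below shows one way they might be fixed. It assumes the `sports` data set from the original post already exists, and it has not been run against the full data. The changes are:

  • `SIMPLE` on the PROC CLUSTER statement only requests simple statistics. The linkage is chosen with `METHOD=`, and single linkage is `METHOD=SINGLE`, not `METHOD=SIMPLE`. The bare `COMPLETE`, `CENTROID`, and `WARD` keywords are dropped.
  • `DATA=sports` is given on every PROC CLUSTER step. Without it, a step reads the most recently created data set, which after the first step is an OUTTREE data set rather than the raw data.
  • `_PROPOR_` is, to my knowledge, written by PROC VARCLUS. The OUTTREE data set from PROC CLUSTER stores R-squared in `_RSQ_`, so that is used as the tree height.
  • The R-squared values are copied into small helper data sets (`single_r`, `complete_r`, `centroid_r`, `ward_r`, names made up for this sketch) before merging, so the OUTTREE data sets stay untouched.

```sas
/* Sketch only: assumes the SPORTS data set from the original post exists. */

/* One PROC CLUSTER step per linkage method, each reading SPORTS. */
proc cluster data=sports simple noeigen method=single rmsstd rsquare nonorm
             outtree=singleout;
   id sport;
   var end str pwr spd agi flx ner dur han ana;
run;

proc cluster data=sports simple noeigen method=complete rmsstd rsquare nonorm
             outtree=completeout;
   id sport;
   var end str pwr spd agi flx ner dur han ana;
run;

proc cluster data=sports simple noeigen method=centroid rmsstd rsquare nonorm
             outtree=centroidout;
   id sport;
   var end str pwr spd agi flx ner dur han ana;
run;

proc cluster data=sports simple noeigen method=ward rmsstd rsquare nonorm
             outtree=wardout;
   id sport;
   var end str pwr spd agi flx ner dur han ana;
run;

/* Trees with R-squared (proportion of variance explained) as the height. */
proc tree data=singleout;   height _rsq_; run;
proc tree data=completeout; height _rsq_; run;
proc tree data=centroidout; height _rsq_; run;
proc tree data=wardout;     height _rsq_; run;

/* Keep the 1- to 10-cluster solutions and rename _RSQ_ per method. */
proc sort data=singleout(where=(1 <= _ncl_ <= 10) keep=_ncl_ _rsq_)
          out=single_r(rename=(_rsq_=single));
   by _ncl_;
run;

proc sort data=completeout(where=(1 <= _ncl_ <= 10) keep=_ncl_ _rsq_)
          out=complete_r(rename=(_rsq_=complete));
   by _ncl_;
run;

proc sort data=centroidout(where=(1 <= _ncl_ <= 10) keep=_ncl_ _rsq_)
          out=centroid_r(rename=(_rsq_=centroid));
   by _ncl_;
run;

proc sort data=wardout(where=(1 <= _ncl_ <= 10) keep=_ncl_ _rsq_)
          out=ward_r(rename=(_rsq_=ward));
   by _ncl_;
run;

/* One row per number of clusters, one R-squared column per method. */
data outputs;
   merge single_r complete_r centroid_r ward_r;
   by _ncl_;
run;

/* SYMBOL statements kept as in the original post. */
symbol1 i=join v=S l=15 c=black;
symbol2 i=join v=P l=10 c=red;
symbol3 i=join v=C l=2  c=blue;
symbol4 i=join v=W l=1  c=green;

proc gplot data=outputs;
   plot single*_ncl_=1 complete*_ncl_=2 centroid*_ncl_=3 ward*_ncl_=4
        / overlay legend;
run;
quit;
```

The explicit `run;` statements are not strictly required between steps, but they make it easier to see in the log which step produced which message.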
