Siennayun
Calcite | Level 5
I want to use hierarchical clustering to find the best number of clusters.

However, the following code does not produce the tree plot (dendrogram).

Could you please take a look at it?

data cars;
   set sashelp.cars;
run;

/*Learning from the data*/
title "2004 Car Data";
proc contents data=cars varnum;
   ods select position;
run;

title "The First Five Observations Out of 428";
proc print data=cars(obs=5) noobs;
run;

title "The Type Variable";
proc freq data=cars;
   tables Type;
run;

/*Standardize data*/
title "Standardize data";
proc standard data=cars mean=0 std=1 out=carsSTD;
   var _numeric_;
run;

proc print data=carsSTD(obs=10);
run;

proc aceclus data=carsSTD out=Ace p=.03 noprint;
   var MPG_Highway MSRP EngineSize Cylinders Horsepower MPG_City Weight Wheelbase Length;
run;

ods graphics on;

proc cluster data=Ace method=ward ccc pseudo print=20 out=tree
      plots=den(height=rsq);
   var can1-can3;
   id Make;
run;

ods graphics off;
Accepted Solution
Rick_SAS
SAS Super FREQ

It is always important to look at the SAS log to see if there are any WARNING or ERROR messages. For your example, the log says:

WARNING: The MAXPOINTS option value 200 is less than the number of clusters (428). This may
result in a dendrogram that is difficult to read. The dendrogram will not be displayed.
You can use the PLOTS(MAXPOINTS=) option in the PROC CLUSTER statement to change this
maximum.
 
If you add the suggested change [ PLOTS(MAXPOINTS=428)=den(height=rsq); ] and rerun the analysis, then you get a new WARNING:
WARNING: The DENDROGRAM will not be drawn because the NODEID values are not unique.
This tells you that the ID variable must have unique values. For your example, the MAKE variable does not have unique values and cannot be used as an ID.  You could create a unique ID by concatenating the MAKE and MODEL variables, but the strings will be very long. Or you can create a shorter ID for each observation such as '001', '002', etc.
 
data cars;
   set sashelp.cars(obs=80);            /* use fewer obs */
   ModelMake = catx(":", make, model);  /* VERY long string! */
   MyID = put(_N_, Z3.);                /* a shorter string */
run;

/*Standardize data*/
title "Standardize data";
proc standard data=cars mean=0 std=1 out=carsSTD;
   var _numeric_;
run;

proc aceclus data=carsSTD out=Ace p=.03 ;
   var MPG_Highway MSRP EngineSize Cylinders Horsepower MPG_City Weight Wheelbase 
      Length;
run;

proc cluster data=Ace method=ward ccc pseudo print=20 out=tree 
      plots(MAXPOINTS=428)=den(height=rsq);
   var can1-can3;
   id MyID;
run;
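
To see why MAKE fails as an ID, a quick check along the following lines compares the number of rows with the number of distinct values of each candidate ID. This is only a sketch: the column aliases nObs, nMake, and nMakeModel are made up for illustration, and a candidate is usable as a dendrogram ID only if its distinct count equals the row count.

```sas
/* Compare the row count with the number of distinct values of each    */
/* candidate ID. The aliases nObs, nMake, nMakeModel are illustrative. */
proc sql;
   select count(*)                               as nObs,
          count(distinct Make)                   as nMake,
          count(distinct catx(":", Make, Model)) as nMakeModel
   from sashelp.cars;
quit;
```

If nMake is smaller than nObs, MAKE repeats across observations and PROC CLUSTER will not draw the dendrogram with it as the ID.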

 

 

Siennayun
Calcite | Level 5
Hi Rick

Thank you so much for your detailed explanation.

It makes much sense now.



May I ask another question?



I want to use hierarchical clustering to find the best number of clusters and then use that number as K for k-means clustering.

Which method would you recommend for finding the best number in this case? Should I run several methods, such as centroid, single, average, and Ward, read the candidate numbers from each method's "Criteria for the Number of Clusters" plot, and then choose the number that appears most often as K?

For example, the candidate numbers are:
Single method: 3, 8, 11
Centroid method: 3, 6, 12
Average method: 3, 5, 11, 13
Ward method: 10
So I would conclude that 3 is the best number of clusters.


Looking forward to your reply.



Thanks
Rick_SAS
SAS Super FREQ

I am not an expert on clustering, but, yes, that is essentially what I would do. For your data, there is evidence for 3, 6, and 12. If you project the data onto the first few principal components and color by the cluster number, that might help you decide. 
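
The workflow discussed here (compare several linkage methods, then pass the chosen K to k-means) could be sketched roughly as follows. This is an illustration under assumptions, not a recommended recipe: the macro name try_methods and the data set names tree_average, tree_centroid, tree_ward, and kmeans are hypothetical, the Ace data set and MyID variable are taken from the accepted solution, and K=3 is simply the value proposed in the question. METHOD=SINGLE is left out because, as far as I recall, the PROC CLUSTER documentation cautions that the CCC and PSEUDO statistics are not appropriate for single linkage.

```sas
/* Hypothetical helper macro: run PROC CLUSTER once per linkage method */
/* so that the criteria plots can be compared side by side.            */
%macro try_methods(methods=average centroid ward);
   %local i m;
   %do i = 1 %to %sysfunc(countw(&methods));
      %let m = %scan(&methods, &i);
      title "Criteria for the number of clusters, METHOD=&m";
      proc cluster data=Ace method=&m ccc pseudo print=20 outtree=tree_&m;
         var can1-can3;
         id MyID;          /* unique ID created in the accepted solution */
      run;
   %end;
   title;                  /* clear the title after the loop */
%mend try_methods;

ods graphics on;
%try_methods()

/* Assumption: the criteria point to K=3. Use it as the number of */
/* clusters for k-means clustering with PROC FASTCLUS.            */
proc fastclus data=Ace maxclusters=3 maxiter=100 out=kmeans;
   var can1-can3;
run;
```

The OUT= data set from PROC FASTCLUS contains a CLUSTER variable with the k-means assignment for each observation.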

Siennayun
Calcite | Level 5
I really appreciate your advice. Is it possible for you to give me some guidance on how to color the components by the cluster number? I am not that familiar with this part.

Rick_SAS
SAS Super FREQ

When you create a scatter plot, add the GROUP= option. There are examples in the PROC CLUSTER documentation.
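
A minimal sketch of that idea, assuming the tree data set written by the PROC CLUSTER step in the accepted solution: PROC TREE cuts the hierarchy into a chosen number of clusters, and PROC SGPLOT colors the points with GROUP=. The data set name clus and the choice of three clusters are assumptions for illustration. The plot uses the first two canonical variables from PROC ACECLUS (can1, can2) as a stand-in for principal components.

```sas
/* Cut the hierarchy into 3 clusters (3 is an assumed value).          */
/* COPY carries the plotting variables into the OUT= data set, which   */
/* also receives the CLUSTER variable holding the cluster number.      */
proc tree data=tree nclusters=3 out=clus noprint;
   copy can1 can2;
run;

/* Scatter plot of the first two canonical variables, colored by cluster. */
title "Clusters in the first two canonical variables";
proc sgplot data=clus;
   scatter x=can1 y=can2 / group=cluster;
run;
title;
```

Changing NCLUSTERS= to 6 or 12 and rerunning the plot gives a visual comparison of the candidate solutions.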

Siennayun
Calcite | Level 5
Great! Thank you so much!
