Siennayun
Calcite | Level 5
I want to use hierarchical clustering to find the best number of clusters.

However, I cannot get the tree plot (dendrogram) as output from the following code.

Could you please take a look at it?

data cars;
   set sashelp.cars;
run;

/*Learning from the data*/
title "2004 Car Data";
proc contents data=cars varnum;
   ods select position;
run;

title "The First Five Observations Out of 428";
proc print data=cars(obs=5) noobs;
run;

title "The Type Variable";
proc freq data=cars;
   tables Type;
run;

/*Standardize data*/
title "Standardize data";
proc standard data=cars mean=0 std=1 out=carsSTD;
   var _numeric_;
run;

proc print data=carsSTD(obs=10);
run;

proc aceclus data=carsSTD out=Ace p=.03 noprint;
   var MPG_Highway MSRP EngineSize Cylinders Horsepower MPG_City Weight Wheelbase Length;
run;

ods graphics on;
proc cluster data=Ace method=ward ccc pseudo print=20 out=tree
      plots=den(height=rsq);
   var can1-can3;
   id Make;
run;
ods graphics off;

1 ACCEPTED SOLUTION

Rick_SAS
SAS Super FREQ

It is always important to look at the SAS log to see if there are any WARNING or ERROR messages. For your example, the log says:

WARNING: The MAXPOINTS option value 200 is less than the number of clusters (428). This may
result in a dendrogram that is difficult to read. The dendrogram will not be displayed.
You can use the PLOTS(MAXPOINTS=) option in the PROC CLUSTER statement to change this
maximum.
 
If you add the suggested option, PLOTS(MAXPOINTS=428)=den(height=rsq), and rerun the analysis, you get a new WARNING:
WARNING: The DENDROGRAM will not be drawn because the NODEID values are not unique.
This tells you that the ID variable must have unique values. In your example, the MAKE variable has repeated values, so it cannot be used as an ID. You could create a unique ID by concatenating the MAKE and MODEL variables, but the resulting strings are very long. Alternatively, you can create a shorter ID for each observation, such as '001', '002', etc. The following code creates both kinds of ID, uses the shorter one, and reads fewer observations:
 
data cars;
   set sashelp.cars(obs=80);            /* use fewer obs */
   ModelMake = catx(":", make, model);  /* VERY long string! */
   MyID = put(_N_, Z3.);                /* a shorter string */
run;

/*Standardize data*/
title "Standardize data";
proc standard data=cars mean=0 std=1 out=carsSTD;
   var _numeric_;
run;

proc aceclus data=carsSTD out=Ace p=.03 ;
   var MPG_Highway MSRP EngineSize Cylinders Horsepower MPG_City Weight Wheelbase 
      Length;
run;

proc cluster data=Ace method=ward ccc pseudo print=20 out=tree 
      plots(MAXPOINTS=428)=den(height=rsq);
   var can1-can3;
   id MyID;
run;
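
Before choosing an ID variable, it can help to confirm that its values are unique. The following is a minimal sketch, not part of the original answer: it compares the number of rows with the number of distinct values for each candidate ID. The column aliases (n_obs, n_make, n_make_model) are illustrative names only.

```sas
/* Sketch: compare the row count with the number of distinct values
   of each candidate ID. A candidate is usable as an ID in PROC CLUSTER
   only if its distinct count equals the row count. */
proc sql;
   select count(*)                               as n_obs,
          count(distinct Make)                   as n_make,
          count(distinct catx(":", Make, Model)) as n_make_model
   from sashelp.cars;
quit;
```

If n_make is smaller than n_obs, MAKE repeats and will trigger the NODEID warning shown above.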

 

 


6 REPLIES
Siennayun
Calcite | Level 5
Hi Rick,

Thank you so much for your detailed explanation. It makes much more sense now.

May I ask another question?

I want to use hierarchical clustering to find the best number of clusters and then use that number as K in k-means clustering.

Which method would you recommend for finding the best number in this case? Should I run several methods, such as centroid, single, average, and Ward, and read the candidate numbers from each method's "Criteria for the Number of Clusters" plot? I would then choose the number that appears most often as K.

For example, suppose the candidate numbers are:
Single method: 3, 8, 11
Centroid method: 3, 6, 12
Average method: 3, 5, 11, 13
Ward method: 10
In that case, I would conclude that 3 is the best number of clusters.

Looking forward to your reply.

Thanks
Rick_SAS
SAS Super FREQ

I am not an expert on clustering, but, yes, that is essentially what I would do. For your data, there is evidence for 3, 6, and 12. If you project the data onto the first few principal components and color by the cluster number, that might help you decide. 
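
Once a candidate K has been chosen, the k-means step might look like the following sketch. It assumes the Ace data set and the can1-can3 variables from the accepted solution; K=3 and the output data set name KM are illustrative choices, not values recommended in this thread.

```sas
/* Sketch: k-means clustering with K taken from the hierarchical analysis.
   MAXCLUSTERS=3 and the data set name KM are assumptions for illustration. */
proc fastclus data=Ace maxclusters=3 maxiter=100 out=KM;
   var can1-can3;
run;

/* Check how many observations were assigned to each cluster */
proc freq data=KM;
   tables Cluster;
run;
```

Rerunning with MAXCLUSTERS=6 or MAXCLUSTERS=12 would let you compare the other candidates mentioned above.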

Siennayun
Calcite | Level 5
I really appreciate your advice.
Is it possible for you to give me some guidance on how to color the components by the cluster number? I am not that familiar with this part.

Rick_SAS
SAS Super FREQ

When you create a scatter plot, add the GROUP= option. There are examples in the PROC CLUSTER documentation.
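
As a rough sketch of that idea, the steps below cut the tree from the accepted solution into clusters, attach the cluster number to the standardized data, compute two principal components, and color the scatter plot with GROUP=. The choice of 3 clusters and the data set names clus, carsClus, and pcs are assumptions for illustration. The sketch also assumes that MyID has the same type and length in both data sets being merged.

```sas
/* Sketch: cut the hierarchical tree into 3 clusters (3 is an assumption).
   The OUT= data set contains the ID variable and the CLUSTER number. */
proc tree data=tree nclusters=3 out=clus noprint;
   id MyID;
run;

/* Attach the cluster number to the standardized data by MyID */
proc sort data=clus;    by MyID; run;
proc sort data=carsSTD; by MyID; run;

data carsClus;
   merge carsSTD clus;
   by MyID;
run;

/* Project onto the first two principal components (Prin1, Prin2) */
proc princomp data=carsClus out=pcs n=2 noprint;
   var MPG_Highway MSRP EngineSize Cylinders Horsepower MPG_City Weight
       Wheelbase Length;
run;

/* Color the markers by cluster number */
proc sgplot data=pcs;
   scatter x=Prin1 y=Prin2 / group=cluster;
run;
```

Well-separated colors in this plot suggest that the chosen number of clusters is reasonable. Heavy overlap suggests trying a different number.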

Siennayun
Calcite | Level 5
Great! Thank you so much!
