JBHUI
Calcite | Level 5

Hello - I am running PROC CLUSTER.  My data is unique by AppID, with no missing values, and I am specifying the OUTTREE= option with AppID as the ID variable.  However, the output dataset is almost double the size of my original input dataset.  I noticed that the output dataset contains new observations for each cluster number, where AppID is missing.  I could not find any documentation that explains what is happening.  Could you please help explain this?  I would like to use PROC TREE to prune the clusters, and this is causing errors.  Thanks.

2 REPLIES 2
Reeza
Super User

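This is expected. The OUTTREE= dataset has one observation per node of the cluster tree, so every cluster formed along the way gets its own line in addition to the lines for your original observations. With n input observations there are n - 1 clusters, which is why the dataset is almost twice the size of the input. The cluster lines have no AppID because they do not correspond to any single input observation.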

 

Are you running into issues with the TREE procedure or something else?

 


JBHUI
Calcite | Level 5

Thanks Reeza.  Sorry for not thanking you sooner...I guess I am a little confused by the output of the OUTTREE= option.  Suppose I have 75 observations in my original dataset "dset" below.  OUTTREE= produces a tree dataset with 149 observations, which contains the original 75 plus additional observations for the clusters, whose names begin with "CL".  Based on the output of the cluster procedure, I would like to limit my data to 5 clusters, and I would like to assign each of the original 75 observations a value of 1 to 5 representing its cluster.  How would I go about doing this?  Thank you so much.

 

proc cluster data = dset method = ward ccc outtree = tree; 
    id AppID;
    var  x1 x2 x3 x4;
run;
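
One possible way to get the 1 to 5 assignment is to pass the tree dataset to PROC TREE with NCLUSTERS=5 and an OUT= dataset, then merge the result back onto the original data. This is only a sketch, not a tested solution: the dataset names clusters, dset_sorted and dset_clustered are made up for illustration, and it assumes the ID statement in PROC TREE carries AppID into the OUT= dataset so that it can serve as the merge key.

```sas
/* Cut the tree at 5 clusters. The OUT= dataset has one row per     */
/* original observation (the "CL" rows are not included) plus the   */
/* variables CLUSTER (numeric, 1 to 5) and CLUSNAME.                */
proc tree data = tree nclusters = 5 out = clusters noprint;
    id AppID;   /* assumption: keeps AppID in the OUT= dataset */
run;

/* Sort both datasets by the merge key. */
proc sort data = clusters;
    by AppID;
run;

proc sort data = dset out = dset_sorted;
    by AppID;
run;

/* Attach the cluster number to the original 75 observations. */
data dset_clustered;
    merge dset_sorted clusters (keep = AppID cluster);
    by AppID;
run;
```

If this works as assumed, dset_clustered would have the same 75 rows as dset with an extra CLUSTER column; a quick PROC FREQ on CLUSTER should show whether all five values are present.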
