JBHUI
Calcite | Level 5

Hello - I am running PROC CLUSTER.  Each row of my data has a unique, nonmissing AppID, and I am specifying the OUTTREE= option with an ID statement on AppID.  However, the output dataset is almost double the size of my original input dataset.  I noticed that the output dataset contains new observations for the cluster numbers, where AppID is missing.  I could not find any documentation that explains what is happening.  Could you please help explain this?  I would like to use PROC TREE to prune the clusters, and this is causing errors.  Thanks.

Reeza
Super User

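This is expected, given how the procedure writes its output: the OUTTREE= dataset has one row per node of the tree, so each cluster gets a line in addition to the original observations.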

 

Are you running into issues with the TREE procedure or something else?
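
To see what the extra rows are, a quick look at the cluster rows can help. This is only a sketch: it assumes the OUTTREE= dataset is called tree and the ID variable is AppID (adjust both to your own names). With n observations and no rows dropped for missing values, a full hierarchical tree has n - 1 cluster nodes, so roughly 2n - 1 rows in total, which matches "almost double".

```sas
/* Sketch: list only the cluster (non-leaf) rows of the OUTTREE= dataset.  */
/* Assumed names: dataset tree, ID variable AppID.                        */
proc print data=tree;
    where missing(AppID);   /* cluster rows have no AppID */
    var _NAME_ _PARENT_ _NCL_ _FREQ_ _HEIGHT_;
run;
```

_NAME_ holds the node name (CLn for clusters), _PARENT_ the node it is merged into, and _NCL_ the number of clusters at that step.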

 


JBHUI
Calcite | Level 5

Thanks Reeza.  Sorry for not thanking you sooner.  I guess I am a little confused by the output of the OUTTREE= option.  Suppose I have 75 observations in my original dataset "dset" below.  OUTTREE= produces a tree dataset with 149 observations, which contains the original 75 plus additional observations for the clusters, whose names begin with "CL".  Based on the output of the cluster procedure, I would like to limit my data to 5 clusters, and I would like to assign each of the original 75 observations a value from 1 to 5 representing its cluster.  How would I go about doing this?  Thank you so much.

 

proc cluster data = dset method = ward ccc outtree = tree; 
    id AppID;
    var  x1 x2 x3 x4;
run;
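
One possible way to get the 1 to 5 assignment is to pass the tree dataset to PROC TREE with NCLUSTERS=5 and an OUT= dataset. The following is a minimal sketch, not a tested solution for this data: the output name clus5 is made up for illustration, and it assumes the tree dataset is the unmodified OUTTREE= output from the step above, including the "CL" rows, since PROC TREE needs them to rebuild the hierarchy.

```sas
/* Cut the tree at 5 clusters. The OUT= dataset should have one row   */
/* per original observation (leaf), not per cluster node.             */
/* clus5 is a hypothetical name.                                      */
proc tree data=tree nclusters=5 out=clus5 noprint;
    id AppID;            /* carry AppID into the OUT= dataset */
    copy x1 x2 x3 x4;    /* optional: keep the clustering variables */
run;

/* Check how many observations landed in each cluster. */
proc freq data=clus5;
    tables cluster;
run;
```

In the OUT= dataset, the numeric variable CLUSTER holds the cluster number and CLUSNAME the cluster name. If you need other columns from dset, sorting both datasets by AppID and merging should bring them back together.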
