In Proc cluster, how to know an observation was assigned in which cluster?

zzdwcrzlj1 · Posted 03-08-2022 08:31 PM

Hello everyone,

I'm trying to do a clustering analysis. Here is my code and output from PROC CLUSTER:

PROC CLUSTER Data=Dist METHOD=average OUTTREE=tree PSEUDO print=15;
id record_id_char;
RUN;

I have 240 observations. My question is: if I choose 4 clusters, how do I find out which cluster each observation was assigned to? I'd like to create a new categorical variable called "cluster" that holds this information for each observation.

As you can see, the code above creates an output dataset called "tree". The first 10 rows of "tree" are shown below:

It shows how each observation was assigned. But it is hard to track one by one. Is there a way to get this information from the output? Please help.

Thanks in advance,

Kenny

Reeza · Posted 03-08-2022 09:44 PM

I think you need to use PROC TREE to get that info.

zzdwcrzlj1 · Posted 03-08-2022 10:26 PM

Hi Reeza,

I did use PROC TREE. The plot is below. Because I have so many observations, it is not very informative.

Reeza · Posted 03-08-2022 10:59 PM

Did you check the output dataset from PROC TREE?
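For anyone following along, here is a minimal sketch of what checking that output dataset could look like. It assumes the OUTTREE= dataset is named tree, as in the original post; the dataset name clusters is made up for illustration. Because the PROC CLUSTER step used an ID statement, the _NAME_ variable in the result is expected to hold the record_id_char values, but it is worth confirming that on your own data.

```sas
/* Cut the tree at 4 clusters and write the assignments to a dataset. */
/* NOPRINT suppresses the dendrogram, which is unreadable at 240 obs.  */
proc tree data=tree nclusters=4 out=clusters noprint;
run;

/* OUT= holds one row per original observation:                       */
/*   _NAME_   - observation label (assumed here to be the ID value)   */
/*   CLUSTER  - numeric cluster number, 1 to 4                        */
/*   CLUSNAME - name of the cluster node in the tree                  */
proc print data=clusters(obs=10);
   var _name_ cluster clusname;
run;

/* Quick check of how many observations fall in each cluster. */
proc freq data=clusters;
   tables cluster;
run;
```

The CLUSTER variable is the categorical variable asked for in the original question. To attach it to other data, sort both datasets by the ID value and merge them.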

ballardw · Posted 03-08-2022 11:00 PM

You may want to consider a different clustering procedure if you want the cluster identification added to your existing data. Here is a small example you can run, since you should have the SASHELP.CLASS data set.

proc fastclus data=sashelp.class maxclusters=3 out=want ;
   var height weight;
run;

The output data set, want, has all of the variables from the SASHELP.CLASS data set plus two new ones: Cluster, the cluster each observation was assigned to, and Distance, the distance from the observation to its cluster seed. There are other options that control how the clusters are built. Note that Maxclusters sets the maximum number of clusters. I picked 3 only as an example, because the sample data set is small and I just wanted to show something that may be a viable option.

FASTCLUS is designed to work with large data sets, so it may not be quite as precise for smaller ones. It also may work better with standardized data.
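As a rough sketch of the standardization idea, the variables can be rescaled with PROC STANDARD before clustering. The dataset names class_std and want_std are made up for illustration, and mean 0 with standard deviation 1 is just one common choice of scaling.

```sas
/* Rescale height and weight to mean 0 and standard deviation 1    */
/* so that neither variable dominates the distance calculation.    */
proc standard data=sashelp.class mean=0 std=1 out=class_std;
   var height weight;
run;

/* Same FASTCLUS call as above, run on the standardized copy. */
proc fastclus data=class_std maxclusters=3 out=want_std;
   var height weight;
run;
```

Note that want_std contains the standardized values of height and weight, not the originals, so merge the Cluster variable back to the source data if you need the original units.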

