Hello everyone,
I'm trying to do a clustering analysis. Here is my PROC CLUSTER code:
PROC CLUSTER Data=Dist METHOD=average OUTTREE=tree PSEUDO print=15;
id record_id_char;
RUN;
I have 240 observations. My question: if I choose 4 clusters, how do I know which cluster each observation was assigned to? I'd like to create a new categorical variable called "cluster" that holds this information for each observation.
As you can see, the code above creates an output data set called "tree". The first 10 observations of "tree" are shown below:
It shows how each observation was assigned, but it is hard to track them one by one. Is there a way to get this information from the output? Please help.
Thanks in advance,
Kenny
Hi Reeza,
I did use PROC TREE. The plot is below, but since I have so many observations, it is not very informative.
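PROC TREE can also answer the original question without drawing the plot. A minimal sketch, untested against your data: it assumes the OUTTREE= data set is named tree (as in the PROC CLUSTER code above) and that the original observations live in a data set called mydata, which is a hypothetical name that also contains record_id_char. NCLUSTERS=4 cuts the tree at 4 clusters, OUT= writes one row per observation with its CLUSTER number, and NOPRINT suppresses the plot.

```sas
/* Cut the dendrogram stored in TREE at 4 clusters.           */
/* OUT= holds one row per original observation: the ID        */
/* variable plus CLUSTER (number) and CLUSNAME (name).        */
proc tree data=tree nclusters=4 out=clus4 noprint;
   id record_id_char;
run;

/* MYDATA is a hypothetical name for the original data set.   */
/* Both data sets must be sorted by the key before merging.   */
proc sort data=clus4;
   by record_id_char;
run;

proc sort data=mydata out=mydata_sorted;
   by record_id_char;
run;

/* Attach the cluster number to every original observation.   */
data mydata_clustered;
   merge mydata_sorted clus4(keep=record_id_char cluster);
   by record_id_char;
run;

/* Quick check of how many observations fall in each cluster. */
proc freq data=mydata_clustered;
   tables cluster;
run;
```

Changing NCLUSTERS= is all it takes to try a different number of clusters; the merge step stays the same.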
You may want to consider a different clustering procedure if you want the cluster identification added to your existing data. Here is a small example you can run, since you should have the SASHELP.CLASS data set.
proc fastclus data=sashelp.class maxclusters=3 out=want;
   var height weight;
run;
The output data set, want, has all of the variables from the SASHELP.CLASS data set plus two new ones: Cluster, the cluster assigned, and Distance, the distance from the observation to its cluster seed, which is used in the assignment. There are several options that control how the clusters are built. Note that MAXCLUSTERS= sets the maximum number of clusters; I picked 3 just as an example because the sample data set is small and I only wanted to show something that may be a viable option.
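To look at the two added variables, a short follow-up sketch (run it after the FASTCLUS step so that WANT exists) prints them next to the clustering variables and tabulates the cluster sizes:

```sas
/* Show each student's measurements next to the assigned      */
/* cluster and the distance to that cluster's seed.           */
proc print data=want;
   var name height weight cluster distance;
run;

/* Count how many observations landed in each cluster.        */
proc freq data=want;
   tables cluster;
run;
```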
FASTCLUS is designed to work with large data sets, so it may not be quite as precise for smaller ones. It may also work better with standardized data.
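One way to act on the standardization point, sketched with PROC STDIZE from SAS/STAT: METHOD=STD rescales each listed variable to mean 0 and standard deviation 1 before clustering. The data set names class_std and want_std are only illustrative.

```sas
/* Rescale HEIGHT and WEIGHT to mean 0, standard deviation 1  */
/* so neither variable dominates the Euclidean distances.     */
/* Variables not listed in VAR are copied to OUT= unchanged.  */
proc stdize data=sashelp.class out=class_std method=std;
   var height weight;
run;

/* Same clustering as before, now on the standardized values. */
proc fastclus data=class_std maxclusters=3 out=want_std;
   var height weight;
run;
```

With height in inches and weight in pounds on very different scales, standardizing keeps weight from driving most of the distance calculation.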