kinoo1989
Calcite | Level 5
 

Hello, I have 77 variables and 27,000 observations. My goal is to find meaningful clusters in this data, but I am finding the clusters hard to interpret.

What I have tried so far: I ran PCA (using PROC PRINCOMP), which gave me an idea of the reduced dimension. I then used the relevant PCs as input to PROC FASTCLUS. After a few iterations, I found a solution that produced the desired number of significant clusters.

Then I combined the original input variables with the resulting cluster assignments. I did this because I thought it would let me make sense of the clusters in terms of the original variables, even though the PCs were used to derive the clusters.

My problem is how to profile the clusters to understand their business significance (interpretation). I tried PROC TABULATE, but the output didn't make sense either, because I have 77 original variables to compare across my clusters.
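
One common way to make this profiling step manageable is to compare each cluster's mean on every original variable with the overall mean, express the gap in overall standard deviations, and look only at the few variables with the largest gaps. The sketch below illustrates that idea in plain Python rather than SAS; the function name `profile_clusters`, the variable names, and the toy data are all hypothetical and are not taken from the poster's dataset.

```python
from statistics import mean, pstdev

def profile_clusters(rows, labels, names, top=3):
    """For each cluster, return the `top` variables whose cluster mean lies
    furthest from the overall mean, measured in overall standard deviations."""
    columns = [[row[j] for row in rows] for j in range(len(names))]
    overall_mean = [mean(col) for col in columns]
    overall_sd = [pstdev(col) for col in columns]

    profile = {}
    for cluster in sorted(set(labels)):
        members = [row for row, lab in zip(rows, labels) if lab == cluster]
        scores = []
        for j, name in enumerate(names):
            if overall_sd[j] == 0:
                continue  # a constant variable cannot distinguish clusters
            cluster_mean = mean(row[j] for row in members)
            z = (cluster_mean - overall_mean[j]) / overall_sd[j]
            scores.append((name, round(z, 2)))
        # Largest absolute deviation first
        scores.sort(key=lambda item: abs(item[1]), reverse=True)
        profile[cluster] = scores[:top]
    return profile

# Hypothetical toy data: 6 observations, 3 variables, 2 clusters.
names = ["spend", "visits", "age"]
rows = [
    [100, 2, 30],
    [110, 3, 50],
    [90, 2, 40],
    [10, 9, 35],
    [20, 8, 45],
    [15, 9, 40],
]
labels = [1, 1, 1, 2, 2, 2]

result = profile_clusters(rows, labels, names, top=2)
for cluster, top_vars in result.items():
    print(cluster, top_vars)
```

In this toy example, cluster 1 stands out as high `spend` and low `visits`, while `age` drops out because its cluster means equal the overall mean. With 77 variables, the same table reduces each cluster to a short list of defining variables.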

 

What should the next step be? Should I check for multicollinearity and remove as many variables as I can, or is there an easier way? I would appreciate any feedback or tips to resolve this issue.

Thank you in advance

Regards

Kino

1 REPLY
Ksharp
Super User

If you want to cluster variables, check PROC VARCLUS.

If you want to pick the most significant variables, check PROC PLS or PROC HPGENSELECT.
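
As a rough illustration of what "most significant variables" can mean here, the sketch below ranks variables by a one-way ANOVA F ratio, using the cluster label as the grouping factor. This is a simpler, swapped-in technique and not what PROC PLS or PROC HPGENSELECT do. It is written in plain Python rather than SAS, and the helper names (`f_ratio`, `rank_variables`) and the toy data are hypothetical.

```python
from statistics import mean

def f_ratio(values, labels):
    """One-way ANOVA F statistic for one variable grouped by cluster label."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least 2 clusters and more observations than clusters")
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        # No spread inside clusters: perfect separation, or a constant variable
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def rank_variables(rows, labels, names):
    """Return (name, F) pairs sorted from most to least cluster-separating."""
    scored = [
        (name, f_ratio([row[j] for row in rows], labels))
        for j, name in enumerate(names)
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical toy data: 6 observations, 3 variables, 2 clusters.
names = ["spend", "visits", "age"]
rows = [
    [100, 2, 30],
    [110, 3, 50],
    [90, 2, 40],
    [10, 9, 35],
    [20, 8, 45],
    [15, 9, 40],
]
labels = [1, 1, 1, 2, 2, 2]

ranking = rank_variables(rows, labels, names)
for name, f in ranking:
    print(name, round(f, 1))
```

Because the clusters were built to separate the observations in the first place, these F ratios are inflated and are best read as a ranking of which variables drive the clusters, not as formal significance tests.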
