kinoo1989
Calcite | Level 5
 

Hello, I have 77 variables and 27,000 observations. My goal is to find meaningful clusters in this data, but I am finding the clusters difficult to interpret.

 

 

What I have tried so far: I ran PCA (using PROC PRINCOMP), which gave me an idea of the reduced dimensionality. I then used the relevant principal components (PCs) as inputs to PROC FASTCLUS. After a few iterations, I arrived at a solution with the desired number of significant clusters.

 

Then I combined the original input variables with the resulting cluster assignments. I did this because I thought it would let me make sense of the clusters in terms of the original variables, even though the PCs were used to derive the clusters.
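
For readers following along, the workflow described so far could look roughly like the sketch below. It is only an illustration: the data set name work.customers, the variable names x1-x77, the 10 retained components and the 5 clusters are all placeholders, not values taken from this post.

```sas
/* Placeholder names and settings throughout:
   work.customers, x1-x77, n=10 components, maxclusters=5. */

/* Step 1: PCA. The OUT= data set keeps the original variables and
   adds the component scores Prin1-Prin10. */
proc princomp data=work.customers out=work.pc_scores n=10;
   var x1-x77;
run;

/* Step 2: cluster on the component scores only. The OUT= data set
   carries every input column forward and adds CLUSTER and DISTANCE. */
proc fastclus data=work.pc_scores out=work.clustered
              maxclusters=5 maxiter=100;
   var prin1-prin10;
run;
```

Because each OUT= data set carries the input columns forward, work.clustered already holds the original variables next to the CLUSTER assignment, so no separate merge step is needed in this sketch.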

 

My problem is how to profile the clusters to understand their business significance (interpretation). I tried PROC TABULATE, but the output was hard to make sense of because I have 77 original variables to compare across the clusters.
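
One possible way to make a 77-variable profile easier to read is to compare cluster means on standardized variables, laid out with one row per variable. The sketch below assumes a data set work.clustered that contains the original variables (placeholder names x1-x77) and the CLUSTER variable written by PROC FASTCLUS; all other data set names are hypothetical.

```sas
/* Assumed input: work.clustered with x1-x77 (placeholder names)
   and the CLUSTER variable from PROC FASTCLUS. */

/* Put every variable on a z-score scale so that means are comparable. */
proc stdize data=work.clustered out=work.clustered_z method=std;
   var x1-x77;
run;

/* Mean z-score of each variable within each cluster. */
proc means data=work.clustered_z noprint nway;
   class cluster;
   var x1-x77;
   output out=work.profile(drop=_type_ _freq_) mean=;
run;

/* One row per variable, one column per cluster (cluster_1, cluster_2, ...). */
proc transpose data=work.profile out=work.profile_long
               name=variable prefix=cluster_;
   id cluster;
   var x1-x77;
run;

proc print data=work.profile_long noobs;
run;
```

In the resulting table, values far from 0 mark the variables on which a cluster differs most from the overall average, which tends to be a more workable starting point for naming clusters than a full cross-tabulation.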

 

What should the next step be? Should I check for multicollinearity and remove as many variables as I can, or is there an easier way? I would appreciate any feedback or tips to resolve this issue.

 

Thank you in advance.

 

Regards

Kino

1 REPLY
Ksharp
Super User

If you want to cluster the variables, check PROC VARCLUS.
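
A minimal sketch of that suggestion, assuming an input data set work.customers with variables x1-x77 (both placeholder names). The MAXEIGEN= threshold of 0.7 is an arbitrary illustrative choice, not a recommendation from this thread.

```sas
/* Placeholder names: work.customers, x1-x77.
   MAXEIGEN=0.7 keeps splitting a variable cluster while its second
   eigenvalue exceeds 0.7; SHORT suppresses some of the printed output. */
proc varclus data=work.customers maxeigen=0.7 short;
   var x1-x77;
run;
```

In the printed results, the variable with the lowest 1-R**2 ratio in each variable cluster is a common choice of representative, which gives a shorter list of variables to profile the observation clusters on.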

If you want to pick out the most significant variables, check PROC PLS or PROC HPGENSELECT.
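
As a rough sketch of the PROC HPGENSELECT route, the cluster assignment can be treated as a nominal response and the original variables as candidate predictors. The data set work.clustered and the variables x1-x77 are placeholder names, and the stepwise method is just one of the available selection methods, so treat this as an assumption-laden starting point.

```sas
/* Assumed input: work.clustered with the CLUSTER variable from
   PROC FASTCLUS and the original variables x1-x77 (placeholder names). */
proc hpgenselect data=work.clustered;
   /* Nominal response, modeled with a generalized logit link. */
   model cluster = x1-x77 / dist=multinomial link=glogit;
   /* Keep only the variables that help separate the clusters. */
   selection method=stepwise;
run;
```

The variables that survive selection are the ones that best distinguish the clusters from one another, so they are natural candidates for the cluster profile.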

