kinoo1989
Calcite | Level 5
 

Hello, I have 77 variables and 27,000 observations. My goal is to find meaningful clusters in this data, but I am finding the clusters hard to interpret.

So far, I have run PCA (using PROC PRINCOMP), which gave me an idea of the reduced dimension. I then used the relevant PCs as input to PROC FASTCLUS. After a few iterations, I got an output with the desired number of significant clusters.

Then I merged the cluster assignments back onto the original input variables. I did this because I thought it would let me make sense of the clusters in terms of the original variables, even though the PCs were used to derive the clusters.

My problem is how to profile the clusters to understand their business significance (interpretation). I tried PROC TABULATE, but the output didn't make sense either, because I have 77 original variables to compare across the clusters.
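
One way to keep 77 variables manageable when profiling is to rank, for each cluster, the variables whose cluster mean sits furthest from the overall mean (measured in overall standard deviations) and read only the top few. The sketch below illustrates that idea in plain Python rather than SAS. The function name `profile_clusters`, the column names, and the toy data are all hypothetical, and it assumes the cluster label has already been joined to the original variables.

```python
from statistics import mean, pstdev

def profile_clusters(rows, cluster_key="cluster", top_k=3):
    """For each cluster, rank variables by how far the cluster mean sits
    from the overall mean, in units of the overall standard deviation."""
    variables = [k for k in rows[0] if k != cluster_key]
    # Overall mean and (population) standard deviation of every variable
    overall = {v: (mean(r[v] for r in rows), pstdev(r[v] for r in rows))
               for v in variables}
    profile = {}
    for c in sorted({r[cluster_key] for r in rows}):
        members = [r for r in rows if r[cluster_key] == c]
        scores = []
        for v in variables:
            mu, sd = overall[v]
            if sd == 0:
                continue  # a constant variable cannot distinguish clusters
            z = (mean(r[v] for r in members) - mu) / sd
            scores.append((v, round(z, 2)))
        # Largest absolute deviation first
        scores.sort(key=lambda item: abs(item[1]), reverse=True)
        profile[c] = scores[:top_k]
    return profile

# Toy data (hypothetical): three variables, two clusters
rows = [
    {"cluster": 1, "income": 10, "age": 30, "visits": 5},
    {"cluster": 1, "income": 12, "age": 50, "visits": 5},
    {"cluster": 2, "income": 30, "age": 31, "visits": 5},
    {"cluster": 2, "income": 32, "age": 49, "visits": 5},
]
profile = profile_clusters(rows, top_k=2)
for cluster, top_vars in profile.items():
    print(cluster, top_vars)
```

In this toy data, `income` separates the two clusters while `age` does not, so `income` comes first in each cluster's list. With real data, a cutoff on the absolute score (an assumption to tune, not a rule) can shorten the list further.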

What should the next step be? Should I check for multicollinearity and remove as many variables as I can, or is there an easier way? I would appreciate any feedback or tips to resolve this issue.

Thank you in advance.

Regards,
Kino

1 REPLY
Ksharp
Super User

If you want to cluster the variables, check PROC VARCLUS.

If you want to pick the most significant variables, check PROC PLS or PROC HPGENSELECT.
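
As a rough illustration of shortlisting variables, and not a reproduction of what PROC PLS or PROC HPGENSELECT do internally, the sketch below ranks each original variable by the share of its variance that is explained by cluster membership (between-cluster sum of squares divided by total sum of squares). It is plain Python rather than SAS; the function name `r_squared_by_variable`, the column names, and the toy data are hypothetical.

```python
from statistics import mean

def r_squared_by_variable(rows, cluster_key="cluster"):
    """Rank variables by between-cluster SS / total SS (a per-variable R-squared)."""
    variables = [k for k in rows[0] if k != cluster_key]
    # Group observations by their cluster label
    groups = {}
    for r in rows:
        groups.setdefault(r[cluster_key], []).append(r)
    result = {}
    for v in variables:
        grand = mean(r[v] for r in rows)
        total_ss = sum((r[v] - grand) ** 2 for r in rows)
        if total_ss == 0:
            continue  # constant variable: the ratio is undefined, so skip it
        between_ss = sum(
            len(members) * (mean(r[v] for r in members) - grand) ** 2
            for members in groups.values()
        )
        result[v] = between_ss / total_ss
    # Highest share of variance explained first
    return sorted(result.items(), key=lambda item: item[1], reverse=True)

# Toy data (hypothetical): three variables, two clusters
rows = [
    {"cluster": 1, "income": 10, "age": 30, "visits": 5},
    {"cluster": 1, "income": 12, "age": 50, "visits": 5},
    {"cluster": 2, "income": 30, "age": 31, "visits": 5},
    {"cluster": 2, "income": 32, "age": 49, "visits": 5},
]
ranked = r_squared_by_variable(rows)
for name, share in ranked:
    print(f"{name}: {share:.3f}")
```

Variables near the top of this ranking differ most across clusters and are natural candidates to describe them; variables near zero can usually be left out of the profile.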
