ANKH1
Pyrite | Level 9

Hi, 

We want to identify clusters within a data set based on one variable, using this variable as the centroid of each cluster. Is this possible?

Thanks, 

6 REPLIES
Reeza
Super User
With a single variable, that's just binning at that point.
Doing a histogram and deciding where the cut points are is equivalent, IMO.
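
A minimal sketch of the histogram-then-cut-points approach, in case it helps. The data set name WORK.MYDATA, the variable X, and the cut points 10 and 20 are all placeholders I made up for illustration; pick the real cut points after looking at the histogram.

```sas
/* Hypothetical names: WORK.MYDATA is the input table, X is the single variable. */
proc sgplot data=work.mydata;
    histogram x;
run;

/* Cut points 10 and 20 are placeholders; choose them from the histogram. */
data work.binned;
    set work.mydata;
    length bin $6;
    if missing(x) then bin = ' ';
    else if x < 10 then bin = 'low';
    else if x < 20 then bin = 'middle';
    else bin = 'high';
run;

/* Check how many observations fall in each bin. */
proc freq data=work.binned;
    tables bin;
run;
```

The PROC FREQ step is only a quick check that the bins are populated the way the histogram suggested.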
ANKH1
Pyrite | Level 9
Is this after running PROC CLUSTER? We are new to this analysis.
Reeza
Super User
Did you run PROC CLUSTER on a single variable? It's really only applicable if you have more than one variable. With one variable it becomes a binning problem instead, and you can easily visualize the data and do your analysis. Clustering and PCA are used primarily because once you have multiple dimensions it's no longer possible to view the data, so you have to trust the computer to make the 'clusters'.
ANKH1
Pyrite | Level 9

We are looking at multiple variables. What is the limit on the number of dimensions for PROC CLUSTER? Should we run a PCA first?

Reeza
Super User
I'm not aware of a limit on the number of variables. I was just explaining that if you have 2 or 3 variables, you're better off doing it manually. If you have more, then you need to move to different methods, such as cluster analysis.

I can't comment on your methodology beyond that because I have no context to the problem you're trying to solve.

Ksharp
Super User

Try PROC FASTCLUS for the k-means clustering method.
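
Something along these lines could be a starting point. This is only a sketch: WORK.MYDATA and X1-X3 are hypothetical names, MAXCLUSTERS=3 is an arbitrary choice, and the standardization step is my own assumption about the data (variables on different scales), not something stated in the thread.

```sas
/* Hypothetical names: WORK.MYDATA is the input table, X1-X3 are numeric variables. */
/* Standardize first so no variable dominates the distance only because of its scale. */
proc stdize data=work.mydata out=work.std method=std;
    var x1-x3;
run;

/* MAXCLUSTERS=3 is a placeholder; OUT= adds CLUSTER and DISTANCE to each row. */
proc fastclus data=work.std maxclusters=3 maxiter=100 out=work.clustered;
    var x1-x3;
run;

/* Check the size of each cluster. */
proc freq data=work.clustered;
    tables cluster;
run;
```

MAXITER= is raised here because the default is a single iteration. Try a few values of MAXCLUSTERS= and compare the output before settling on one.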
