11-05-2014 03:12 PM
How do you compare different methods when performing cluster analysis in SAS? Is there a statistic that tells you how the model performs?
05-02-2016 08:45 AM
03-12-2017 07:58 PM - edited 03-12-2017 08:15 PM
As @Damien_Mather said, there's no easy solution. In fact, there are many strategies and methods to try. For example, you can run proc cluster on each of the distances available in proc distance or, if your dataset has a lot of variables, first perform a factor analysis to reduce the number of columns and make things simpler and faster. That matters especially with SAS Studio, which is a solution for learning purposes and can't handle very big datasets. You can also try the different clustering methods, and when you "cross" the distances available in SAS with the different methods in proc cluster the amount of analysis grows quickly, because you have to manually evaluate each solution found, and that is the painful part.
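To make the "crossing" of distances and methods concrete, here is a minimal sketch, not a definitive recipe. The dataset work.mydata, the numeric variables x1-x5, the character ID variable obs_id and the macro name try_cluster are all made up for illustration. The distance/method pairs are just examples, so pick the ones that make sense for your data.

```sas
/* Hypothetical helper: one distance (proc distance METHOD=) crossed with
   one clustering method (proc cluster METHOD=). All dataset and variable
   names are assumptions, not something from your data. */
%macro try_cluster(dist=, method=);

   /* Standardize the interval variables and build the distance matrix. */
   proc distance data=work.mydata method=&dist out=work.d_&dist;
      var interval(x1-x5 / std=std);
      id obs_id;
   run;

   /* Hierarchical clustering on the distance matrix. PSEUDO adds the
      pseudo F and pseudo t-squared columns to the cluster history
      (for distance data only with AVERAGE, CENTROID or WARD). */
   proc cluster data=work.d_&dist method=&method pseudo
                outtree=work.tree_&dist._&method;
      id obs_id;
   run;

%mend try_cluster;

/* Each call is one solution you then have to evaluate by hand. */
%try_cluster(dist=euclid,    method=average);
%try_cluster(dist=euclid,    method=ward);
%try_cluster(dist=cityblock, method=average);
```

Every call leaves its own tree dataset (for example work.tree_euclid_ward), so the solutions can be compared side by side later.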
So first things first: look at your variables and see if you can reduce them to a manageable set, i.e., by grouping them into factors. Then look for the distances and methods that apply to your data and run the cluster analysis using different strategies: as I said, proc cluster on its own, or proc aceclus + proc fastclus + proc cluster. It all depends on the nature of your data and the purpose of your analysis. Then evaluate the results and pick the final solution.
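The steps above could look roughly like this. It is only a sketch under assumptions: work.mydata and x1-x20 are hypothetical names, and the numbers (3 factors, p=0.03, 50 preliminary clusters) are arbitrary placeholders you would have to tune. Whether you need the proc aceclus step at all depends on your data.

```sas
/* Step 1: reduce x1-x20 to a few factor scores.
   OUT= keeps the original data and adds Factor1-Factor3. */
proc factor data=work.mydata method=principal rotate=varimax
            nfactors=3 out=work.scores;
   var x1-x20;
run;

/* Step 2: proc aceclus estimates the within-cluster covariance and
   writes canonical variables Can1-Can3 to OUT=. */
proc aceclus data=work.scores out=work.ace p=0.03 noprint;
   var Factor1-Factor3;
run;

/* Step 3: proc fastclus builds many small preliminary clusters;
   MEAN= stores one row per preliminary cluster, including the
   _FREQ_ and _RMSSTD_ variables. */
proc fastclus data=work.ace maxclusters=50 maxiter=20
              mean=work.prelim noprint;
   var Can1-Can3;
run;

/* Step 4: proc cluster merges the preliminary clusters hierarchically,
   weighting each one by its size and within-cluster spread. */
proc cluster data=work.prelim method=ward ccc pseudo outtree=work.tree;
   var Can1-Can3;
   freq _freq_;
   rmsstd _rmsstd_;
run;
```

The point of the fastclus step is speed: proc cluster then works on 50 rows instead of the full dataset.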
Now, why do things get hard? Because for each (yes, each) distance you test, and that's considering just one strategy, you have to try different numbers of clusters and then, for every one of them, evaluate the number of observations in each cluster, the cluster composition and its separation from the other clusters, and the variables that work as drivers, so that you can name the clusters meaningfully.
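On the original question about a statistic: as far as I know there is no single number that settles it, but the CCC and PSEUDO options of proc cluster print the cubic clustering criterion, pseudo F and pseudo t-squared for each number of clusters, and those help narrow down the candidates. A sketch of the evaluation loop follows. Again, work.mydata, x1-x5, obs_id and the macro cut_tree are hypothetical names, and the range 2 to 6 is just an example.

```sas
/* One hierarchical solution on the (standardized) coordinate data.
   Look at the CCC, pseudo F and pseudo t-squared columns of the
   cluster history to shortlist numbers of clusters. */
proc cluster data=work.mydata method=ward std ccc pseudo
             outtree=work.tree;
   var x1-x5;
   id obs_id;
run;

/* Hypothetical helper: cut the tree at kmin..kmax clusters and
   print cluster sizes and per-cluster variable profiles. */
%macro cut_tree(kmin=2, kmax=6);
   %do k = &kmin %to &kmax;

      /* Assign each observation to one of &k clusters. */
      proc tree data=work.tree nclusters=&k out=work.clus&k noprint;
         copy x1-x5;
      run;

      /* Number of observations in each cluster. */
      proc freq data=work.clus&k;
         tables cluster;
      run;

      /* Variable means per cluster: which variables drive each one? */
      proc means data=work.clus&k mean std;
         class cluster;
         var x1-x5;
      run;

   %end;
%mend cut_tree;

%cut_tree(kmin=2, kmax=6);
```

The statistics only shortlist candidates. Sizes, profiles and naming still have to be judged by hand for each value of k, which is exactly the painful part.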
Then, with this information in hand, you pick the final solution yourself if you know well the business the data come from, or you present two or three possible solutions to the people who have that knowledge. They will point out a solution and give you a better understanding.
Hope this helps.