11-05-2014 03:12 PM

How do you compare different methods when performing cluster analysis in SAS? Is there a statistic that tells you how the model performs?

Posted in reply to nnaeem

05-02-2016 08:45 AM

Unfortunately there is no single test statistic that will do that. I advise my students to use hierarchical cluster models to settle on a reasonable number of clusters but then use a non-hierarchical method to produce a better cluster solution for that given number of clusters. It is hard to know what the 'right' number of clusters is, but you can usually recognise a useful cluster solution when you profile clusters by other, non-basis, observed variables.
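
To make the two-stage idea concrete, here is a minimal sketch in plain Python (not SAS syntax). It assumes a small numeric dataset held as tuples, uses centroid linkage for the hierarchical stage, and then refines the result with k-means seeded from the hierarchical centroids. The function names (`hierarchical`, `kmeans`, `two_stage`) are made up for this illustration, and the choice of linkage is an assumption, not something prescribed above.

```python
def centroid(points):
    """Mean of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hierarchical(points, k):
    """Agglomerative clustering (centroid linkage), stopped at k clusters."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        # Find the pair of clusters whose centroids are closest.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sq_dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def kmeans(points, seeds, max_iter=100):
    """Lloyd's k-means, started from the given seed centroids."""
    centers = list(seeds)
    groups = [[] for _ in centers]
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda c: sq_dist(p, centers[c]))
            groups[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        new_centers = [centroid(g) if g else centers[i]
                       for i, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, groups


def two_stage(points, k):
    """Stage 1: hierarchical clustering gives k seeds. Stage 2: k-means refines them."""
    seeds = [centroid(c) for c in hierarchical(points, k)]
    return kmeans(points, seeds)


# Toy data with two obvious groups (illustrative only).
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, groups = two_stage(data, 2)
print([len(g) for g in groups])
```

The hierarchical stage never reassigns a point once it has been merged, which is why a second, non-hierarchical pass can improve the solution for the chosen number of clusters.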

Posted in reply to nnaeem

03-12-2017 07:58 PM - edited 03-12-2017 08:15 PM

As @Damien_Mather said, there's no easy solution. In fact, there are many strategies and methods to try. For example, you can run proc cluster on each of the distances available in proc distance. If your dataset has a very large number of variables, you can first perform a factor analysis to reduce the number of columns, which makes things simpler and faster, especially in SAS Studio, which is meant for learning purposes and can't handle very big datasets. You can also try the different clustering methods. Once you "cross" the distances available in SAS with the different methods in proc cluster, the number of candidate solutions multiplies, because you have to evaluate each solution manually, and that is the painful part.

So first things first: look at your variables and see if you can reduce them to a manageable set, i.e., by grouping them into factors. Then look for the distances and methods that suit your data, and run the cluster analysis using different strategies: as I said, proc cluster alone, or proc aceclus + proc fastclus + proc cluster. It all depends on the nature of your data and the purpose of your analysis. Then evaluate the candidates and settle on the final solution.

Now, why do things get hard? Because for each (yes, each) distance you test, even if you are trying just one strategy, you have to try different numbers of clusters. For every one of those solutions you then have to evaluate the number of observations in each cluster, the cluster composition, the separation from the other clusters, and the variables that act as drivers, so that you can give the clusters meaningful names.
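
As a rough illustration of that evaluation step, the plain-Python sketch below (again, not SAS syntax) takes one candidate solution as a list of cluster labels and reports the cluster sizes, the within- and between-cluster sums of squares, and a pseudo-F ratio. The function name `summarize` and the toy data are assumptions made for this example. PROC CLUSTER's PSEUDO option reports a statistic of this kind, but treat it as one descriptive index to compare candidates, not as a formal test that picks the "right" solution.

```python
from collections import defaultdict


def centroid(points):
    """Mean of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def summarize(points, labels):
    """Describe one cluster solution: sizes, compactness, separation."""
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    n, k = len(points), len(groups)
    grand = centroid(points)
    # Within-cluster SS: how tight each cluster is around its own centroid.
    within = sum(sq_dist(p, centroid(g)) for g in groups.values() for p in g)
    # Between-cluster SS: how far the cluster centroids sit from the grand mean.
    between = sum(len(g) * sq_dist(centroid(g), grand) for g in groups.values())
    if 1 < k < n and within > 0:
        pseudo_f = (between / (k - 1)) / (within / (n - k))
    else:
        pseudo_f = float("nan")  # undefined for k = 1, k = n, or zero within SS
    return {
        "sizes": {lab: len(g) for lab, g in groups.items()},
        "within_ss": within,
        "between_ss": between,
        "pseudo_f": pseudo_f,
    }


# Two candidate 2-cluster solutions for the same toy data (illustrative only).
data = [(0, 0), (0, 2), (10, 0), (10, 2)]
candidates = {"split left/right": [0, 0, 1, 1], "split bottom/top": [0, 1, 0, 1]}
for name, labels in candidates.items():
    print(name, summarize(data, labels))
```

On this toy data the left/right split has a pseudo-F of 50 and the bottom/top split about 0.08, so the first is clearly more compact and better separated. With real data the differences are rarely that sharp, which is why sizes, composition and driver variables still need a manual look.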

Then, with this information in hand, you either choose the final solution yourself, if you know the business the data come from well, or you present two or three possible solutions to the people who have that knowledge. They will point out the best solution and give you a better understanding of it.

Hope this helps.