Normality transformation for cluster analysis


05-31-2016 04:49 PM

Hi,

I am going to conduct a cluster analysis on some skewed data with the K-means or hierarchical method. I wonder if I need to transform the skewed variables to make them normal. Could someone suggest how to choose the transformation method, since there is no dependent variable?

Thanks.


Posted in reply to lionking19063

05-31-2016 05:00 PM

You might look into Proc STDIZE to transform the variables before going to the cluster analysis.

But I would tend to take a quick look at the data using FASTCLUS before transforming to see if you get something interesting first. You may not need to transform if the units of measure are similar for all of the variables.
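PROC STDIZE offers several standardization methods. METHOD=STD (subtract the mean, divide by the standard deviation) and METHOD=RANGE (subtract the minimum, divide by the range) are two of them. The sketch below is a language-neutral Python illustration of roughly what those two methods compute, not SAS code. The function names and the income/age numbers are hypothetical.

```python
import statistics

def standardize_std(values):
    """Z-scores: subtract the mean, divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # n - 1 divisor
    return [(v - mean) / sd for v in values]

def standardize_range(values):
    """Rescale to [0, 1]: subtract the minimum, divide by the range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical data: two variables measured in very different units.
income = [32000.0, 45000.0, 51000.0, 78000.0, 120000.0]
age = [23.0, 35.0, 41.0, 52.0, 60.0]

z_income = standardize_std(income)
z_age = standardize_std(age)
r_income = standardize_range(income)

print([round(v, 3) for v in z_income])
print([round(v, 3) for v in r_income])
```

After either method, both variables sit on comparable scales, which is the point of standardizing before a distance-based procedure like FASTCLUS.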


Posted in reply to lionking19063

05-31-2016 05:51 PM

If your data is clustered (with more than one cluster) then it cannot be multinormal. Transformations can help homogenize cluster covariances, but shouldn't aim at normalizing the data.
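A quick simulation illustrates why clustered data can't be normal. It pools two well-separated normal clusters and compares the excess kurtosis of the pooled variable with that of a single normal sample. A normal distribution has excess kurtosis 0. An equal mix of N(-3, 1) and N(3, 1) has theoretical excess kurtosis 138/100 - 3 = -1.62, which reflects a flat, bimodal shape. The cluster centers, sample sizes, and seed are arbitrary choices for this sketch.

```python
import random

def excess_kurtosis(xs):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (about 0 for a normal)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3

rng = random.Random(2016)
# One normal "cluster" versus two well-separated clusters pooled together.
single = [rng.gauss(0, 1) for _ in range(4000)]
pooled = [rng.gauss(-3, 1) for _ in range(2000)] + [rng.gauss(3, 1) for _ in range(2000)]

print(round(excess_kurtosis(single), 2))
print(round(excess_kurtosis(pooled), 2))
```

Forcing the pooled variable toward normality would work against the separation that the clustering is supposed to find.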

PG


Posted in reply to lionking19063

05-31-2016 05:51 PM

I think it's more important to make sure the variables are on the same or comparable scales than to make them normal.

The assumptions for clustering depend on what type of clustering you intend to implement.

http://stats.stackexchange.com/questions/8148/assumptions-of-cluster-analysis
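The scale point can be shown directly. In the hypothetical rows below, the first two customers differ by $500 in income and 40 years in age. The sketch measures how much of their squared Euclidean distance comes from each variable, before and after z-scoring each column. This is an illustrative Python sketch with made-up data, not output from any SAS procedure.

```python
import math
import statistics

def zscore_columns(rows):
    """Standardize each column (variable) to mean 0, sample sd 1."""
    cols = list(zip(*rows))
    stats = [(statistics.mean(c), statistics.stdev(c)) for c in cols]
    return [tuple((x - m) / s for x, (m, s) in zip(row, stats)) for row in rows]

def squared_contributions(a, b):
    """Share of the squared Euclidean distance contributed by each variable."""
    parts = [(x - y) ** 2 for x, y in zip(a, b)]
    total = sum(parts)
    return [p / total for p in parts]

# Hypothetical rows: (annual income in dollars, age in years).
rows = [
    (50000.0, 25.0),
    (50500.0, 65.0),
    (58000.0, 26.0),
    (43000.0, 40.0),
    (61000.0, 52.0),
]

raw_share = squared_contributions(rows[0], rows[1])
z_rows = zscore_columns(rows)
z_share = squared_contributions(z_rows[0], z_rows[1])

print("raw (income, age):", [round(s, 3) for s in raw_share])
print("standardized (income, age):", [round(s, 3) for s in z_share])
print("raw distance:", round(math.dist(rows[0], rows[1]), 1))
```

On the raw scale, income accounts for more than 99% of the distance, so the 40-year age gap barely registers. After standardizing, the age difference dominates, as intuition suggests it should.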


Posted in reply to lionking19063

05-31-2016 09:02 PM

The Box-Cox transformation is commonly used to transform data toward normality. Check PROC TRANSREG.

TRANSREG fits univariate and multivariate linear models, optionally with spline, Box-Cox, and other nonlinear transformations. Models include regression and ANOVA, conjoint analysis, preference mapping, redundancy analysis, canonical correlation, and penalized B-spline regression. PROC TRANSREG supports CLASS variables.
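For reference, the textbook one-parameter Box-Cox family is shown below. It is defined only for strictly positive values. This is the standard definition, not a transcription of TRANSREG's BOXCOX options, which have their own settings documented in the TRANSREG chapter.

```latex
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0,\\[4pt]
\log y, & \lambda = 0,
\end{cases}
\qquad y > 0 .
```

With λ = 1 the data are only shifted, λ = 0.5 behaves like a square root (up to a linear rescaling), and λ = 0 is the log. For right-skewed data, the selected λ is typically below 1.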


Posted in reply to Ksharp

05-31-2016 09:11 PM

Does the Box-Cox transformation require a dependent variable? There is no dependent variable in my cluster analysis.


Posted in reply to lionking19063

05-31-2016 09:28 PM

No. I don't think it requires a dependent variable.

Interestingly, you can also use PROC MCMC to do it. Check its example:

Example 73.2: Box-Cox Transformation
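The next sketch shows why no dependent variable is needed. In an intercept-only normal model (no predictors), λ can be chosen by maximizing the profile log-likelihood of the variable itself:

ℓ(λ) = -(n/2) log σ̂²(λ) + (λ - 1) Σ log yᵢ

Here σ̂²(λ) is the n-divisor variance of the transformed values. The code does a simple grid search over λ in Python. It illustrates the general technique, not what PROC TRANSREG or the PROC MCMC example does internally. The grid limits, step size, and simulated log-normal data are arbitrary assumptions.

```python
import math
import random

def boxcox(y, lam):
    """One-parameter Box-Cox transform; requires y > 0."""
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam

def profile_loglik(ys, lam):
    """Profile log-likelihood of lambda for an intercept-only normal model."""
    n = len(ys)
    t = [boxcox(y, lam) for y in ys]
    mean = sum(t) / n
    var = sum((v - mean) ** 2 for v in t) / n  # MLE (n-divisor) variance
    return -0.5 * n * math.log(var) + (lam - 1) * sum(math.log(y) for y in ys)

def best_lambda(ys, lo=-2.0, hi=2.0, step=0.05):
    """Grid search for the lambda that maximizes the profile log-likelihood."""
    if min(ys) <= 0:
        raise ValueError("Box-Cox needs strictly positive values")
    steps = int(round((hi - lo) / step))
    grid = [round(lo + i * step, 10) for i in range(steps + 1)]
    return max(grid, key=lambda lam: profile_loglik(ys, lam))

rng = random.Random(31)
# Hypothetical right-skewed variable: log-normal, so the log (lambda = 0) is ideal.
skewed = [math.exp(rng.gauss(0.0, 1.0)) for _ in range(2000)]
lam = best_lambda(skewed)
transformed = [boxcox(y, lam) for y in skewed]
print(lam)
```

For a clustering problem, each skewed variable would get its own λ this way. Keep in mind the caveat earlier in the thread that normality of the pooled data is not really the goal.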


Posted in reply to lionking19063

05-31-2016 09:15 PM

I saw someone apply the transformation before standardization. I'm not sure whether that makes sense.
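One practical reason for transforming before standardizing is that log and Box-Cox are defined only for positive values. Z-scores are centered at zero, so they always include negative values unless all observations are equal. Standardizing afterwards then puts the transformed variables on a common scale for the distance calculation. The sketch below uses a log transform (the λ = 0 Box-Cox case) on made-up spending data. It shows only the domain issue and does not argue that other orderings are never used.

```python
import math
import statistics

def zscores(values):
    """Subtract the mean and divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

# Hypothetical skewed, strictly positive variable (e.g. spending in dollars).
spend = [12.0, 15.0, 18.0, 25.0, 40.0, 95.0, 310.0]

# Order 1: transform first, then standardize. This works.
logged_then_z = zscores([math.log(v) for v in spend])

# Order 2: standardize first. The z-scores include negatives, so log fails.
z_first = zscores(spend)
try:
    [math.log(v) for v in z_first]
    log_after_z_ok = True
except ValueError:  # math.log raises ValueError for non-positive input
    log_after_z_ok = False

print([round(v, 3) for v in logged_then_z])
print(log_after_z_ok)
```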