Normality transformation for cluster analysis

Occasional Contributor
Posts: 14

Normality transformation for cluster analysis

Hi,

I am going to conduct a cluster analysis on some skewed data using K-means or a hierarchical method. I wonder whether I need to transform the skewed variables to make them normal. Could someone suggest how to choose the transformation method, since there is no dependent variable?

Thanks.

Super User
Posts: 11,810

Re: Normality transformation for cluster analysis

You might look into PROC STDIZE to standardize the variables before running the cluster analysis.

But I would first take a quick look at the data with PROC FASTCLUS before transforming, to see whether you get something interesting. You may not need to transform if the units of measure are similar for all of the variables.
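To illustrate why the units of measure matter, here is a minimal sketch in Python rather than SAS. The data values and the helper names `zscore` and `euclidean` are made up for illustration. It applies plain z-score standardization (subtract the mean, divide by the standard deviation) and compares Euclidean distances before and after.

```python
import math
import statistics

def zscore(column):
    """Center a column at 0 and scale it to sample standard deviation 1."""
    mean = statistics.fmean(column)
    sd = statistics.stdev(column)
    return [(v - mean) / sd for v in column]

def euclidean(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical observations: income in dollars, age in years.
income = [30000.0, 32000.0, 90000.0, 95000.0]
age = [25.0, 60.0, 27.0, 62.0]

raw = list(zip(income, age))
scaled = list(zip(zscore(income), zscore(age)))

# Point 0 vs point 1: similar income, very different age.
# Point 0 vs point 2: similar age, very different income.
raw_d01 = euclidean(raw[0], raw[1])
raw_d02 = euclidean(raw[0], raw[2])
scaled_d01 = euclidean(scaled[0], scaled[1])
scaled_d02 = euclidean(scaled[0], scaled[2])

print(f"raw:    d(0,1)={raw_d01:.1f}  d(0,2)={raw_d02:.1f}")
print(f"scaled: d(0,1)={scaled_d01:.2f}  d(0,2)={scaled_d02:.2f}")
```

On the raw data the income difference swamps the age difference, so a distance-based method would effectively cluster on income alone. After standardization both variables contribute on a comparable footing.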

Posts: 5,049

Re: Normality transformation for cluster analysis

If your data are clustered (with more than one cluster), then they cannot be multivariate normal. Transformations can help homogenize cluster covariances, but they shouldn't aim at normalizing the data.
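A small sketch of the homogenizing effect, in Python and with made-up numbers: two hypothetical clusters whose spread is proportional to their level. The helper `spread_ratio` is illustrative only. On the raw scale the within-cluster standard deviations differ by a factor of about 100; after a log transform they match.

```python
import math
import statistics

# Hypothetical clusters: same relative spread, very different level.
cluster_a = [8.0, 10.0, 12.5]
cluster_b = [800.0, 1000.0, 1250.0]

def spread_ratio(a, b):
    """Ratio of the sample standard deviation of b to that of a."""
    return statistics.stdev(b) / statistics.stdev(a)

raw_ratio = spread_ratio(cluster_a, cluster_b)
log_ratio = spread_ratio(
    [math.log(v) for v in cluster_a],
    [math.log(v) for v in cluster_b],
)

print(f"raw spread ratio: {raw_ratio:.1f}")
print(f"log spread ratio: {log_ratio:.3f}")
```

The pooled data still consist of two separate groups after the transform, so they are no closer to a single normal distribution; only the within-cluster spreads have been made comparable.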

PG
Super User
Posts: 20,731

Re: Normality transformation for cluster analysis

I think it's more important to make sure the scales are the same or comparable than to achieve normality.

The assumptions for clustering depend on what type of clustering you intend to implement.

http://stats.stackexchange.com/questions/8148/assumptions-of-cluster-analysis

Super User
Posts: 10,210

Re: Normality transformation for cluster analysis

The Box-Cox transformation is used to transform data toward normality. Check PROC TRANSREG.

TRANSREG fits univariate and multivariate linear models, optionally with spline, Box-Cox, and
other nonlinear transformations. Models include regression and ANOVA, conjoint
analysis, preference mapping, redundancy analysis, canonical correlation, and penalized
B-spline regression. PROC TRANSREG supports CLASS variables.

Occasional Contributor
Posts: 14

Re: Normality transformation for cluster analysis

Does the Box-Cox transformation require a dependent variable? There is no dependent variable in my cluster analysis.

Super User
Posts: 10,210

Re: Normality transformation for cluster analysis

No, I don't think it requires a dependent variable.

Interestingly, you can also use PROC MCMC to do it. Check its example:

Example 73.2: Box-Cox Transformation
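As a rough illustration that a Box-Cox parameter can be estimated for a single variable with no dependent variable, here is a sketch in Python rather than SAS. It is not the PROC TRANSREG or PROC MCMC approach. It is a plain grid search over the standard Box-Cox profile log-likelihood for one positive variable, and the function names, the grid, and the data are all assumptions made for the example.

```python
import math

def boxcox(y, lam):
    """Box-Cox transform of positive values: log(y) at lam = 0, else (y**lam - 1) / lam."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]

def profile_loglik(y, lam):
    """Profile log-likelihood of lam, assuming the transformed values are normal."""
    n = len(y)
    z = boxcox(y, lam)
    mean = sum(z) / n
    var = sum((v - mean) ** 2 for v in z) / n  # maximum-likelihood variance
    jacobian = (lam - 1.0) * sum(math.log(v) for v in y)
    return -0.5 * n * math.log(var) + jacobian

def best_lambda(y, grid=None):
    """Return the grid value of lam with the highest profile log-likelihood."""
    if grid is None:
        grid = [i / 10 for i in range(-20, 21)]  # -2.0 to 2.0 in steps of 0.1
    return max(grid, key=lambda lam: profile_loglik(y, lam))

# Hypothetical right-skewed variable: symmetric on the log scale.
x = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
y = [math.exp(v) for v in x]

lam_hat = best_lambda(y)
print("estimated lambda:", lam_hat)
```

For this made-up variable the search picks lambda = 0, i.e. the log transform. Each skewed variable would be handled separately in this way, and the transformed variables could then be standardized before clustering.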
