Hi,
I am going to conduct a cluster analysis on some skewed data with K-means or hierarchical clustering. I wonder whether I need to transform the skewed variables to make them normal. Could someone suggest how to choose the transformation method, since there is no dependent variable?
Thanks.
You might look into PROC STDIZE to standardize the variables before running the cluster analysis.
But I would first take a quick look at the data with PROC FASTCLUS before transforming, to see whether you get something interesting. You may not need to transform at all if all of the variables are measured in similar units.
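For reference, the simplest standardization (what I understand METHOD=STD in PROC STDIZE to do) subtracts each variable's mean and divides by its standard deviation. Below is a minimal Python sketch of that idea, not SAS code; the variables `income` and `age` and their values are made up for illustration:

```python
from statistics import mean, stdev


def standardize(columns):
    """Z-score each variable: subtract its mean and divide by its sample
    standard deviation, so every variable ends up with mean 0 and SD 1."""
    result = {}
    for name, values in columns.items():
        m = mean(values)
        s = stdev(values)  # sample SD (n - 1 denominator)
        result[name] = [(v - m) / s for v in values]
    return result


# Hypothetical example: income in dollars and age in years.
data = {
    "income": [32000, 45000, 51000, 120000, 38000],
    "age": [25, 41, 37, 52, 29],
}
z = standardize(data)
for name, values in z.items():
    print(name, [round(v, 2) for v in values])
```

After this, each variable has mean 0 and standard deviation 1, so no variable dominates the distance calculation just because of its units.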
If your data is clustered (with more than one cluster) then it cannot be multinormal. Transformations can help homogenize cluster covariances, but shouldn't aim at normalizing the data.
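A small illustration of homogenizing spread versus normalizing, assuming (hypothetically) a skewed variable whose spread grows in proportion to its level. A log transform makes the two clusters' within-cluster standard deviations equal, even though the pooled data is still two separate groups and therefore still not normal. Python sketch with made-up values:

```python
import math
from statistics import stdev

# Hypothetical clusters: cluster_b is cluster_a scaled by 100, so its spread
# is 100 times larger, mimicking spread that grows with the level of a
# right-skewed variable.
cluster_a = [9.0, 10.0, 11.0, 12.0]
cluster_b = [v * 100 for v in cluster_a]

print("raw SDs:", round(stdev(cluster_a), 3), round(stdev(cluster_b), 3))

# log(100 * v) = log(100) + log(v): the scale factor becomes a shift,
# so the two clusters end up with the same within-cluster SD.
log_a = [math.log(v) for v in cluster_a]
log_b = [math.log(v) for v in cluster_b]
print("log SDs:", round(stdev(log_a), 4), round(stdev(log_b), 4))
```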
I think it's more important to make sure the scales are the same or comparable than to worry about normality.
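To see why comparable scales matter: k-means (FASTCLUS) uses Euclidean distance, so a variable measured in large units swamps the others. In this Python sketch with hypothetical income (dollars) and age (years) records, point A's nearest neighbor changes once each column is z-scored:

```python
import math
from statistics import mean, stdev

# Hypothetical records: (income in dollars, age in years).
points = {
    "A": (50000.0, 25.0),
    "B": (51000.0, 60.0),
    "C": (53000.0, 27.0),
    "D": (80000.0, 40.0),
}


def zscore_points(pts):
    """Standardize each column (variable) to mean 0, sample SD 1."""
    columns = list(zip(*pts.values()))
    stats = [(mean(col), stdev(col)) for col in columns]
    return {
        key: tuple((x - m) / s for x, (m, s) in zip(row, stats))
        for key, row in pts.items()
    }


def nearest(pts, target):
    """Return the key of the point closest to `target` in Euclidean distance."""
    others = [k for k in pts if k != target]
    return min(others, key=lambda k: math.dist(pts[target], pts[k]))


print("raw units, nearest to A:", nearest(points, "A"))
print("standardized, nearest to A:", nearest(zscore_points(points), "A"))
```

On the raw scale B looks closest to A despite a 35-year age gap, because a $1,000 income difference outweighs it; after standardization C, which is similar to A on both variables, is closest.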
The assumptions for clustering depend on what type of clustering you intend to implement.
http://stats.stackexchange.com/questions/8148/assumptions-of-cluster-analysis
The Box-Cox transformation is used to make a variable closer to normal. Check PROC TRANSREG.
TRANSREG fits univariate and multivariate linear models, optionally with spline, Box-Cox, and other nonlinear transformations. Models include regression and ANOVA, conjoint analysis, preference mapping, redundancy analysis, canonical correlation, and penalized B-spline regression. PROC TRANSREG supports CLASS variables.
Does the Box-Cox transformation require a dependent variable? There is no dependent variable in my cluster analysis.
No, I don't think it requires a dependent variable.
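For a single variable, the Box-Cox parameter lambda can be chosen from that variable alone, by maximizing the normal profile log-likelihood of the transformed values (including the Jacobian term (lambda - 1) * sum(log y)). Below is a minimal Python grid-search sketch of that idea, not how TRANSREG or MCMC implement it. It assumes strictly positive data, a lambda grid from -2 to 2 in steps of 0.01, and a made-up skewed variable whose logs are symmetric, so the log transform (lambda = 0) should be chosen:

```python
import math


def boxcox(values, lam):
    """Box-Cox transform of strictly positive values for one lambda."""
    if abs(lam) < 1e-9:  # the lambda -> 0 limit is the natural log
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def profile_loglik(values, lam):
    """Normal profile log-likelihood of lambda, additive constants dropped.
    Uses only the variable itself: no dependent variable is involved."""
    n = len(values)
    z = boxcox(values, lam)
    m = sum(z) / n
    var = sum((t - m) ** 2 for t in z) / n  # MLE variance (divide by n)
    jacobian = (lam - 1.0) * sum(math.log(v) for v in values)
    return -0.5 * n * math.log(var) + jacobian


def best_lambda(values, lo=-2.0, hi=2.0, step=0.01):
    """Grid search for the lambda that maximizes the profile log-likelihood."""
    if min(values) <= 0:
        raise ValueError("Box-Cox needs strictly positive values")
    steps = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(steps + 1)]
    return max(grid, key=lambda lam: profile_loglik(values, lam))


# Hypothetical right-skewed variable whose logs are evenly spaced and
# symmetric, so the log transform (lambda = 0) should be chosen.
skewed = [math.exp(0.3 * i) for i in range(-10, 11)]
lam = best_lambda(skewed)
print("chosen lambda:", round(lam, 2))
```

You would run this separately on each skewed variable before clustering. Note that it targets marginal normality of each variable only, which is a different goal from the comparable scales discussed earlier in the thread.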
Interestingly, you can also use PROC MCMC to do it. Check its example:
Example 73.2: Box-Cox Transformation