lionking19063
Fluorite | Level 6

Hi,

I am going to conduct a cluster analysis on some skewed data with a K-means or hierarchical method. I wonder whether I need to transform the skewed variables to make them normal. Could someone suggest how to choose the transformation method, since there is no dependent variable?

Thanks.

7 replies
ballardw
Super User

You might look into PROC STDIZE to transform (standardize) the variables before going on to the cluster analysis.

But I would tend to take a quick look at the data using PROC FASTCLUS before transforming, to see whether you get something interesting first. You may not need to transform if the units of measure are similar for all of the variables.
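As a rough illustration of what standardizing does (similar in spirit to PROC STDIZE with METHOD=STD, which subtracts the mean and divides by the standard deviation), here is a minimal Python sketch. It is not the SAS implementation; the helper names and the income/age data are hypothetical.

```python
from statistics import mean, stdev

def standardize(column):
    """Return z-scores (x - mean) / sample standard deviation for one variable."""
    m = mean(column)
    s = stdev(column)
    return [(x - m) / s for x in column]

def standardize_columns(rows):
    """Standardize every variable (column) of a list of equal-length rows."""
    columns = list(zip(*rows))
    scaled = [standardize(col) for col in columns]
    return [list(row) for row in zip(*scaled)]

# Hypothetical observations: (income in dollars, age in years)
data = [[30000, 25], [45000, 40], [120000, 35], [60000, 55]]
scaled = standardize_columns(data)
for row in scaled:
    print([round(v, 3) for v in row])
```

After this step both variables are unitless with mean 0 and standard deviation 1, so neither one dominates the distances just because of its units.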

PGStats
Opal | Level 21

If your data is clustered (with more than one cluster) then it cannot be multinormal. Transformations can help homogenize cluster covariances, but shouldn't aim at normalizing the data. 
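A small sketch of what "homogenizing cluster covariances" can mean, with made-up numbers: two clusters whose spread grows with their level (the second is the first multiplied by 100). On the raw scale their standard deviations differ by a factor of 100; after a log transform (the Box-Cox case λ = 0) they are equal. This assumes the skew comes from multiplicative variation, which may not match your data.

```python
import math
from statistics import stdev

# Hypothetical clusters: cluster_b is cluster_a multiplied by 100
cluster_a = [8.0, 10.0, 12.5]
cluster_b = [800.0, 1000.0, 1250.0]

def log_all(values):
    """Natural log of each (strictly positive) value."""
    return [math.log(v) for v in values]

# Ratio of within-cluster spreads before and after the log transform
raw_ratio = stdev(cluster_b) / stdev(cluster_a)
log_ratio = stdev(log_all(cluster_b)) / stdev(log_all(cluster_a))
print(f"raw spread ratio: {raw_ratio:.2f}, log spread ratio: {log_ratio:.2f}")
```

Note that the transformed clusters are still separated (their means differ), so the pooled data is not normal; only the within-cluster spreads became comparable.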

PG
Reeza
Super User

I think it's more important to make sure the scales are the same or comparable than to make the variables normal.
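To see why comparable scales matter, here is a minimal Python sketch with hypothetical data (income in dollars, age in years) and made-up spreads for each variable. With raw units, the Euclidean distance that K-means relies on is driven almost entirely by income; after dividing each variable by its assumed spread, the nearest neighbour changes.

```python
import math

def euclidean(p, q):
    """Euclidean distance, the metric K-means works with."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

def rescale(point, scales):
    """Divide each coordinate by its (assumed) scale, e.g. a standard deviation."""
    return tuple(x / s for x, s in zip(point, scales))

# Hypothetical observations: (income in dollars, age in years)
a = (50000, 25)
b = (51000, 65)   # similar income, very different age
c = (60000, 26)   # different income, similar age

# Assumed spreads of each variable in the full data set (made up for illustration)
scales = (10000, 15)

raw_nearest = "b" if euclidean(a, b) < euclidean(a, c) else "c"
sa, sb, sc = (rescale(p, scales) for p in (a, b, c))
scaled_nearest = "b" if euclidean(sa, sb) < euclidean(sa, sc) else "c"
print("nearest to a, raw units:", raw_nearest)
print("nearest to a, rescaled:", scaled_nearest)
```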

The assumptions for clustering depend on what type of clustering you intend to implement.

http://stats.stackexchange.com/questions/8148/assumptions-of-cluster-analysis

Ksharp
Super User

The Box-Cox transformation is used to transform a variable toward normality. Check PROC TRANSREG.

TRANSREG fits univariate and multivariate linear models, optionally with spline, Box-Cox, and
other nonlinear transformations. Models include regression and ANOVA, conjoint
analysis, preference mapping, redundancy analysis, canonical correlation, and penalized
B-spline regression. PROC TRANSREG supports CLASS variables. 
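For reference, the textbook Box-Cox family (not quoted from the TRANSREG documentation) and the log-likelihood used to pick λ when there is only the variable y_1, …, y_n itself and no response variable:

```latex
y^{(\lambda)} =
\begin{cases}
  \dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0, \\
  \log y, & \lambda = 0,
\end{cases}
\qquad y > 0

\ell(\lambda) = -\frac{n}{2} \log \hat{\sigma}^{2}(\lambda)
  + (\lambda - 1) \sum_{i=1}^{n} \log y_i,
\qquad
\hat{\sigma}^{2}(\lambda) = \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i^{(\lambda)} - \bar{y}^{(\lambda)} \bigr)^{2}
```

Here \bar{y}^{(\lambda)} is the mean of the transformed values and ℓ is written up to an additive constant; λ is chosen to maximize ℓ. Besides λ, only a common mean and variance are estimated.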

lionking19063
Fluorite | Level 6

Does the Box-Cox transformation require a dependent variable? There is no dependent variable in my cluster analysis.

Ksharp
Super User

No. I think it does not require a dependent variable.
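A minimal Python sketch of that idea, assuming the textbook Box-Cox profile likelihood rather than anything specific to PROC TRANSREG or PROC MCMC. λ is chosen by a simple grid search (the grid from -2 to 2 in steps of 0.1 is an arbitrary choice) to maximize the normal log-likelihood of the transformed variable. Only the variable itself enters the calculation; the fitted "model" is just a common mean, which is why no dependent variable is needed. The helper names and data are hypothetical.

```python
import math

def boxcox(y, lam):
    """Box-Cox transform of a single strictly positive value."""
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam

def profile_loglik(values, lam):
    """Normal profile log-likelihood of the Box-Cox(lam) values, up to a constant.

    The only 'model' is a common mean, so no dependent variable is involved.
    """
    n = len(values)
    z = [boxcox(y, lam) for y in values]
    zbar = sum(z) / n
    sigma2 = sum((v - zbar) ** 2 for v in z) / n
    jacobian = (lam - 1) * sum(math.log(y) for y in values)
    return -0.5 * n * math.log(sigma2) + jacobian

def best_lambda(values, grid=None):
    """Pick the grid value of lambda with the largest profile log-likelihood."""
    if any(y <= 0 for y in values):
        raise ValueError("Box-Cox needs strictly positive values")
    if grid is None:
        grid = [i / 10 for i in range(-20, 21)]  # -2.0, -1.9, ..., 2.0 (assumed grid)
    return max(grid, key=lambda lam: profile_loglik(values, lam))

# Hypothetical right-skewed variable
skewed = [1.2, 1.5, 2.0, 2.2, 3.1, 3.5, 4.8, 6.0, 9.5, 15.0, 27.0]
lam = best_lambda(skewed)
transformed = [boxcox(y, lam) for y in skewed]
print("chosen lambda:", lam)
```

With several skewed variables, each one would get its own λ in the same way before clustering.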

Interestingly, you can also use PROC MCMC to do it. Check its example:

Example 73.2: Box-Cox Transformation

lionking19063
Fluorite | Level 6

I saw someone do the transformation before the standardization. I'm not sure whether that makes sense.
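One way to reason about the order: a z-score standardization is linear, so it changes units but not the shape (skewness) of a variable, and it produces zero and negative values, for which the log and Box-Cox transforms are undefined. So a sensible order is to transform first and standardize afterwards. A minimal Python sketch, using the log (Box-Cox with λ = 0) as a stand-in for whichever transform you pick; the data and helper names are hypothetical.

```python
import math
from statistics import mean, stdev

def zscores(values):
    """Linear rescaling to mean 0, standard deviation 1 (shape is unchanged)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def transform_then_standardize(values):
    """Apply a skew-reducing transform first, then put the result on a common scale."""
    if any(v <= 0 for v in values):
        raise ValueError("log/Box-Cox transforms need strictly positive values")
    return zscores([math.log(v) for v in values])

raw = [1.2, 2.0, 3.5, 6.0, 15.0, 27.0]  # hypothetical right-skewed variable

ready_for_clustering = transform_then_standardize(raw)

# The reverse order fails: standardized values include negatives
standardized_first = zscores(raw)
try:
    transform_then_standardize(standardized_first)
except ValueError as err:
    print("cannot transform after standardizing:", err)
```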

