I'm performing a cluster analysis on a dataset that contains over 100 variables (after imputing, replacing, and eliminating correlated variables). Before running the clustering, how should I set up the transformation node?

Should I transform all variables with LOG10, or standardize them all? And for continuous variables that can be grouped into intervals (revenue, for example), do I need to transform them with the Bucket option?

Or should I look at each variable individually (after capping it at the 99th percentile to eliminate outliers), apply LOG10 if it is skewed, and apply the z-score to the rest?

Or should I not transform any variables at all if I'm going to use the Minkowski distance in K-means?
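To make the per-variable option concrete, here is a rough sketch of what I have in mind, written in Python with NumPy rather than in the transformation node itself. The function names, the skewness threshold of 1.0, the +1 offset before the log, and the choice to z-score the logged columns as well (so everything ends up on a comparable scale) are my own assumptions for illustration, not settings from the tool.

```python
import numpy as np


def skewness(x):
    """Biased sample skewness: third central moment divided by std cubed."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    std = centered.std()
    if std == 0:
        return 0.0
    return float((centered ** 3).mean() / std ** 3)


def prepare_column(x, cap_pct=99.0, skew_threshold=1.0):
    """Cap, optionally log-transform, then z-score one continuous variable.

    cap_pct and skew_threshold are illustrative defaults (assumptions).
    Returns the transformed values and a flag saying whether log10 was used.
    """
    x = np.asarray(x, dtype=float)

    # Step 1: cap values above the chosen percentile to limit outliers.
    cap = np.percentile(x, cap_pct)
    x = np.minimum(x, cap)

    # Step 2: log10 only for strongly right-skewed, non-negative variables.
    # The +1 offset (an assumption) keeps zeros finite.
    used_log = bool(skewness(x) > skew_threshold and x.min() >= 0)
    if used_log:
        x = np.log10(x + 1.0)

    # Step 3: z-score so every variable has mean 0 and standard deviation 1.
    std = x.std()
    z = (x - x.mean()) / std if std > 0 else np.zeros_like(x)
    return z, used_log


# Hypothetical example columns: one right-skewed, one roughly symmetric.
skewed = np.exp(np.linspace(0.0, 5.0, 200))
symmetric = np.linspace(0.0, 10.0, 200)

z_skewed, logged_skewed = prepare_column(skewed)
z_symmetric, logged_symmetric = prepare_column(symmetric)
```

In this sketch only the skewed column goes through the log step, while both columns end up standardized before they would be passed to the clustering.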