I have a couple of questions on Cluster Analysis (chapter 8 of course notes):
1. In what scenarios should categorical variables, via dummy indicators, be used for clustering? Or would it be better to use only interval variables, as suggested by the course notes on page 8-9? ("An interval measurement level is recommended for k-means to produce non-trivial clusters")
My answer:
For k-means and hierarchical clustering, interval variables are recommended. The SAS HP Cluster node can also perform ABC clustering based on Manhattan distance. With this option you can also include dummy variables derived from a categorical variable.
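To make the dummy-variable idea concrete, here is a minimal sketch in plain Python (not SAS code). It assumes a hypothetical categorical variable with three levels and shows how Manhattan distance behaves on the resulting 0/1 indicators. The names `dummy_encode`, `manhattan` and `levels` are made up for illustration.

```python
def dummy_encode(value, levels):
    """Turn one categorical value into a list of 0/1 indicators, one per level."""
    return [1 if value == level else 0 for level in levels]


def manhattan(a, b):
    """Manhattan (city-block) distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical categorical variable with three levels.
levels = ["red", "green", "blue"]

obs_red = dummy_encode("red", levels)
obs_blue = dummy_encode("blue", levels)

# Same category: the indicator vectors are identical.
dist_same = manhattan(obs_red, obs_red)
# Different categories: exactly two indicators differ.
dist_diff = manhattan(obs_red, obs_blue)
```

With full dummy coding, any two observations in different categories end up the same distance apart, so the categorical variable adds a fixed penalty for a mismatch and nothing more graded than that.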
2. In what instances would range standardisation (with reference to the property "Internal Standardization") be recommended in place of the usual standardisation (i.e. subtracting the mean and dividing by the standard deviation)?
My answer:
For k-means clustering and PCA, z-standardization is preferred. For some special neural network (NN) machine learning algorithms, range normalization may be preferred.
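For comparison, the two standardisations can be sketched in plain Python (again, not SAS code). The function names and the `income` values are hypothetical, and the population standard deviation is an assumption here; a tool may divide by the sample standard deviation instead.

```python
from statistics import mean, pstdev


def z_standardize(values):
    """Subtract the mean and divide by the standard deviation."""
    m = mean(values)
    s = pstdev(values)  # assumption: population std dev; sample std dev is also common
    return [(v - m) / s for v in values]


def range_standardize(values):
    """Subtract the minimum and divide by the range, mapping values onto [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical interval variable.
income = [20.0, 30.0, 40.0, 50.0, 60.0]

z_income = z_standardize(income)      # centred on 0 with unit standard deviation
r_income = range_standardize(income)  # bounded between 0 and 1
```

The z-scores are unbounded but have the same spread for every variable, whereas the range-standardised values are bounded but depend entirely on the observed minimum and maximum, so a single extreme value compresses all the others.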