pvareschi
Quartz | Level 8

Re: Applied Analytics Using SAS Enterprise Miner

I have a couple of questions on Cluster Analysis (chapter 8 of course notes):

1. In what scenarios should categorical variables, via dummy indicators, be used for clustering? Or would it be better to use only interval variables, as suggested by the course notes on page 8-9? ("An interval measurement level is recommended for k-means to produce non-trivial clusters")
2. In what instances would range standardisation (with reference to the property "Internal Standardization") be recommended in place of the usual standardisation (i.e. subtracting the mean and dividing by the standard deviation)?

 

Accepted Solutions
gcjfernandez
SAS Employee

I have a couple of questions on Cluster Analysis (chapter 8 of course notes):

1. In what scenarios should categorical variables, via dummy indicators, be used for Clustering? Or would it just be better to use interval variables as suggested by the course notes at page 8-9? ("An interval measurement level is recommended for k-means to produce non-trivial clusters")

My answer:

For k-means and hierarchical clustering, interval variables are recommended. The SAS HP Cluster node can also perform ABC clustering based on Manhattan distance. With this option, you can also include dummy variables derived from a categorical variable.
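
To illustrate how dummy indicators behave under Manhattan distance, here is a minimal Python sketch. It is not the HP Cluster node's implementation; the helper names, the `colors` levels, and the sample records are all made up for illustration, and the interval input is assumed to be already scaled to [0, 1].

```python
def one_hot(value, categories):
    """Return 0/1 dummy indicators for a single categorical value."""
    return [1 if value == c else 0 for c in categories]

def manhattan(a, b):
    """Manhattan (city-block) distance: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

colors = ["red", "green", "blue"]  # hypothetical category levels

# Each record: one interval input (assumed scaled to [0, 1]) plus the dummies.
rec_a = [0.25] + one_hot("red", colors)
rec_b = [0.75] + one_hot("blue", colors)
rec_c = [0.75] + one_hot("red", colors)

d_ab = manhattan(rec_a, rec_b)  # different category: 0.5 + 2 = 2.5
d_ac = manhattan(rec_a, rec_c)  # same category: 0.5
```

A category mismatch flips two indicators, so it adds 2 to the distance, which is more than a range-scaled interval input can ever contribute. That is one reason to think carefully about how much weight dummy variables should carry in a cluster analysis.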

2. In what instances would range standardisation (with reference to the property "Internal Standardization") be recommended in place of the usual standardisation (i.e. subtracting the mean and dividing by the standard deviation)?

My answer:

For k-means clustering and PCA, z-standardization is preferred. For some neural network (NN) machine learning algorithms, range normalization may be preferred.
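
For reference, the two kinds of standardization can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not what Enterprise Miner runs internally; the function names and the `income` values are made up, and the z-score here assumes the sample standard deviation (n - 1 divisor).

```python
from statistics import mean, stdev

def z_standardize(values):
    """Subtract the mean and divide by the (sample) standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def range_standardize(values):
    """Subtract the minimum and divide by the range, mapping values to [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

income = [20.0, 30.0, 40.0, 50.0, 60.0]  # hypothetical interval input

z = z_standardize(income)      # mean 0, standard deviation 1
r = range_standardize(income)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

The z-scores are unbounded but have unit variance, which suits distance-based and variance-based methods such as k-means and PCA. The range-standardized values are bounded to [0, 1], which is often convenient for neural network inputs.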


 
