Hi,
I have a few questions about clustering with the Cluster node in SAS Enterprise Miner and am hoping that someone can help.
I have a data set of almost 10,000 customers containing their age, tenure with the company, whether they are a High Net Worth customer (1 or 0), and a ranking of their product holdings (1 to 4, with 1 being the highest ranking).
Below are my questions:
1. When clustering on only the continuous variables (age and tenure), I am not sure which of the 3 Clustering Method options in the Cluster node (Centroid, Average, and Ward) is best. Each method gives a different optimal number of clusters: 5 with Centroid, 4 with Average, and 3 with Ward.
Both variables were transformed (to reduce the right skew of the data) and standardized. The CCC cutoff was left at its default value of 3.
2. When clustering on both continuous and discrete variables (age, tenure, high value status (0 or 1), and product ranking (1 to 4)), customers with high value status are bucketed into their own cluster. Does it make sense to use binary/categorical variables in clustering?
In the Encoding of Class Variables option in the Cluster Node, I chose Ordinal Encoding = Rank and Nominal Encoding = GLM.
3. Is there a way to output the results of the clustering as a SAS data set, other than copying and pasting the results (from Exported Data under the Train section in the Properties bar of the Cluster node) into an Excel spreadsheet?
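To make questions 1 and 2 more concrete, here is a minimal Python sketch (not SAS code, and not a description of what the Cluster node does internally) of the preprocessing I described and of how far apart the two values of a binary flag end up if it is standardized like the other inputs. The log transform, the tenure values, and the 5% share of High Net Worth customers are all assumptions made up for illustration, and the function names are my own.

```python
import math
import statistics


def log_then_standardize(values):
    """Log-transform positive values to reduce right skew, then z-score them."""
    logged = [math.log(v) for v in values]
    mean = statistics.fmean(logged)
    sd = statistics.pstdev(logged)  # population standard deviation
    return [(x - mean) / sd for x in logged]


def standardized_gap_for_flag(share_of_ones):
    """Distance between a 0 and a 1 after z-scoring a binary flag.

    A 0/1 variable with share p of ones has standard deviation
    sqrt(p * (1 - p)), so the raw gap of 1 becomes 1 / sqrt(p * (1 - p)).
    """
    sd = math.sqrt(share_of_ones * (1.0 - share_of_ones))
    return 1.0 / sd


# Hypothetical, right-skewed tenure values in years (not my real data).
tenure_years = [1, 2, 2, 3, 4, 5, 8, 30]
tenure_z = log_then_standardize(tenure_years)

# Hypothetical share of High Net Worth customers (assumed, not measured).
gap = standardized_gap_for_flag(0.05)
print(round(gap, 2))
```

With a 5% share, a 0 and a 1 end up roughly 4.6 standard deviations apart on that single input, which may be part of why those customers land in their own cluster.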
Thank you.