About Data_Guy

Data_Guy · 02-02-2018

Hi Experts, after creating 5 cluster groups (using the k-means algorithm) from my data set based on 4 continuous variables, I was wondering whether it is valid to use the cluster group IDs (1 to 5) as the dependent variable in a multinomial logistic regression (using the same 4 continuous variables from the clustering as independent variables) to predict the cluster group of new observations (which have the same 4 continuous variables). Note that the data for 3 of my 4 independent variables are highly skewed. If the above method is valid, I am not sure which other types of classifiers (e.g., KNN, decision trees, SVMs) would be best for predicting the cluster group of new observations. Thanks much!
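A minimal Python sketch of the idea behind the question, assuming the clusters came from plain k-means on log-transformed, standardized variables. The log1p transform, the farthest-first initialization, the toy data and all function names are illustrative assumptions, not what SAS Enterprise Miner does internally. For k-means, a new observation's cluster is simply its nearest centroid, provided it goes through the same transformation and is scaled with the training means and standard deviations. A classifier trained on the cluster IDs (multinomial logistic regression, KNN, etc.) would generally only approximate that rule.

```python
import math
import random

def log_transform(rows, skewed_cols):
    """Apply log1p to the columns assumed to be right-skewed (values must be >= 0)."""
    return [[math.log1p(v) if j in skewed_cols else v for j, v in enumerate(r)] for r in rows]

def fit_scaler(rows):
    """Per-column mean and population standard deviation of the training rows."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    sds = [math.sqrt(sum((v - m) ** 2 for v in col) / n) or 1.0
           for col, m in zip(zip(*rows), means)]
    return means, sds

def scale(rows, means, sds):
    """Standardize rows with the *training* means and standard deviations."""
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the closest centroid (squared Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: sq_dist(point, centroids[c]))

def kmeans(points, k, iters=100):
    """Lloyd's algorithm with a deterministic farthest-first initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old centroid if a cluster ends up empty.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return centroids

def predict_cluster(new_rows, skewed_cols, means, sds, centroids):
    """Assign new observations using the same transform, scaling and centroids."""
    z = scale(log_transform(new_rows, skewed_cols), means, sds)
    return [nearest(p, centroids) for p in z]

# Hypothetical toy data: 3 well-separated groups of 30 rows, 4 variables,
# columns 1-3 right-skewed (multiplicative spread), column 0 roughly symmetric.
rng = random.Random(42)
train = []
for center, base in [(20, 1.0), (40, 20.0), (60, 400.0)]:
    for _ in range(30):
        train.append([rng.gauss(center, 1)] +
                     [base * rng.uniform(0.9, 1.1) for _ in range(3)])

skewed = {1, 2, 3}
X = log_transform(train, skewed)
means, sds = fit_scaler(X)
centroids = kmeans(scale(X, means, sds), k=3)
labels = [nearest(p, centroids) for p in scale(X, means, sds)]
print(predict_cluster([[41, 21.0, 19.0, 20.0]], skewed, means, sds, centroids))
```

The new row lands in the same cluster as the middle group of training rows. The design point is that the scaler and centroids are fitted once on the training data and reused unchanged for scoring; refitting them on new data would silently move the cluster boundaries.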

Data_Guy · 04-28-2011

Hi, I have a couple of questions on clustering with the Cluster node in SAS Enterprise Miner and am hoping that someone can help. I have a data set of almost 10,000 customers containing their age, their tenure with the company, whether they are a High Net Worth customer (1 or 0), and a ranking of their product holdings (1 to 4, with 1 being the highest ranking). Below are my questions:

1. For clustering on only the continuous variables (age and tenure), I am not sure which of the 3 Clustering Method options in the Cluster node (Centroid, Average, and Ward) is best. Each method gives a different number of clusters: 5 optimal clusters with the Centroid method, 4 with the Average method, and 3 with the Ward method. The values of each variable were standardized and transformed to remove the right skew in the data. The CCC cutoff was left at its default value of 3.

2. For clustering on both continuous and discrete variables (age, tenure, high value status (0 or 1), and product ranking (1 to 4)), customers with high value status are bucketed into their own cluster. Does it make sense to use binary/categorical variables in clustering? In the Encoding of Class Variables option in the Cluster node, I chose Ordinal Encoding = Rank and Nominal Encoding = GLM.

3. Is there a way to output the clustering results as a SAS data set, other than copying and pasting the results (from Exported Data under the Train section in the Properties bar of the Cluster node) into an Excel spreadsheet?

Thank you.
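To illustrate why the three methods in question 1 can disagree, here is a small hypothetical Python sketch of how each one scores a candidate merge, using textbook definitions with plain Euclidean distance. SAS may use squared distances internally, so only the rankings, not the absolute values, are meant to be illustrative. Centroid linkage compares cluster means, average linkage averages all cross-cluster pairwise distances, and Ward's method measures the increase in within-cluster sum of squares, which penalizes merging large clusters. The cluster names and points are made up for the example.

```python
import math
from itertools import combinations

def centroid(cluster):
    """Mean point of a cluster (list of coordinate tuples)."""
    return [sum(col) / len(cluster) for col in zip(*cluster)]

def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid_linkage(a, b):
    """Distance between the two cluster means."""
    return euclid(centroid(a), centroid(b))

def average_linkage(a, b):
    """Mean of all pairwise distances between members of a and members of b."""
    return sum(euclid(p, q) for p in a for q in b) / (len(a) * len(b))

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if a and b are merged."""
    na, nb = len(a), len(b)
    return na * nb / (na + nb) * euclid(centroid(a), centroid(b)) ** 2

def best_merge(clusters, linkage):
    """Return the pair of cluster names with the smallest linkage value."""
    return min(combinations(sorted(clusters), 2),
               key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]))

# Hypothetical clusters: one 4-point cluster around (0, 0) and two singletons.
clusters = {
    "A": [(-1, -1), (-1, 1), (1, -1), (1, 1)],
    "B": [(3, 0)],
    "C": [(3, 3.5)],
}
for name, fn in [("centroid", centroid_linkage),
                 ("average", average_linkage),
                 ("ward", ward_cost)]:
    print(name, best_merge(clusters, fn))
```

With these toy clusters, centroid and average linkage both merge A with B first, while Ward's method prefers to merge the two singletons B and C, because merging into the larger cluster A costs more sum of squares. Different merge orders produce different trees, so a criterion such as the CCC evaluated along each tree can point to different numbers of clusters.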

Date Last Visited	02-02-2018 08:35 PM

How to Predict Cluster Group of New Observations

Clustering in SAS Enterprise Miner