topic Variable Reduction for Clustering Model in SAS Data Science

Variable Reduction for Clustering Model

chemicalab — Mon, 14 Apr 2014 13:35:37 GMT

Hi all,

Are there any more advanced and robust ways in Base SAS, besides Varclus or principal components, to do variable reduction?

I am trying to perform a cluster analysis with over a hundred variables, so I was wondering if there is something out there that can help reduce the number of variables and also identify the strongest discriminators in my data.

Kind regards

Re: Variable Reduction for Clustering Model

M_Maldonado — Mon, 14 Apr 2014 14:05:20 GMT

Hi Chemicalab,

PROC PRINCOMP and PROC VARCLUS are the go-to methods outside Enterprise Miner, as you mention (strictly speaking, both are SAS/STAT procedures rather than Base SAS).
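
For reference, here is a minimal sketch of how those two procedures are typically used for this. The table name work.customers, the variable list x1-x120, and the settings MAXEIGEN=0.7, N=10 and MAXCLUSTERS=5 are placeholders made up for illustration, not recommendations, and would need tuning for your data.

```sas
/* Hypothetical input: work.customers with numeric candidates x1-x120. */

/* Option 1: PROC VARCLUS groups correlated variables into clusters.   */
/* MAXEIGEN= controls when a cluster is split further (0.7 is just an  */
/* illustrative value); SHORT trims the output; HI keeps the clusters  */
/* hierarchical.                                                       */
proc varclus data=work.customers maxeigen=0.7 short hi;
   var x1-x120;
run;
/* A common practice is to keep, from each final cluster, the variable */
/* with the lowest 1-R**2 ratio as that cluster's representative.      */

/* Option 2: PROC PRINCOMP replaces the inputs with a few uncorrelated */
/* components. OUT= writes the scores (Prin1, Prin2, ...), N= limits   */
/* how many are kept, STD scales the scores to unit variance.          */
proc princomp data=work.customers out=work.pc_scores n=10 std;
   var x1-x120;
run;

/* Cluster on the component scores instead of the raw variables.       */
/* MAXCLUSTERS=5 is an arbitrary placeholder.                          */
proc fastclus data=work.pc_scores maxclusters=5 out=work.clustered;
   var prin1-prin10;
run;
```

The VARCLUS route keeps original variables, so the clusters stay easy to interpret. The PRINCOMP route usually compresses more, but each component is a blend of all the inputs.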

A different approach, if you have access to SAS Enterprise Miner: calculate variable importance with a tree-based model node, then review the importance values to decide which variables to keep.
Please note that the tree-based nodes listed below have the variable selection option set to Yes by default. This means that if you connect any of them to a Cluster node, you will pass on only the most important variables (relative variable importance greater than or equal to 0.05). A few considerations below.

Decision Tree node - variable importance is calculated using only one decision tree.
HPForest node - variable importance is calculated using a random forest model, which is more robust. This node is available in SAS Enterprise Miner 12.3 or newer.
Gradient Boosting node - it is very robust, but the sequential nature of this algorithm makes it take some time to run.

I hope it helps,
Thanks,

Miguel

Re: Variable Reduction for Clustering Model

chemicalab — Mon, 14 Apr 2014 14:16:37 GMT

Unfortunately I don't have EM, so I guess I will have to go with PROC PRINCOMP or PROC VARCLUS. Thank you for the reply.