
Variable Reduction for Clustering Model

Hi all,

Besides PROC VARCLUS and principal components, are there any more advanced or more robust methods for variable reduction that I can use when programming in SAS?

I am trying to perform a cluster analysis with over a hundred variables, so I was wondering whether there is something that can reduce the number of variables and also identify the strongest discriminators in my data.

Kind regards

Re: Variable Reduction for Clustering Model

Hi Chemicalab,

PROC PRINCOMP and PROC VARCLUS (both part of SAS/STAT) are the go-to methods when you are working in SAS code, as you mention.
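For reference, here is a minimal sketch of both approaches. The data set name work.mydata, the variable names x1-x100, and the settings MAXEIGEN=0.7, N=10 and MAXCLUSTERS=5 are placeholders I made up for illustration, not recommendations; adjust them to your data.

```sas
/* Option 1: group correlated variables with PROC VARCLUS.           */
/* A cluster is split while its second eigenvalue exceeds MAXEIGEN.  */
/* SHORT suppresses some of the printed output.                      */
proc varclus data=work.mydata maxeigen=0.7 short;
   var x1-x100;   /* hypothetical variable names */
run;

/* Option 2: replace the inputs with principal component scores.     */
/* PROC PRINCOMP analyzes the correlation matrix by default, so the  */
/* inputs are effectively standardized. N=10 keeps 10 components.    */
proc princomp data=work.mydata out=work.pc_scores n=10;
   var x1-x100;
run;

/* Cluster on the component scores Prin1-Prin10 written by OUT=.     */
proc fastclus data=work.pc_scores maxclusters=5 out=work.clustered;
   var prin1-prin10;
run;
```

With VARCLUS, a common practice is to keep one representative per variable cluster, typically the variable with the lowest 1-R**2 ratio in the printed output, and then cluster on those representatives so the inputs stay interpretable. With PRINCOMP the inputs to the cluster analysis are components rather than original variables, which is harder to explain to an audience.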

A different approach, if you have access to SAS Enterprise Miner: calculate variable importance with one of the tree-based model nodes, then review the resulting importance ranking to decide which variables to keep.
Please note that these nodes have the variable selection option set to Yes by default. This means that if you connect any of these nodes to a Cluster node, only the most important variables are passed on (relative variable importance greater than or equal to 0.05). A few considerations below.

  • Decision Tree node - variable importance is calculated using only one decision tree.
  • HPForest node - variable importance is calculated using a random forest model, which is more robust. This node is available in SAS Enterprise Miner 12.3 or newer.
  • Gradient Boosting node - it is very robust, but the sequential nature of this algorithm makes it take some time to run.
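To make the 0.05 cutoff concrete, here is a rough sketch of the same kind of filter in plain SAS code. It assumes that relative importance means each variable's importance divided by the largest importance in the model, which is my reading and may not match exactly how the nodes compute it. The table work.importance and its values are invented purely for illustration.

```sas
/* Made-up importance values, for illustration only */
data work.importance;
   input name :$32. importance;
   datalines;
income 12.4
age 6.1
tenure 0.9
region 0.3
;
run;

/* Store the largest importance in a macro variable */
proc sql noprint;
   select max(importance) into :max_imp trimmed
   from work.importance;
quit;

/* Keep variables whose relative importance is at least 0.05 */
data work.selected;
   set work.importance;
   rel_importance = importance / &max_imp;
   if rel_importance >= 0.05;
run;
```

In practice the importance table would come from whatever tree-based model you fit, not from hand-typed values.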

I hope it helps,


Re: Variable Reduction for Clustering Model

Unfortunately I don't have Enterprise Miner, so I guess I will have to go with PROC PRINCOMP or PROC VARCLUS. Thank you for the reply.
