
Variable reduction in an unsupervised dataset

Occasional Contributor


Hello Everyone,

I have a dataset with about 1,000 variables (all numeric) and no target variable, so it is unsupervised. It has a "zipcode" column, and my goal is to form meaningful clusters from this dataset to analyze the associations between zip codes. I would like to reduce the number of variables (dimensionality reduction) first so that I can pass the reduced dataset to PROC VARCLUS. Is there an effective procedure for dimensionality reduction on unsupervised datasets? I am using Enterprise Miner and Enterprise Guide. Any related response would be a great help. Thank you!

Super Contributor

Re: Variable reduction in an unsupervised dataset

Hey Minal,

1,000 inputs is not a lot, so I think you are fine using the Cluster or HPCluster node directly on those inputs. I am not clear on what you are planning to do with the zip codes. Were you planning to run a Cluster node on your 1,000 inputs and then compare the resulting clusters to your zip codes? Or did you have another plan?
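Outside Enterprise Miner, the workflow described above (cluster the observations on all of their inputs, then look up which cluster each zip code landed in) can be sketched in Python. This is a minimal sketch of plain k-means, assuming one row per zip code and only numeric inputs; it is not how the Cluster or HPCluster node is implemented, and the names `standardize`, `kmeans`, and the toy `data` are hypothetical.

```python
import random
from math import dist
from statistics import mean, pstdev

def standardize(rows):
    """Center each column and scale it to unit (population) standard deviation,
    so inputs measured on large scales do not dominate the distances."""
    cols = list(zip(*rows))
    stats = [(mean(c), pstdev(c) or 1.0) for c in cols]  # constant column -> scale 1
    return [[(v - m) / s for v, (m, s) in zip(r, stats)] for r in rows]

def kmeans(rows, k, iters=100, seed=0):
    """Lloyd's k-means: returns one cluster label (0..k-1) per row."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]  # k distinct rows as seeds
    labels = None
    for _ in range(iters):
        # Assignment step: each row joins its nearest centroid (Euclidean distance)
        new_labels = [min(range(k), key=lambda c: dist(r, centroids[c])) for r in rows]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its member rows
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [mean(col) for col in zip(*members)]
    return labels

# Hypothetical data: one row of numeric inputs per zip code
# (two inputs here, about 1,000 in the real table).
data = {
    "10001": [1.0, 200.0],
    "10002": [1.2, 210.0],
    "94105": [9.0, 15.0],
    "94107": [8.8, 12.0],
}
zips = list(data)
labels = kmeans(standardize([data[z] for z in zips]), k=2)
zip_to_cluster = dict(zip(zips, labels))  # map each zip code back to its cluster
print(zip_to_cluster)
```

Standardizing first matters here: without it, the second input (in the hundreds) would decide the clusters almost on its own.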

You can use the Variable Clustering and Principal Components nodes in Enterprise Miner for dimensionality reduction, but I am not sure you need them.
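For reference, the idea behind principal components can be sketched in a few lines of Python: standardize the inputs, form their correlation matrix, and keep the leading eigenvectors as new, uncorrelated variables. This is a minimal sketch assuming numeric inputs only. It uses power iteration with deflation, a deliberately simple eigen-solver swapped in for illustration and not what the Principal Components node uses internally; the helper names and toy data are hypothetical.

```python
import random
from statistics import mean, pstdev

def standardize(rows):
    """Center each column and scale it to unit (population) standard deviation."""
    cols = list(zip(*rows))
    stats = [(mean(c), pstdev(c) or 1.0) for c in cols]  # constant column -> scale 1
    return [[(v - m) / s for v, (m, s) in zip(r, stats)] for r in rows]

def correlation_matrix(z):
    """For standardized rows, X'X / n is the correlation matrix."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / n for j in range(p)] for i in range(p)]

def top_components(a, k, iters=500, seed=0):
    """Leading k eigenpairs of a symmetric matrix via power iteration with deflation."""
    rng = random.Random(seed)
    p = len(a)
    a = [row[:] for row in a]  # work on a copy
    comps, eigvals = [], []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(p)]
        s = sum(x * x for x in v) ** 0.5
        v = [x / s for x in v]  # start from a unit vector
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0.0:  # nothing left to extract
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue (variance along v)
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(p)) for i in range(p))
        comps.append(v)
        eigvals.append(lam)
        # Deflation: remove this component so the next pass finds the next one
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return comps, eigvals

# Hypothetical toy data: column 2 is exactly 2 * column 1,
# column 3 is uncorrelated with both.
rows = [[1, 2, 1], [2, 4, -1], [3, 6, -1], [4, 8, 1]]
z = standardize(rows)
comps, eigvals = top_components(correlation_matrix(z), k=2)
explained = [lam / len(comps[0]) for lam in eigvals]  # trace of a correlation matrix = p
scores = [[sum(zi * ci for zi, ci in zip(r, c)) for c in comps] for r in z]  # reduced inputs
print([round(x, 3) for x in eigvals])  # prints [2.0, 1.0]
```

Because the trace of a correlation matrix equals the number of variables, each eigenvalue divided by that number is the share of total variance its component explains. Here the first component captures two thirds of it, because columns 1 and 2 carry the same information; with 1,000 inputs you would keep enough components to reach a chosen share and cluster on `scores` instead of the raw inputs.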

Good luck!


Occasional Contributor

Re: Variable reduction in an unsupervised dataset

Hi Miguel,

Thank you for your reply. Yes, I was planning to run the Cluster node on the 1,000 inputs and then compare/map the observations to their zip codes. FYI, each observation is identified by a unique zip code. This is the only method I could think of. Is there any other efficient method or procedure for dimensionality reduction on an unsupervised dataset besides the Cluster node?


