MinalMMurkhande
Calcite | Level 5


Hello Everyone,

I have a dataset with about 1000 variables (all numeric) and no target variable, so the problem is unsupervised. It has a column "zipcode", and my goal is to form meaningful clusters from this dataset to analyze the association between the zip codes. I would like to reduce the number of variables (dimensionality reduction) so that I can pass the reduced dataset to PROC VARCLUS. Is there an effective procedure for dimensionality reduction on unsupervised datasets? I am using Enterprise Miner and Enterprise Guide. Any related response would be of great help. Thank you!

2 REPLIES
M_Maldonado
Barite | Level 11

Hey Minal,

1000 inputs do not seem like a lot, so I think you can use the Cluster or HPCluster nodes directly on those inputs. I am not clear on what you are planning to do with the zip codes. Were you planning to run a Cluster node on your 1000 inputs and then compare those clusters to your zip codes? Or what was your plan?

You can use the Variable Clustering and Principal Components nodes in Enterprise Miner for dimension reduction, but I am not sure that you need that.
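For readers who want to see what a principal-components reduction does to the inputs, here is a minimal sketch in plain Python rather than Enterprise Miner. It is not the node's implementation: it standardizes the columns and uses power iteration to find only the first principal axis. The function names `standardize` and `first_component` and the toy rows are made up for illustration.

```python
import math
import random

def standardize(rows):
    """Center each column to mean 0 and scale it to unit sample variance."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # A constant column has standard deviation 0; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / (n - 1)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

def first_component(rows, iterations=200, seed=0):
    """Return (loadings, scores) for the leading principal axis via power iteration."""
    z = standardize(rows)
    p = len(z[0])
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(p)]
    for _ in range(iterations):
        # Apply Z'Z to v in two steps, without building the p x p matrix.
        scores = [sum(a * b for a, b in zip(row, v)) for row in z]
        w = [sum(s * row[j] for s, row in zip(scores, z)) for j in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(a * b for a, b in zip(row, v)) for row in z]
    return v, scores

# Toy input (hypothetical): the second column is exactly twice the first.
rows = [[1, 2, 5], [2, 4, 3], [3, 6, 6], [4, 8, 2], [5, 10, 4]]
loadings, scores = first_component(rows)
# Share of total variance carried by the first component (standardized data).
explained = sum(s * s for s in scores) / (len(rows) - 1) / len(rows[0])
```

The two perfectly correlated columns receive identical loadings, which is the sense in which one component can stand in for several redundant inputs. With 1000 real inputs you would keep several components, not just the first.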

Good luck!

-Miguel

MinalMMurkhande
Calcite | Level 5

Hi Miguel,

Thank you for your reply. Yes, I was planning to run the Cluster node on the 1000 inputs and then map each observation to its zip code. FYI, each observation is identified by a unique zip code. This is the only approach I could think of. Is there any other efficient method or procedure for dimensionality reduction on an unsupervised dataset, other than using the Cluster node?
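The cluster-then-map plan can be sketched roughly as follows. This is a generic k-means in plain Python, not what the Cluster node does internally. The zip codes and feature values are made up, and taking the first k points as initial centers is a simplification chosen only to keep the example deterministic.

```python
import math

def kmeans(points, k, iterations=50):
    """Plain Lloyd's k-means; initial centers are simply the first k points."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical data: one row of (already reduced) features per unique zip code.
data = {
    "27513": [1.0, 1.2],
    "27519": [0.9, 1.0],
    "10001": [8.0, 8.5],
    "10002": [8.2, 7.9],
}
zips = list(data)
labels = kmeans([data[z] for z in zips], k=2)
# Because each observation has a unique zip code, the mapping is one-to-one.
zip_to_cluster = dict(zip(zips, labels))
```

The zip code is used only as a row identifier and is kept out of the feature vectors, so it does not influence the distances.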

-Regards,

Minal.
