varunnakra
Fluorite | Level 6

This question is more about methodology than proc varclus.

"If we use PROC VARCLUS to eliminate redundancy among the IVs, how should we go about selecting the cluster representatives? I know that the lower a variable's (1-R^2) ratio, the better it is as a representative. However, if we also consider other factors, such as business sense or a variable's univariate chi-square, should we select representatives that have a higher univariate chi-square or make more business sense, even if their (1-R^2) ratio is higher? Or should we shortlist the top 5 or top 10 variables per cluster and look at the other statistics afterwards? Please advise."
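
For readers unfamiliar with the statistic, the (1-R^2) ratio discussed in this thread can be written as below. This is a restatement of how PROC VARCLUS output is usually described, not something the poster wrote; the subscripts "own" and "nearest" are labels chosen here for the variable's own cluster and the next closest cluster.

```latex
\[
  (1 - R^2)\ \text{ratio}
  \;=\;
  \frac{1 - R^2_{\text{own}}}{1 - R^2_{\text{nearest}}}
\]
```

A small ratio means the variable correlates strongly with its own cluster and weakly with the next closest one, which is why a low value marks a good representative.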



4 REPLIES
SteveDenham
Jade | Level 19

Business sense should trump all other considerations, in my opinion.  If you can't explain to your audience what is going on, it is difficult to make a good case for the selection.  Note that this is the case only if you are in the situation where you have several variables that are "close" in explanatory power.

And again, it is only opinion.

Steve Denham

varunnakra
Fluorite | Level 6

Thanks Steve,

Let's say I define the predictive power of a variable through its univariate chi-square statistic. I think some variables with a higher (1-R^2) ratio but also higher predictive power will be left out if we concentrate only on a low (1-R^2) ratio to eliminate redundancy. Or, since the variables have been grouped into clusters, will all the variables inside one cluster have roughly similar predictive power?

SteveDenham
Jade | Level 19

Still don't know if this is helpful, but depending on the measure used to cluster, wouldn't the variables within a cluster all be measuring about the same thing, have similar patterns, or be explicit representations of some underlying variable?  (As you might guess, I haven't used varclus in a LONG time.)  So given that, I would pick the variable that makes the most business sense, even if it is slightly lower in predictive power.  Now if there is a fair difference in the statistics (and I don't have an objective measure of "fair difference"), you really should pick the "better" predictor and see if it makes sense from a business point of view.

I am starting to think that PROC PLS might be of some utility here.

Steve Denham
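
The rule of thumb in the reply above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `pick_representatives`, the `tolerance` threshold standing in for "close", the dictionary layout, and all the numbers are hypothetical, and the values would in practice be copied from the PROC VARCLUS output and from separate univariate chi-square tests. Business sense is not modeled; the shortlist is what a person would review.

```python
from collections import defaultdict


def pick_representatives(rows, tolerance=0.05):
    """Pick one variable per cluster.

    rows: iterable of dicts with keys 'cluster', 'variable',
          'ratio' (the 1-R^2 ratio) and 'chi_square'.
    tolerance: hypothetical cut-off for treating two ratios as "close".
    """
    by_cluster = defaultdict(list)
    for row in rows:
        by_cluster[row["cluster"]].append(row)

    chosen = {}
    for cluster, members in by_cluster.items():
        best_ratio = min(m["ratio"] for m in members)
        # Shortlist: variables nearly as representative as the best one.
        shortlist = [m for m in members if m["ratio"] <= best_ratio + tolerance]
        # Among those, prefer the strongest univariate predictor.
        winner = max(shortlist, key=lambda m: m["chi_square"])
        chosen[cluster] = winner["variable"]
    return chosen


# Made-up values for illustration only.
rows = [
    {"cluster": 1, "variable": "income", "ratio": 0.20, "chi_square": 35.0},
    {"cluster": 1, "variable": "salary", "ratio": 0.18, "chi_square": 12.0},
    {"cluster": 1, "variable": "bonus", "ratio": 0.60, "chi_square": 50.0},
    {"cluster": 2, "variable": "age", "ratio": 0.10, "chi_square": 8.0},
]
representatives = pick_representatives(rows)
print(representatives)
```

Here `bonus` has the highest chi-square but is dropped because its ratio is far from the best in its cluster, while `income` wins over `salary` because the two ratios are close and `income` predicts better. Widening `tolerance` shifts the balance toward predictive power.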

varunnakra
Fluorite | Level 6

Thanks Steve,

PROC VARCLUS is useful for removing redundancy, not irrelevancy. What I did next was request a larger number of clusters (50, and then 100). That left fewer variables in each cluster for me to analyze and shortlist. From there, one can pick variables that have a low (1-R^2) ratio and also make business sense.

I will certainly look at PROC PLS too.

Thanks again..
