AmrAd
Obsidian | Level 7
I have a list of 50 variables and need to derive all possible combinations of these 50 variables in which no two variables are highly correlated with one another. I already have the correlation for every possible pair of these 50 variables. How can I do this?

For example, I need output showing the largest such combinations (e.g. 20-30 of these variables).
PaigeMiller
Diamond | Level 26

@AmrAd wrote:
I have a list of 50 variables and need to derive all possible combinations of these 50 variables in which no two variables are highly correlated with one another. I already have the correlation for every possible pair of these 50 variables. How can I do this?

For example, I need output showing the largest such combinations (e.g. 20-30 of these variables).

So, this is a set of requirements that I haven't seen before. You can do all possible regressions via this code, and then weed out the ones you don't want based upon your correlation restrictions.
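
As an illustration of the "weed out by correlation" step, here is a minimal sketch in Python rather than SAS. It is not the linked code. It assumes the pairwise correlations are already available in a dictionary keyed by variable pair; the function name `maximal_low_corr_sets`, the dictionary layout, and the 0.7 cutoff are all hypothetical. It lists every maximal set of variables in which no pair exceeds the cutoff, using the Bron-Kerbosch clique enumeration on the "compatible pairs" graph. With 50 variables the number of such sets can be very large, so treat this as a sketch for small lists.

```python
from itertools import combinations


def maximal_low_corr_sets(names, corr, threshold=0.7):
    """Return every maximal subset of names in which no pair has |r| > threshold.

    corr maps frozenset({a, b}) -> correlation (hypothetical layout, an assumption).
    The result is sorted so that the largest subsets come first.
    """
    # Two variables are "compatible" when |r| is at or below the threshold.
    compatible = {n: set() for n in names}
    for a, b in combinations(names, 2):
        if abs(corr[frozenset((a, b))]) <= threshold:
            compatible[a].add(b)
            compatible[b].add(a)

    results = []

    def expand(chosen, candidates, excluded):
        # chosen: mutually compatible variables picked so far
        # candidates: variables that could still extend chosen
        # excluded: variables already explored, used to skip non-maximal sets
        if not candidates and not excluded:
            results.append(sorted(chosen))
            return
        for v in list(candidates):
            expand(chosen | {v}, candidates & compatible[v], excluded & compatible[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(names), set())
    return sorted(results, key=len, reverse=True)


# Made-up correlations for four hypothetical variables.
example_corr = {
    frozenset(("x1", "x2")): 0.90,
    frozenset(("x1", "x3")): 0.10,
    frozenset(("x1", "x4")): 0.20,
    frozenset(("x2", "x3")): 0.30,
    frozenset(("x2", "x4")): 0.85,
    frozenset(("x3", "x4")): 0.10,
}
example_sets = maximal_low_corr_sets(["x1", "x2", "x3", "x4"], example_corr)
```

In this made-up example, `x1`/`x2` and `x2`/`x4` are the only pairs above the cutoff, so the largest admissible set is `x1`, `x3`, `x4`, and the only other maximal set is `x2`, `x3`.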

 

I wonder, though, if there isn't a better way to handle multicollinearity. Actually, I don't wonder; I know there are better ways to handle multicollinearity, which also involve much less programming. The two that come to mind are PROC GLMSELECT and PROC PLS, both of which give you the ability to fit models (and compare them) in the presence of multicollinearity. If I were you, I would start there and not even bother with the method you stated.

--
Paige Miller
AmrAd
Obsidian | Level 7
I am running PROC LOGISTIC on these variables, and the model selects up to 20-30 variables from this list. However, the variance inflation factor (VIF) exceeds the acceptable threshold and indicates multicollinearity in up to 10 of them. So I am looking for code that could help me test all possible combinations of these 10 together with the remaining 20 or so non-collinear variables.
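
To make the size of that search concrete, here is a minimal sketch in Python rather than SAS. It assumes the roughly 20 non-collinear variables are always kept and only the roughly 10 suspect variables are switched in and out. The function name `candidate_models` and the variable names are hypothetical. It only generates the variable lists; fitting each model and checking its VIFs would still happen separately.

```python
from itertools import combinations


def candidate_models(fixed, suspects, max_suspects=None):
    """Yield one variable list per model: all fixed variables plus a subset of suspects.

    max_suspects optionally caps how many suspect variables may enter a model.
    """
    limit = len(suspects) if max_suspects is None else max_suspects
    for k in range(limit + 1):
        for subset in combinations(suspects, k):
            yield list(fixed) + list(subset)


# Hypothetical names: 20 variables that are always kept, 10 collinear suspects.
fixed_vars = [f"keep{i}" for i in range(1, 21)]
suspect_vars = [f"susp{i}" for i in range(1, 11)]

all_models = list(candidate_models(fixed_vars, suspect_vars))
small_models = list(candidate_models(fixed_vars, suspect_vars, max_suspects=2))
```

With 10 suspect variables there are 2^10 = 1024 candidate models in total, or 1 + 10 + 45 = 56 if at most two suspects are allowed per model.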
PaigeMiller
Diamond | Level 26

You can simply modify the code I linked to so that instead of PROC REG, you type in PROC LOGISTIC and the desired options.

 

You can also find SAS code on the internet for stepwise PROC GLIMMIX, which would include logistic regression as a special case.

 

I still think the best method of handling multicollinearity is not what you are trying to do, but logistic partial least squares. Unfortunately, this is not a feature of SAS, but it has been programmed in R. There is no logistic counterpart to PROC GLMSELECT.

--
Paige Miller
AmrAd
Obsidian | Level 7
Thanks a lot, I will check this.
