juanvg1972
Pyrite | Level 9

Hi,

 

I am using PROC LOGISTIC and I have high correlation between two input vars: var1 and var2.

These vars are continuous, and I am seeing the effect of collinearity in my model.

I want to know if using ranges can be a solution: I would convert var1 and var2 to discrete vars using ranges, and then use them in the model as classification vars.

 

I would like to know if this is a good solution to reduce collinearity in my model.

 

Any advice or other solution will be greatly appreciated.

 

Thanks

1 REPLY 1
PaigeMiller
Diamond | Level 26

Using ranges of continuous variables is rarely a good solution to any problem, in my opinion. And I don't see how using ranges eliminates the problem of multicollinearity. It's still there; you are just masking it by creating ranges, and creating other problems in the process.
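To illustrate the point, here is a minimal sketch in plain Python (not SAS) using simulated data. The names `var1` and `var2` come from the question; the noise level, the four quantile bins, and the helper functions `pearson` and `to_bins` are assumptions made only for this illustration. It uses the correlation of the bin indices as a rough stand-in for the association between the two binned variables.

```python
import random

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def to_bins(values, n_bins=4):
    """Replace each value with its quantile-bin index (0 .. n_bins - 1)."""
    ranked = sorted(values)
    cuts = [ranked[len(values) * k // n_bins] for k in range(1, n_bins)]
    return [sum(v >= c for c in cuts) for v in values]

# Simulated data (assumption): var2 is var1 plus a little noise,
# so the two continuous variables are highly correlated.
rng = random.Random(0)
var1 = [rng.gauss(0, 1) for _ in range(2000)]
var2 = [v + rng.gauss(0, 0.3) for v in var1]

r_raw = pearson(var1, var2)
r_binned = pearson(to_bins(var1), to_bins(var2))

print(f"correlation of raw variables:    {r_raw:.3f}")
print(f"correlation of binned variables: {r_binned:.3f}")
```

Under these assumptions the binned variables should still be strongly associated: binning discards information within each range, but it does not break the relationship between the two variables.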

 

The problem of collinearity between predictor variables is not one that can be "solved", in the sense that collinearity exists, and you will not be able to understand or analyze the data as if the collinearity does not exist. All algorithms that you might try will be affected by this collinearity.

 

In ordinary least squares regression, collinearity causes the estimates of the slopes and intercept to have much higher root mean square errors, so high in fact that a term in the model could have the wrong sign. I haven't seen a study about what happens when you have collinearity in logistic regression, but I would expect similar things to happen there. Thus, the question really is NOT how to eliminate or reduce the multicollinearity, but which methods produce the lowest root mean square errors of the slopes and intercept. According to a paper by Frank and Friedman, the algorithm that produces the lowest root mean square errors (in most cases) is called Partial Least Squares, so that would be your best choice in the case of collinearity (it is PROC PLS in SAS). There is a logistic version of Partial Least Squares, here: https://cedric.cnam.fr/fichiers/RC906.pdf
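The ordinary least squares claim can be checked with a small simulation. This is a sketch in plain Python (not SAS) under stated assumptions: two predictors with true slopes of 1, standard normal noise, 50 observations per sample, and 500 repetitions. All of these values and the function names `fit_two_predictor_ols` and `simulate` are hypothetical and chosen only for illustration. It covers OLS only, not logistic regression or PLS.

```python
import random

def fit_two_predictor_ols(x1, x2, y):
    """Slopes of y ~ intercept + x1 + x2 via the centered normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2

def simulate(rho, n=50, reps=500, seed=1):
    """Return (RMSE of the x1 slope, number of fits where it has the wrong sign)."""
    rng = random.Random(seed)
    true_b1 = true_b2 = 1.0  # assumed true slopes
    sq_err = 0.0
    wrong_sign = 0
    for _ in range(reps):
        x1 = [rng.gauss(0, 1) for _ in range(n)]
        # x2 has population correlation rho with x1
        x2 = [rho * a + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1) for a in x1]
        y = [true_b1 * a + true_b2 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
        b1, _ = fit_two_predictor_ols(x1, x2, y)
        sq_err += (b1 - true_b1) ** 2
        wrong_sign += b1 < 0
    return (sq_err / reps) ** 0.5, wrong_sign

rmse_low, wrong_low = simulate(rho=0.0)
rmse_high, wrong_high = simulate(rho=0.99)

print(f"rho=0.00: slope RMSE {rmse_low:.3f}, wrong sign in {wrong_low} of 500 fits")
print(f"rho=0.99: slope RMSE {rmse_high:.3f}, wrong sign in {wrong_high} of 500 fits")
```

With highly correlated predictors, the slope RMSE should be several times larger than in the uncorrelated case, and a noticeable share of the fitted slopes should come out negative even though the true slope is positive.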

--
Paige Miller
