Logistic Regression dataset - high vif for a variable which is not correlated with any other variable

New Contributor
Posts: 2

Hi

I am trying to identify variables with multicollinearity by running a linear regression with the VIF option, using one of the independent variables as the dependent variable. One of the variables with a high VIF has no correlation with any other variable, so I wonder why its VIF is high.

It would be great to hear your input on this. Thanks in advance!


Accepted Solutions
Solution
‎07-23-2014 08:56 AM
Respected Advisor
Posts: 2,655

Re: Logistic Regression dataset - high vif for a variable which is not correlated with any other variable

I don't quite understand this approach to calculating VIF. What happens if you select a different one of the IVs as the dependent? And what if the first selected IV is highly correlated with another of the set? That would lead to a large VIF. Why not just run the usual full set of IVs? Can you point me to a reference for the method being proposed?

Steve Denham
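
For readers wondering how a variable can have a high VIF without being noticeably correlated with any single other variable: VIF measures how well a variable is predicted by all the other IVs jointly, not by each one separately. Below is a minimal sketch in Python (not SAS) on made-up simulated data. The variable names, sample size, and noise levels are assumptions chosen for illustration. It computes VIFs as the diagonal of the inverse correlation matrix of the full set of IVs.

```python
import math
import random

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]

def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (adequate for small matrices)."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * pv for v, pv in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def vifs(columns):
    """VIF_j = 1 / (1 - R_j^2), the j-th diagonal element of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]

# Simulated data (illustrative assumption, not the poster's dataset).
random.seed(0)
n = 2000
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [-a + random.gauss(0, 0.1) for a in x1]                  # nearly the negative of x1
x3 = [a + b + random.gauss(0, 0.01) for a, b in zip(x1, x2)]  # almost exactly x1 + x2
x4 = [random.gauss(0, 1) for _ in range(n)]                   # unrelated to the others

names = ["x1", "x2", "x3", "x4"]
corr = correlation_matrix([x1, x2, x3, x4])
vif = vifs([x1, x2, x3, x4])
for j, name in enumerate(names):
    largest = max(abs(corr[j][i]) for i in range(len(names)) if i != j)
    print(f"{name}: largest |pairwise r| = {largest:.3f}, VIF = {vif[j]:.1f}")
```

In this setup x3 is only weakly correlated with each other variable taken one at a time, yet it is almost an exact linear combination of x1 and x2, so its VIF is large. A pairwise correlation table alone would not reveal this.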


All Replies
Super User
Posts: 9,662

Re: Logistic Regression dataset - high vif for a variable which is not correlated with any other variable

Are these variables numeric or character (categorical)?

New Contributor
Posts: 2

Re: Logistic Regression dataset - high vif for a variable which is not correlated with any other variable

Numeric. Yes Steve, I was running VIF on a dataset prepared for logistic regression. I was not sure whether I could run PROC REG with the VIF option with a binary variable as the dependent. I realized I could, and then ran VIF with the full set of IVs, which were numeric.

Follow-up question: what about categorical variables? Is VIF the best diagnostic for multicollinearity, or should I be looking at some other diagnostic such as Spearman or polychoric correlation?

Super User
Posts: 9,662

Re: Logistic Regression dataset - high vif for a variable which is not correlated with any other variable

PROC REG and PROC LOGISTIC use different estimation methods. PROC REG uses OLS while PROC LOGISTIC uses ML, so there is no need to check VIF in PROC REG for a logistic model.

BTW, as far as I know you can't use a binary variable as the dependent variable in PROC REG: REG assumes the residuals are ~ N(0, σ²), while the logistic model assumes a binomial distribution.

Xia Keshan

Respected Advisor
Posts: 2,655

Re: Logistic Regression dataset - high vif for a variable which is not correlated with any other variable

Working from the discussion above, I found a usage note that gives a method for VIF calculation for logistic regression. Granted, it uses PROC GENMOD rather than LOGISTIC, but for what needs to be done to get the measures, I think this is what you need. The two-step process extracts Hessian weights that can be used in PROC REG for multicollinearity diagnostics.

See Usage Note 32471: Testing for equal variances, collinearity, or normality in logit, probit, poisson and other generalized linear models. The link is:

http://support.sas.com/kb/32/471.html
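
To make the two-step idea concrete, here is a rough sketch in Python rather than SAS. It does not reproduce the code in the usage note. It assumes a logit link, for which the Hessian weight of observation i is p_i(1 - p_i), and it assumes that the weighted diagnostic amounts to computing VIFs from a weighted correlation matrix of the IVs. The data, coefficients, and function names are all made up for illustration.

```python
import math
import random

def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (adequate for small matrices)."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * pv for v, pv in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def fit_logistic(rows, y, iterations=15):
    """Step 1: Newton-Raphson fit. Each row must start with 1.0 for the intercept."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iterations):
        p = [sigmoid(sum(b * v for b, v in zip(beta, r))) for r in rows]
        grad = [sum((yi - pi) * r[j] for r, yi, pi in zip(rows, y, p)) for j in range(k)]
        hess = [[sum(pi * (1 - pi) * r[a] * r[b] for r, pi in zip(rows, p))
                 for b in range(k)] for a in range(k)]
        inv = invert(hess)
        beta = [b + sum(inv[a][c] * grad[c] for c in range(k)) for a, b in enumerate(beta)]
    fitted = [sigmoid(sum(b * v for b, v in zip(beta, r))) for r in rows]
    return beta, fitted

def weighted_vifs(columns, weights):
    """Step 2: VIFs from the weighted correlation matrix of the IVs."""
    total = sum(weights)
    means = [sum(w * v for w, v in zip(weights, c)) / total for c in columns]
    centered = [[v - m for v in c] for c, m in zip(columns, means)]
    k = len(columns)
    cov = [[sum(w * a * b for w, a, b in zip(weights, centered[i], centered[j]))
            for j in range(k)] for i in range(k)]
    corr = [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(k)]
            for i in range(k)]
    inv = invert(corr)
    return [inv[j][j] for j in range(k)]

# Simulated data (illustrative assumption): x2 is nearly collinear with x1.
random.seed(1)
n = 1000
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [0.9 * a + random.gauss(0, 0.2) for a in x1]
x3 = [random.gauss(0, 1) for _ in range(n)]
y = [1 if random.random() < sigmoid(0.5 + a - 0.5 * c) else 0 for a, c in zip(x1, x3)]

rows = [[1.0, a, b, c] for a, b, c in zip(x1, x2, x3)]
beta, fitted = fit_logistic(rows, y)
hessian_weights = [p * (1 - p) for p in fitted]

vif_weighted = weighted_vifs([x1, x2, x3], hessian_weights)
vif_unweighted = weighted_vifs([x1, x2, x3], [1.0] * n)
for name, vw, vu in zip(["x1", "x2", "x3"], vif_weighted, vif_unweighted):
    print(f"{name}: weighted VIF = {vw:.2f}, unweighted VIF = {vu:.2f}")
```

The unweighted column corresponds to the ordinary VIF on the IVs alone. The weighted column uses the fitted probabilities, so it can differ when the weights vary a lot across observations.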

Hope this gets you moving in the right direction.

Steve Denham


☑ This topic is SOLVED.
