Logistic Regression Collinearity

Contributor
Posts: 35

Logistic Regression Collinearity

I am trying to run a model with logistic regression containing about 20 independent variables, both categorical and continuous.

However, I am finding that the significance varies depending on which variables I include and exclude, and I believe that there is association and collinearity among the variables. 

 

As I am a new SAS user, is there any simple way to check for association among the variables in logistic regression? 

 

Thank You

PROC Star
Posts: 8,145

Re: Logistic Regression Collinearity

Posted in reply to sasnewbie12

Not my area of expertise, but the following might help: http://support.sas.com/kb/32/471.html

 

Art, CEO, AnalystFinder.com

 

Frequent Contributor
Posts: 80

Re: Logistic Regression Collinearity

Posted in reply to sasnewbie12

Without knowing more, e.g. how many observations you have, 20 variables sounds like a lot and could be affecting things. There are rules of thumb for this: in survival analysis I think it is called 'failures per variable' (FPV), and 10 is considered sufficient. I'd guess there is something analogous for logistic regression. Regarding associations among the variables, normally this would come from an understanding of the data, i.e. it would be anticipated a priori rather than data-dependent. But if you want to examine correlations among the variables, that can be done even when the variables are of different types, e.g. PROC CORR will give the biserial correlation, I think, or there's a macro for it: http://support.sas.com/kb/24/991.html
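A minimal sketch of how such checks might look, not taken from the linked notes. It assumes a data set work.mydata with continuous predictors x1, x2, x3, categorical predictors c1 and c2, and a numeric 0/1 outcome y; all of these names are hypothetical. Variance inflation factors and tolerances depend only on the predictors, so an OLS fit in PROC REG is a common way to get them even when the final model is logistic.

```sas
/* All data set and variable names below are hypothetical. */

/* Pairwise Pearson correlations among the continuous predictors */
proc corr data=work.mydata pearson;
   var x1 x2 x3;
run;

/* Association between two categorical predictors:            */
/* CHISQ prints the chi-square test, phi and Cramer's V       */
proc freq data=work.mydata;
   tables c1*c2 / chisq;
run;

/* Collinearity diagnostics: VIF, tolerance, condition indices. */
/* The response is only a placeholder here; the diagnostics     */
/* are computed from the predictors. Categorical predictors     */
/* would first have to be coded as dummy variables.             */
proc reg data=work.mydata;
   model y = x1 x2 x3 / vif tol collin;
run;
quit;
```

Large VIF values (or tolerances near zero) flag predictors that are close to a linear combination of the others.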

--------------
blog: papersandprograms.com
Super User
Posts: 10,686

Re: Logistic Regression Collinearity

Posted in reply to sasnewbie12

PROC LOGISTIC fits the model by maximum likelihood (MLE), unlike PROC REG, which uses OLS.

Usually SAS would do it for you automatically. Check PROC HPGENSELECT; it offers many variable selection methods, like CV, LASSO, ...
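A rough sketch of what a LASSO selection in PROC HPGENSELECT could look like for a binary outcome. The data set work.mydata, the outcome y, and the predictors x1, x2, x3, c1, c2 are hypothetical names, and METHOD=LASSO is only available in more recent SAS/STAT releases, so check the documentation for your version before relying on it.

```sas
/* Hypothetical data set and variable names throughout. */
proc hpgenselect data=work.mydata;
   class c1 c2;                                   /* categorical predictors  */
   model y(event='1') = x1 x2 x3 c1 c2
         / dist=binary;                           /* logistic (logit) model  */
   selection method=lasso;                        /* penalized selection     */
run;
```

The output lists the effects that remain in the selected model; it does not by itself say which of the dropped variables were correlated with the retained ones.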

Contributor
Posts: 35

Re: Logistic Regression Collinearity

Posted in reply to sasnewbie12

I have a large number of observations, 200,000 weighted, so there should be no issue with the 20 variables from that standpoint. 

 

I am also just trying to find associations between the independent variables and the dependent variable, and am not interested in building a powerful model. However, when I add or remove some of the variables, a few of the other variables change significance drastically, sometimes becoming significant only after another variable is added to the model. I don't want to report an association that may differ from what someone else finds if they look for the same associations (for example, if they use a slightly different selection of variables and get different significance from what I have shown, that would make my study seem inaccurate).

 

Thank you

Frequent Contributor
Posts: 80

Re: Logistic Regression Collinearity

[ Edited ]
Posted in reply to sasnewbie12

In that case, the first thing I'd do (maybe you have already) is write a macro that fits the model for a single independent variable, and then run this macro for each of the 20 variables (some call these 'univariate models'), just to get a sense of things and to see which are the strongest predictors on their own. You could stop here because you are "not interested in building a powerful model". But if you want to see whether any variables are superfluous, you could then attempt a 'multivariate model' (a misnomer, but this is how some people describe it) using only those variables that looked good in the univariate models. With 200,000 obs, though, maybe every variable shows a small p-value. This approach is common in medical research, but it really depends on what you're doing. E.g., in the methods section of this article, see the 6 steps they describe: https://www.nature.com/articles/7211492

Edit: regarding whether others can reproduce your results, as long as you lay out your approach as they do in that article, then I'd say it's fine.
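The single-predictor loop described above could be sketched as follows. This is only an illustration: the macro name %unilogit, its parameters, and the data set and variable names in the example call are all hypothetical, and the outcome is assumed to be coded so that '1' is the event.

```sas
/* Hypothetical macro: fits one logistic model per predictor.  */
/*   data  = input data set                                     */
/*   y     = binary outcome (event assumed to be '1')           */
/*   vars  = space-separated list of predictors                 */
/*   class = subset of vars to treat as categorical             */
%macro unilogit(data=, y=, vars=, class=);
   %local i v isclass;
   %let i = 1;
   %let v = %scan(&vars, &i, %str( ));
   %do %while (%length(&v) > 0);
      /* Is the current variable in the CLASS list? */
      %let isclass = 0;
      %if %length(&class) > 0 %then
         %let isclass = %sysfunc(indexw(%upcase(&class), %upcase(&v)));

      title "Single-predictor model: &v";
      proc logistic data=&data;
         %if &isclass > 0 %then %do;
            class &v / param=ref;
         %end;
         model &y(event='1') = &v;
      run;

      /* Move on to the next variable in the list */
      %let i = %eval(&i + 1);
      %let v = %scan(&vars, &i, %str( ));
   %end;
   title;
%mend unilogit;

/* Example call with made-up variable names */
%unilogit(data=work.mydata, y=outcome,
          vars=age bmi sex region, class=sex region);
```

A WEIGHT statement could be added inside the PROC LOGISTIC step if the analysis is weighted.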

--------------
blog: papersandprograms.com
Super User
Posts: 10,686

Re: Logistic Regression Collinearity

Posted in reply to PaulBrownPhD

" fits the model for a single independent variable, "

That is called perfect model. That is not right according to statistical theory.

I suggest using PROC HPGENSELECT to let SAS select variables for you.

Don't use selection=stepwise/forward/backward; try CV/LASSO/elastic net instead. For more info, check the PROC HPGENSELECT documentation.

Contributor
Posts: 35

Re: Logistic Regression Collinearity

Posted in reply to sasnewbie12

A follow-up question: say that an independent variable has a significant association in the "univariate" analysis and a non-significant one in the "multivariate" analysis. Will I be able to make any use of the adjusted odds ratio for that variable if the p-value is non-significant?

 

I have seen studies that list the adjusted odds ratio without a p-value, so I am wondering whether it holds any importance when it is non-significant.

 

Thank You

 

 

Contributor
Posts: 35

Re: Logistic Regression Collinearity

Posted in reply to sasnewbie12

Another question: if a categorical variable has a non-significant association in the multivariate analysis under "Analysis of Maximum Likelihood Estimates", but the "Type 3 Analysis of Effects" shows that it is significant, what does that mean and how can it be interpreted?

 

 

Respected Advisor
Posts: 2,807

Re: Logistic Regression Collinearity

[ Edited ]
Posted in reply to sasnewbie12

sasnewbie12 wrote:

I am trying to run a model with logistic regression containing about 20 independent variables, both categorical and continuous.

However, I am finding that the significance varies depending on which variables I include and exclude, and I believe that there is association and collinearity among the variables. 

 

As I am a new SAS user, is there any simple way to check for association among the variables in logistic regression? 

 

Thank You


You keep asking the same questions over and over, and my answers don't change just because you ask again three weeks later. I repeat my answer given here: https://communities.sas.com/t5/SAS-Statistical-Procedures/multivariate-logistic-regression-variable-...

 

Variable selection is fundamentally a poor approach when you have many correlated variables. It doesn't matter if you are new to SAS or experienced in SAS or using R or Python or Minitab. It is not the software that makes it a poor approach.

 

At that link, I reference a method of performing Logistic Partial Least Squares regression, fundamentally a superior approach. There is R code to do this, but I am not aware of SAS code to do this. However, since you can run R code through SAS PROC IML, that seems to be the approach I would take.
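For anyone who has not called R from SAS before, a bare skeleton might look like the sketch below. It assumes SAS/IML is licensed, that SAS was started with the RLANG system option so R calls are permitted, and that work.mydata is a hypothetical data set name. The logistic PLS fit itself is not shown, since it depends on which R package is used.

```sas
proc iml;
   /* Copy the SAS data set into an R data frame named df */
   call ExportDataSetToR("work.mydata", "df");

   submit / R;
      # Everything in this block is R code.
      # The logistic PLS fit from the chosen R package would go here.
      str(df)
   endsubmit;
quit;
```

Results can be brought back into SAS with the matching import calls in SAS/IML if they are needed for further processing.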

--
Paige Miller