Logistic Regression Collinearity


Posted 12-24-2017 05:05 PM (4556 views)

I am trying to run a model with logistic regression containing about 20 independent variables, both categorical and continuous.

However, I am finding that the significance varies depending on which variables I include and exclude, and I believe that there is association and collinearity among the variables.

As I am a new SAS user, is there any simple way to check for association among the variables in logistic regression?

Thank You
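One simple, widely used check for collinearity among predictors is the variance inflation factor, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors. It depends only on the predictors, not on the outcome, so the same check applies before a logistic fit; in SAS it is what the VIF option on the MODEL statement of PROC REG reports. The sketch below is a language-neutral illustration in plain Python, not SAS code. It assumes all predictors are already numeric (categorical ones dummy-coded), ignores any weights, and uses the identity that the VIFs are the diagonal of the inverse correlation matrix. The function names are hypothetical.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns.

    Constant (zero-variance) columns are not supported.
    """
    n = len(columns[0])
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([v - mean for v in col])
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    k = len(columns)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(k)]
        for i in range(k)
    ]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    # augment [A | I] and reduce the left half to the identity
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif(columns):
    """VIF_j = 1 / (1 - R_j^2), read off the diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]


if __name__ == "__main__":
    # toy data, invented for illustration
    x1 = [1, 2, 3, 4, 5]
    x2 = [1.1, 1.9, 3.2, 3.9, 5.1]  # nearly a copy of x1
    x3 = [2, -1, 0, 1, -2]
    print(vif([x1, x2, x3]))  # x1 and x2 should get large VIFs
```

A VIF of 1 means a predictor is uncorrelated with the others; values above roughly 5 to 10 are often treated as a warning sign, though that cutoff is a rule of thumb rather than a test.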


Not my area of expertise, but the following might help: http://support.sas.com/kb/32/471.html

Art, CEO, AnalystFinder.com


PROC LOGISTIC fits the model by maximum likelihood estimation (MLE), unlike PROC REG, which uses ordinary least squares (OLS).

Usually SAS can do this for you automatically. Check PROC HPGENSELECT; it offers many variable selection methods, such as CV, LASSO, and others.
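To illustrate what a LASSO selection method does, the sketch below fits an L1-penalized logistic regression by proximal gradient descent in plain Python. This is not HPGENSELECT's implementation, and how HPGENSELECT chooses the penalty is not reproduced here. The penalty `lam` shrinks every coefficient toward zero and sets weak ones exactly to zero, which is how variables drop out of the model. The function names, step size, and iteration count are illustrative assumptions, and the fit is unweighted.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def soft_threshold(z, t):
    """Proximal operator of t*|z|: shrink z toward zero by t, snapping small values to 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_logistic(X, y, lam, step=0.1, iters=5000):
    """L1-penalized logistic regression by proximal gradient descent (ISTA).

    Minimizes (1/n) * sum of log-losses + lam * sum(|beta_j|); the intercept b0
    is not penalized. X is a list of rows whose columns should be on a common
    scale (standardizing is typical), and y is a list of 0/1 outcomes.
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(iters):
        # gradient of the average log-loss with respect to b0 and beta
        g0, g = 0.0, [0.0] * p
        for row, yi in zip(X, y):
            resid = sigmoid(b0 + sum(b * x for b, x in zip(beta, row))) - yi
            g0 += resid / n
            for j in range(p):
                g[j] += resid * row[j] / n
        # plain gradient step for the intercept, soft-threshold step for beta
        b0 -= step * g0
        beta = [soft_threshold(b - step * gj, step * lam) for b, gj in zip(beta, g)]
    return b0, beta


if __name__ == "__main__":
    # toy data, invented for illustration
    X = [[-2, 1], [-1, -1], [-1, 1], [0, -1], [0, 1], [1, -1], [1, 1], [2, -1]]
    y = [0, 0, 1, 0, 1, 1, 0, 1]
    for lam in (1.0, 0.1, 0.01):
        b0, beta = lasso_logistic(X, y, lam)
        print(lam, [round(b, 3) for b in beta])
```

Running it over a decreasing sequence of `lam` values shows which predictors enter first; in practice `lam` would be chosen by something like cross-validation rather than fixed by hand.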


I have a large number of observations (200,000, weighted), so there should be no issue with fitting 20 variables from that standpoint.

I am also just trying to find associations between the independent variables and the dependent variable; I am not interested in building a powerful model. However, when I add or remove some of the variables, a few of the other variables change significance drastically, sometimes becoming significant only after another variable is added to the model. I don't want to report an association that someone else might not find when they look for the same associations. For example, if they use a slightly different selection of variables and get different significance from what I have shown, that would make my study seem inaccurate.

Thank you


In that case, the first thing I'd do (maybe you have already) is write a macro that fits the model with a single independent variable, and then run this macro for each of the 20 variables (some call these 'univariate models'), just to get a sense of things and to see which are the strongest predictors on their own. You could stop here, because you are "not interested in building a powerful model". But if you want to see whether any variables are superfluous, you could then fit a 'multivariate model' (a misnomer, but this is how some people describe it) using only the variables that looked good in the univariate models. That said, with 200,000 observations nearly every variable may show a small p-value. This approach is common in medical research, but it really depends on what you're doing. For example, see the six steps described in the methods section of this article: https://www.nature.com/articles/7211492

Edit: regarding whether others can reproduce your results, as long as you lay out your approach as they do in that article, I'd say it's fine.
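The reply describes the macro only in words. Below is the same one-variable-at-a-time loop sketched in plain Python rather than as a SAS macro (in SAS it would loop over PROC LOGISTIC calls). It fits each predictor alone by Newton-Raphson and reports the unadjusted odds ratio and two-sided Wald p-value. The function names are hypothetical, the fit is unweighted, and it assumes each predictor is numeric and does not perfectly separate the outcome.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def score_and_information(x, y, a, b):
    """Score vector and Fisher information of logit P(y=1) = a + b*x at (a, b)."""
    s_a = s_b = i_aa = i_ab = i_bb = 0.0
    for xi, yi in zip(x, y):
        p = sigmoid(a + b * xi)
        w = p * (1.0 - p)
        s_a += yi - p
        s_b += (yi - p) * xi
        i_aa += w
        i_ab += w * xi
        i_bb += w * xi * xi
    return s_a, s_b, i_aa, i_ab, i_bb


def fit_univariate_logistic(x, y, iters=50, tol=1e-10):
    """Newton-Raphson fit of a one-predictor logistic model; returns (b, se_b)."""
    a = b = 0.0
    for _ in range(iters):
        s_a, s_b, i_aa, i_ab, i_bb = score_and_information(x, y, a, b)
        det = i_aa * i_bb - i_ab * i_ab
        # Newton step = inverse information times score (2x2 inverse written out)
        da = (i_bb * s_a - i_ab * s_b) / det
        db = (i_aa * s_b - i_ab * s_a) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    # standard error of b from the information at the final estimates
    _, _, i_aa, i_ab, i_bb = score_and_information(x, y, a, b)
    se_b = math.sqrt(i_aa / (i_aa * i_bb - i_ab * i_ab))
    return b, se_b


def screen_one_at_a_time(columns, y):
    """Fit each candidate predictor on its own; return {name: (odds_ratio, wald_p)}."""
    results = {}
    for name, x in columns.items():
        b, se = fit_univariate_logistic(x, y)
        z = b / se
        # two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
        results[name] = (math.exp(b), math.erfc(abs(z) / math.sqrt(2.0)))
    return results


if __name__ == "__main__":
    # toy data, invented for illustration
    y = [0, 0, 1, 0, 1, 1, 0, 1]
    candidates = {
        "x1": [-2, -1, -1, 0, 0, 1, 1, 2],
        "x2": [1, -1, 1, -1, 1, -1, 1, -1],
    }
    for name, (odds_ratio, p) in screen_one_at_a_time(candidates, y).items():
        print(f"{name}: OR={odds_ratio:.3f}, p={p:.3f}")
```

With 200,000 observations, as the reply warns, most of these p-values may be tiny, so the odds ratios are likely the more informative column.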


" fits the model for a single independent variable, "

That is called a perfect model. That is not right according to statistical theory.

I suggest using PROC HPGENSELECT to let SAS select variables for you.

Don't use selection=stepwise/forward/backward; try CV/LASSO/elastic net instead. For more information, check the PROC HPGENSELECT documentation.


A follow-up question: say that an independent variable has a significant association in the "univariate" analysis but a non-significant one in the "multivariate" analysis. Can I make any use of the adjusted odds ratio for that variable if its p-value is non-significant?

I have seen studies that list the adjusted odds ratio without a p-value, so I am wondering whether it holds any importance when it is non-significant.

Thank You
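For reference, the quantities in this question can be written out. For predictor j in the multivariable model, the adjusted odds ratio, its Wald statistic, and its 95% Wald confidence interval all come from the same coefficient estimate and standard error:

$$
\widehat{\mathrm{OR}}_j = e^{\hat\beta_j}, \qquad
z_j = \frac{\hat\beta_j}{\widehat{\mathrm{SE}}(\hat\beta_j)}, \qquad
95\%\ \mathrm{CI}_j = \left( e^{\hat\beta_j - 1.96\,\widehat{\mathrm{SE}}(\hat\beta_j)},\;
e^{\hat\beta_j + 1.96\,\widehat{\mathrm{SE}}(\hat\beta_j)} \right)
$$

Since |z_j| > 1.96 exactly when this interval excludes 1, a Wald p-value above 0.05 goes with an interval that contains 1. An adjusted odds ratio reported with its interval therefore still shows the estimated size and precision of the association, even when it is not significant. This assumes Wald-based intervals; profile-likelihood intervals can differ slightly.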


@sasnewbie12 wrote:

I am trying to run a model with logistic regression containing about 20 independent variables, both categorical and continuous.

However, I am finding that the significance varies depending on which variables I include and exclude, and I believe that there is association and collinearity among the variables.

As I am a new SAS user, is there any simple way to check for association among the variables in logistic regression?

Thank You

You keep asking the same questions over and over, and my answers don't change just because you ask again three weeks later. I repeat my answer given here: https://communities.sas.com/t5/SAS-Statistical-Procedures/multivariate-logistic-regression-variable-...

Variable selection is fundamentally a poor approach when you have many correlated variables. It doesn't matter if you are new to SAS or experienced in SAS or using R or Python or Minitab. It is not the software that makes it a poor approach.

At that link, I reference a method of performing Logistic Partial Least Squares regression, which is fundamentally a superior approach. There is R code to do this, but I am not aware of SAS code to do this. However, since you can run R code through SAS PROC IML, that seems to be the approach I would take.

--

Paige Miller

