sasnewbie12
Obsidian | Level 7

I am trying to run a model with logistic regression containing about 20 independent variables, both categorical and continuous.

However, I am finding that the significance varies depending on which variables I include and exclude, and I believe that there is association and collinearity among the variables. 

 

As I am a new SAS user, is there any simple way to check for association among the variables in logistic regression? 

 

Thank You

9 REPLIES
art297
Opal | Level 21

Not my area of expertise, but the following might help: http://support.sas.com/kb/32/471.html

 

Art, CEO, AnalystFinder.com

 

pau13rown
Lapis Lazuli | Level 10

Without knowing much about the data (e.g. how many observations you have), 20 variables sounds like a lot and could be affecting things. There are some rules of thumb out there: in survival analysis I think they call it 'failures per variable' (FPV), and 10 is considered sufficient. There would be something analogous for logistic regression, I guess. Regarding associations among the variables, normally this would be based on an understanding of the data, i.e. it would be anticipated a priori rather than data-dependent. But if you want to examine correlations among the variables, that can be done even if the variables are of different types: PROC CORR will give the biserial correlation, I think, or there's a macro for it: http://support.sas.com/kb/24/991.html
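
Both checks are simple enough to sketch. The snippet below is plain Python rather than SAS, purely as an illustration; the counts, the toy data, and the helper names (`events_per_variable`, `pearson`) are all made up for the example. It relies on the fact that the Pearson correlation between a 0/1 variable and a continuous variable is the point-biserial correlation.

```python
import math

def events_per_variable(n_events, n_nonevents, n_parameters):
    """Rule-of-thumb ratio: size of the rarer outcome class per model parameter."""
    return min(n_events, n_nonevents) / n_parameters

def pearson(x, y):
    """Pearson correlation; with a 0/1 variable this is the point-biserial correlation."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical study: 300 events, 1700 non-events, 20 model parameters.
epv = events_per_variable(n_events=300, n_nonevents=1700, n_parameters=20)
print(f"events per variable: {epv:.1f}")

# Hypothetical predictors: a binary flag and a continuous measurement.
smoker = [0, 0, 0, 0, 1, 1, 1, 1]
age = [30, 35, 40, 45, 50, 55, 60, 65]
r = pearson(smoker, age)
print(f"point-biserial correlation: {r:.3f}")
```

A correlation this close to 1 between two predictors would be a warning sign that their individual coefficients will be hard to separate.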

Ksharp
Super User

PROC LOGISTIC fits the model by maximum likelihood (MLE), unlike PROC REG, which uses ordinary least squares (OLS).

Usually SAS can do the variable selection for you automatically. Check PROC HPGENSELECT; it offers several variable-selection methods, such as CV, LASSO, and so on.

sasnewbie12
Obsidian | Level 7

I have a large number of observations, 200,000 weighted, so there should be no issue with the 20 variables from that standpoint.

 

I am also just trying to find associations between the independent variables and the dependent variable, and am not interested in building a powerful model. However, when I add or remove some of the variables, a few of the other variables change significance drastically, sometimes becoming significant only after another variable is added to the model. I don't want to report an association that may differ from what someone else finds when looking for the same associations (for example, if they use a slightly different selection of variables and get different significance from what I have shown, that would make my study seem inaccurate).

 

Thank you

pau13rown
Lapis Lazuli | Level 10

In that case, the first thing I'd do (maybe you have already) is write a macro that fits the model for a single independent variable, and then run this macro for each of the 20 variables (some call these 'univariate models'), just to get a sense of things and to see which are the strongest predictors on their own. You could stop here, because you are "not interested in building a powerful model". But if you want to see whether any variables are superfluous, you could then attempt a 'multivariate model' (a misnomer, but this is how some people describe it) using only those variables that looked good in the univariate models. With 200,000 observations, though, maybe every variable shows a small p-value. This approach is common in medical research, but it really depends on what you're doing. E.g., in the methods section of this article, see the 6 steps they describe: https://www.nature.com/articles/7211492

Edit: regarding whether others can reproduce your results, as long as you lay out your approach as they do in that article, I'd say it's fine.
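
The "one model per variable" loop can be sketched as follows. This is plain Python, not a SAS macro, and it is only a minimal illustration: `fit_univariate_logistic` is a hypothetical helper that fits a one-predictor logistic model by Newton-Raphson, and the outcome and the two binary predictors (`exposed`, `urban`) are invented toy data.

```python
import math

def fit_univariate_logistic(x, y, iterations=25):
    """Fit logit(P(y=1)) = b0 + b1*x by Newton-Raphson; return (b0, b1, se_b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0  # score vector and information matrix
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * delta = score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Standard error of b1 from the inverse information matrix; it is evaluated
    # just before the final step, which is negligible once the fit has converged.
    se_b1 = math.sqrt(h00 / det)
    return b0, b1, se_b1

# Toy data (hypothetical): 80 subjects, 30 events.
outcome = [1] * 10 + [0] * 30 + [1] * 20 + [0] * 20
predictors = {
    "exposed": [0] * 40 + [1] * 40,         # 10/40 events vs 20/40 events
    "urban": [i % 2 for i in range(80)],    # 15/40 events in each group
}

results = {}
for name, x in predictors.items():
    b0, b1, se = fit_univariate_logistic(x, outcome)
    results[name] = (math.exp(b1), b1 / se)
    print(f"{name}: odds ratio = {math.exp(b1):.2f}, Wald z = {b1 / se:.2f}")
```

For a binary predictor the fitted odds ratio matches the one computed directly from the 2x2 table, which is a handy sanity check on any such loop.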

Ksharp
Super User

" fits the model for a single independent variable, "

That is called perfect model. That is not right according to statistical theory.

I suggest using PROC HPGENSELECT to let SAS select the variables for you.

Don't use selection=stepwise/forward/backward; try CV/LASSO/elastic net instead. For more information, check the PROC HPGENSELECT documentation.

sasnewbie12
Obsidian | Level 7

A follow-up question: say that an independent variable has a significant association in the "univariate" analysis and a non-significant one in the "multivariate" analysis. Can I make any use of the adjusted odds ratio for that variable if the p-value is non-significant?

 

I have seen studies that list the adjusted odds ratio without a p-value, so I am wondering whether it holds any importance when it is non-significant.

 

Thank You


sasnewbie12
Obsidian | Level 7

Another question: if a categorical variable shows a non-significant association in the multivariate analysis under "Analysis of Maximum Likelihood Estimates", but the "Type 3 Analysis of Effects" table shows that it is significant, what does that mean and how can it be interpreted?


PaigeMiller
Diamond | Level 26

@sasnewbie12 wrote:

I am trying to run a model with logistic regression containing about 20 independent variables, both categorical and continuous.

However, I am finding that the significance varies depending on which variables I include and exclude, and I believe that there is association and collinearity among the variables. 

 

As I am a new SAS user, is there any simple way to check for association among the variables in logistic regression? 

 

Thank You


You keep asking the same questions over and over, and my answers don't change, just because you ask again three weeks later. I repeat my answer given here: https://communities.sas.com/t5/SAS-Statistical-Procedures/multivariate-logistic-regression-variable-...

 

Variable selection is fundamentally a poor approach when you have many correlated variables. It doesn't matter if you are new to SAS or experienced in SAS or using R or Python or Minitab. It is not the software that makes it a poor approach.
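
A toy illustration of why this happens, in plain Python with invented counts: the exposure below looks strongly associated with the outcome on its own, but the association vanishes once a correlated second variable is accounted for. The Mantel-Haenszel stratified odds ratio is used here only as a simple stand-in for adding that variable to a logistic model, and the helper names are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table: a, b = exposed events/non-events; c, d = unexposed."""
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Mantel-Haenszel odds ratio pooled over strata of (a, b, c, d) tables."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

# Hypothetical counts, split by a second variable that is correlated with the
# exposure (most exposed subjects fall in the second stratum).
strata = [
    (5, 15, 20, 60),   # stratum 1: within-stratum odds ratio = 1
    (48, 32, 12, 8),   # stratum 2: within-stratum odds ratio = 1
]

# Collapse the strata to get the table you would see when ignoring the second variable.
a, b, c, d = (sum(column) for column in zip(*strata))
crude = odds_ratio(a, b, c, d)
adjusted = mantel_haenszel_or(strata)

print(f"crude odds ratio:    {crude:.2f}")
print(f"adjusted odds ratio: {adjusted:.2f}")
```

Which of the two numbers gets reported depends entirely on whether the second variable is in the model, which is the instability described in the original question.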

 

At that link, I reference a method of performing Logistic Partial Least Squares regression, fundamentally a superior approach. There is R code to do this, but I am not aware of SAS code to do this. However, since you can run R code through SAS PROC IML, that seems to be the approach I would take.

--
Paige Miller
