## Logistic Regression with unbalanced explanatory variable

Occasional Contributor
Posts: 5


Hi!

I have performed a logistic regression with a dichotomous dependent variable, 2 continuous independent variables, 14 dichotomous independent variables and 1 multi-level variable.

All independent variables were significant in univariate logistic regression, except one. I have a problem with one specific variable, in the univariate and therefore also in the multivariate logistic regression: for its odds ratio I obtain an estimate of >999.99 and 95% confidence limits of >999.999 to >999.999.

However, if I calculate the OR with PROC FREQ by crossing the dependent variable with the independent variable, I get a point estimate of 1212.8991 with 95% confidence limits of 1031.0208 to 1426.8618.
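For readers who want to check a figure like this by hand, here is a minimal sketch of an odds ratio with the usual Wald interval on the log-odds scale, computed from the four cells of a 2x2 table. The function name `odds_ratio_wald` and the counts are hypothetical (the poster's table is not given in the thread), and the sketch assumes the interval in question is the standard Wald one. Note that 1212.9 is itself above 999.999, so the two outputs need not disagree.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.959964):
    """Odds ratio and Wald confidence interval for the 2x2 table [[a, b], [c, d]].

    a, b: outcome yes / no counts in the exposed group
    c, d: outcome yes / no counts in the unexposed group
    z:    normal quantile (1.959964 gives a 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("the Wald interval needs all four cells to be positive")
    log_or = math.log(a * d) - math.log(b * c)
    # The smallest cell contributes the largest term to the standard error.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (
        math.exp(log_or),
        math.exp(log_or - z * se),
        math.exp(log_or + z * se),
    )


# Hypothetical counts for illustration only, NOT the poster's data.
estimate, lower, upper = odds_ratio_wald(900, 147, 1000, 162000)
print(f"OR = {estimate:.1f}, 95% CI = ({lower:.1f}, {upper:.1f})")
```

With one very small cell, the estimate can be huge while the interval stays finite, which is the pattern described above.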

It’s a huge problem because the variable is too important to exclude, and I know that the cause is the unbalanced data (one cell has only 147 observations out of 164,000). What can I do?

I have also tried Firth penalization and the exact analysis in PROC LOGISTIC, both without success. I was also considering PROC GLIMMIX, but maybe I have not found the correct options to include…
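As a rough illustration of what Firth penalization does in the simplest case, the sketch below uses the fact that for a model with a single dichotomous predictor the Firth-penalized estimate is the odds ratio obtained after adding 0.5 to each cell of the 2x2 table. The function names and counts are hypothetical, and this shortcut does not carry over to a model with several predictors.

```python
def plain_odds_ratio(a, b, c, d):
    """Unpenalized odds ratio for the table [[a, b], [c, d]]; infinite if b or c is 0."""
    if b == 0 or c == 0:
        return float("inf")
    return (a * d) / (b * c)


def firth_odds_ratio_2x2(a, b, c, d):
    """Firth-penalized odds ratio for ONE binary predictor only.

    In this saturated two-group model the penalty is equivalent to adding
    0.5 to every cell before forming the odds ratio.
    """
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))


# Hypothetical tables for illustration only, NOT the poster's data.
examples = {
    "empty cell": (25, 0, 10, 40),
    "small but non-empty cell": (900, 147, 1000, 162000),
}
for label, cells in examples.items():
    print(label, plain_odds_ratio(*cells), firth_odds_ratio_2x2(*cells))
```

The penalty turns an infinite estimate into a finite one when a cell is empty, but it barely moves the estimate when the smallest cell still holds over a hundred observations. That may be why Firth's method appears to change nothing here.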

How can I keep that variable in my model? Which approach can I try?

Posts: 2,046

## Re: Logistic Regression with unbalanced explanatory variable

For what it's worth, when you have many input variables that are correlated with one another, it is my opinion (and also the opinion of many others) that you cannot really determine which variables are important, nor the exact amount of their importance independent of the other variables, which seems to be what you are trying to do.

The best you can do in this situation is to find a model that fits the data well and gives you good predictions. This is possible in this situation, and maybe the model you have fits well enough.

Also, the idea of comparing a univariate regression or PROC FREQ analysis to the results of your multiple input variable regression seems a bit strained, as they don't have to match.

--
Paige Miller
Occasional Contributor
Posts: 5

## Re: Logistic Regression with unbalanced explanatory variable

Posted in reply to PaigeMiller
Thank you very much for your answer.
Regarding the last part, my explanation was not clear: I did not compare the univariate results with those of the multiple input variable regression.
I have tested a set of variables with univariate logistic regression for selecting the variables to test in the complete model.
I used PROC FREQ only to check whether the OR calculated in a different way was the same.
Posts: 2,046

## Re: Logistic Regression with unbalanced explanatory variable

> I have tested a set of variables with univariate logistic regression for selecting the variables to test in the complete model.

I'm not sure that's a valid approach.

--
Paige Miller
Super User
Posts: 10,200

## Re: Logistic Regression with unbalanced explanatory variable

I remember that @Rick_SAS has answered this question before.
