Logistic Regression with unbalanced explanatory va...


a week ago

Hi!

I have performed a logistic regression with a dichotomous dependent variable, 2 continuous independent variables, 14 dichotomous variables and 1 multi-level variable.

All independent variables were significant in univariate logistic regression, except one. I have a problem with one specific variable (in the univariate and therefore also in the multivariate logistic regression): for the odds ratio I obtain Estimate >999.99 and 95% Confidence Limits >999.999 - >999.999.

However, if I calculate the OR with proc freq, crossing the dependent variable with the independent variable, I get a point estimate of 1212.8991 with 95% Confidence Limits 1031.0208 to 1426.8618.

It's a huge problem because the variable is too important to exclude, and I know that the problem is the unbalanced data (in one cell I have 147 observations out of 164,000). What can I do?

I have also tried Firth penalization, without success, and exact analysis in proc logistic, also without success. I was also considering proc glimmix, but maybe I have not found the correct options to include…

How can I use that variable in my model? Which approach can I try?

Please help me, thank you.


a week ago

For what it's worth, when you have many input variables and they are correlated with one another, it is my opinion (and also the opinion of many others) that you cannot really determine which variables are important, and you cannot determine the exact amount of their importance independent of the other variables, which seems to be what you are trying to do.

The best you can do in this situation is to find a model that fits the data well and gives you good predictions. This is possible in this situation, and maybe the model you have fits well enough.

Also, the idea of comparing a univariate regression or PROC FREQ analysis to the results of your multiple input variable regression seems a bit strained, as they don't have to match.


Friday

Thank you very much for your answer.

With respect to the last part, I did not explain myself correctly: I did not compare the univariate results with those of the multiple input variable regression.

I have tested a set of variables with univariate logistic regression for selecting the variables to test in the complete model.

I used proc freq just to check whether the OR came out the same when calculated in a different way.
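
For reference, this kind of cross-check can be reproduced by hand from the 2x2 table. The sketch below is in Python rather than SAS and uses the usual Wald interval on the log scale, which is assumed (not confirmed) to be close to what proc freq reports. The function name `odds_ratio_wald` and the cell counts are hypothetical, since the thread does not give the real table. One thing worth noting: the proc freq estimate of 1212.8991 is itself larger than 999.999, so if the ">999.999" printed by proc logistic is only a display cap (an assumption), the two outputs may not actually disagree.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for the 2x2 table

                  outcome=1   outcome=0
        x = 1         a           b
        x = 0         c           d

    The interval is built on the log-odds-ratio scale and then
    exponentiated. It needs all four cells to be positive.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Wald interval")
    log_or = math.log((a * d) / (b * c))
    # Standard error of the log odds ratio: dominated by the smallest cells.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(log_or - z * se)
    upper = math.exp(log_or + z * se)
    return math.exp(log_or), lower, upper


# Hypothetical counts, chosen only to show a very unbalanced table.
estimate, lower, upper = odds_ratio_wald(900, 100, 50, 9000)
print(f"OR = {estimate:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

With counts this unbalanced the odds ratio is well above 1000 even though every cell is non-empty, so a very large estimate does not by itself mean the model failed to converge.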



Friday

"I have tested a set of variables with univariate logistic regression for selecting the variables to test in the complete model."

I'm not sure that's a valid approach.


Friday

I remember that @Rick_SAS has answered this question before.