kgb
Calcite | Level 5

Hi!

I have performed a logistic regression with a dichotomous dependent variable, 2 continuous independent variables, 14 dichotomous independent variables and 1 multi-level variable.

All independent variables except one were significant in univariate logistic regression. I have a problem with one specific variable, in the univariate and therefore also in the multivariate logistic regression: the odds ratio estimate is displayed as >999.999, and the 95% confidence limits as >999.999 and >999.999.

However, if I calculate the odds ratio with PROC FREQ (dependent variable * independent variable), I get a point estimate of 1212.8991 with 95% confidence limits of 1031.0208 and 1426.8618.
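
For reference, here is a minimal sketch of the usual 2x2 odds ratio with a log-scale Wald interval, which is my understanding of what PROC FREQ reports by default. The cell counts are made up for illustration and are not the data from this post, and `odds_ratio_wald` and `capped` are hypothetical helper names. The `capped` helper only imitates a fixed-width display that prints anything above 999.999 as >999.999. If that is what is happening here, it would explain how a finite estimate such as 1212.8991 can appear as >999.999 in one table and in full in another.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.959964):
    """Odds ratio and log-scale Wald interval for a 2x2 table.

    Assumed layout (hypothetical):
                    outcome=1   outcome=0
        exposed=1       a           b
        exposed=0       c           d
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    or_hat = (a * d) / (b * c)
    # Standard error of log(OR); the smallest cell contributes the most.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_hat) - z * se_log)
    upper = math.exp(math.log(or_hat) + z * se_log)
    return or_hat, lower, upper


def capped(value, cap=999.999):
    """Imitate a display format that cannot show values above `cap`."""
    return f">{cap}" if value > cap else f"{value:.3f}"


# Made-up counts for illustration only, NOT the poster's data.
or_hat, lower, upper = odds_ratio_wald(900, 100, 1000, 99000)
print(capped(or_hat), capped(lower), capped(upper))

# The point estimate quoted above is finite but exceeds the cap.
print(capped(1212.8991))
```

Because the standard error on the log scale is sqrt(1/a + 1/b + 1/c + 1/d), a single sparse cell widens the interval but does not by itself make the estimate infinite.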

This is a huge problem because the variable is too important to exclude, and I know that the problem is the unbalanced data (in one cell I have 147 observations out of 164,000). What can I do?

I have also tried Firth penalization and exact analysis in PROC LOGISTIC, both without success. I was also considering PROC GLIMMIX, but maybe I have not found the correct options to include…

What can I do to use that variable in my model? Which approach can I try?

Please help me, thank you

PaigeMiller
Diamond | Level 26

For what it's worth, when you have many input variables and they are correlated with one another, it is my opinion (and also the opinion of many others) that you cannot really determine which variables are important, and you cannot determine the exact amount of their importance independent of the other variables -- which seems to be what you are trying to do.

 

The best you can do in this situation is to find a model that fits the data well and gives you good predictions. This is possible in this situation, and maybe the model you have fits well enough.

 

Also, the idea of comparing a univariate regression or PROC FREQ analysis to the results of your multiple input variable regression seems a bit strained, as they don't have to match.

--
Paige Miller
kgb
Calcite | Level 5
Thank you very much for your answer.
Regarding the last part, I did not explain myself well: I did not compare the univariate results with those of the multiple input variable regression.
I have tested a set of variables with univariate logistic regression for selecting the variables to test in the complete model.
I used PROC FREQ only to check whether the odds ratio came out the same when calculated a different way.
PaigeMiller
Diamond | Level 26

"I have tested a set of variables with univariate logistic regression for selecting the variables to test in the complete model."

I'm not sure that's a valid approach.

--
Paige Miller
Ksharp
Super User

I remember that @Rick_SAS has answered this question before.
