Logistic regression question

wcw2 — Fri, 14 Apr 2023 11:56:37 GMT

I'm running a model in PROC LOGISTIC, modeling the probability of a negative culture (Y/N) with the dichotomous predictors drug (Y/N) and disease severity (Y/N). I also need to include study site (there are 34, and many are sparsely populated) because it's a confounder. However, when I do, the model falls apart ("Quasi-complete separation of data points detected...WARNING: The maximum likelihood estimate may not exist....WARNING: The validity of the model fit is questionable."), I guess because there are so many sites. How should I approach this problem? Should I group the sites into several chunks? I don't often run multivariable models. Thank you.

Re: Logistic regression question

PaigeMiller — Fri, 14 Apr 2023 12:06:20 GMT

Generally, very sparse predictor variables are indeed a problem. You could group the sites, if there is a meaningful way to do such a grouping. Or you could try to find a continuous variable that might represent the sites.

Re: Logistic regression question

wcw2 — Fri, 14 Apr 2023 12:16:43 GMT

OK, thanks. Yes, my plan is to just group them. Most of the population comes from African sites, so I will try Africa/non-Africa groups.

Re: Logistic regression question

StatDave — Fri, 14 Apr 2023 13:32:57 GMT

You could fit a conditional logistic model that stratifies on site by using the STRATA statement. This removes the need to estimate separate parameters for the sites. See the conditional logistic example in the PROC LOGISTIC documentation. If you need to estimate the site parameters, you could try the penalized likelihood method by adding the FIRTH option to the MODEL statement. Another possibility is exact estimation, but this is very resource intensive and might not be feasible.
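
As a rough sketch of the first two suggestions, the calls might look something like the following. The data set name (cultures) and the variable names (neg_culture, drug, severity, site) are assumptions made for illustration, as is the 'Y'/'N' coding; none of them come from the original post, so substitute your own.

```sas
/* Hypothetical names: data set CULTURES; variables NEG_CULTURE, DRUG,
   and SEVERITY (each assumed to be coded 'Y'/'N') and SITE.
   Adjust these to match your data. */

/* Option 1: conditional logistic model stratified on site.
   SITE appears only in the STRATA statement, so no site
   parameters are estimated. */
proc logistic data=cultures;
   class drug severity / param=ref ref=first;
   model neg_culture(event='Y') = drug severity;
   strata site;
run;

/* Option 2: Firth penalized likelihood with SITE kept as a predictor.
   CLPARM=PL requests profile-likelihood confidence limits
   for the parameters. */
proc logistic data=cultures;
   class site drug severity / param=ref ref=first;
   model neg_culture(event='Y') = site drug severity / firth clparm=pl;
run;
```

With the assumed 'Y'/'N' coding, REF=FIRST makes 'N' the reference level for drug and severity, so the reported odds ratios compare 'Y' with 'N'. In Option 1, strata in which every culture has the same result contribute nothing to the conditional likelihood, so very sparse sites may effectively drop out of the estimate.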