Statistical Procedures

SimoneStefano96 · Posted 12-30-2021 07:08 AM

Hello,

I am running a logistic regression model.

I used both PROC LOGISTIC and PROC GENMOD (outputs attached below).

Long story short, I know my logistic model suffers from perfect separation, so I fitted a Firth penalised logistic regression (the FIRTH option in PROC LOGISTIC) to resolve the non-convergence issue.

However, I was wondering whether PROC GENMOD offers a similar option, and why it does not report a p-value for CONDISBL (the variable suffering from perfect separation) as PROC LOGISTIC does. I would expect the standard error to be large in PROC GENMOD too. Instead, it seems that perfect separation prevents interval estimation of the parameter there.

Thanks,

Simone

sbxkoenk · Posted 12-30-2021 08:21 AM

Hello,

I don't think PROC GENMOD is a good option in case of complete separation, especially when PROC LOGISTIC with the FIRTH option can be used instead.

However, PROC GENMOD can produce likelihood-ratio statistics.
PROC GENMOD can report optional likelihood-ratio chi-square tests for each of the coefficients in the model. Unlike Wald chi-squares, which are essentially useless under complete or quasi-complete separation, the likelihood ratio test is still a valid test of the null hypothesis that a coefficient is equal to 0. Thus, even if a certain parameter can’t be estimated, we can still judge whether or not it is significantly different from 0.
(See : https://statisticalhorizons.com/wp-content/uploads/Allison.StatComp.pdf )
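
To make the Wald versus likelihood-ratio contrast concrete, here is a minimal sketch in Python (not SAS, and not the poster's data). It assumes a hypothetical four-observation dataset with complete separation and a one-parameter logistic model without intercept; the names `X`, `Y`, `loglik` and `wald_z` are made up for illustration. Because the log-likelihood keeps rising toward 0 as the coefficient grows, the sketch uses 0 as its supremum when forming the likelihood-ratio statistic.

```python
import math

# Hypothetical, completely separated toy data: y = 1 exactly when x > 0.
X = [-2.0, -1.0, 1.0, 2.0]
Y = [0, 0, 1, 1]

def log_sigmoid(t):
    """Numerically stable log(1 / (1 + exp(-t)))."""
    if t >= 0:
        return -math.log1p(math.exp(-t))
    return t - math.log1p(math.exp(t))

def loglik(b):
    """Log-likelihood of the no-intercept model P(y=1) = sigmoid(b * x)."""
    return sum(log_sigmoid(b * x) if y == 1 else log_sigmoid(-b * x)
               for x, y in zip(X, Y))

def wald_z(b):
    """Wald z = b / SE(b), with SE taken from the information at b."""
    info = 0.0
    for x in X:
        p = 1.0 / (1.0 + math.exp(-b * x))
        info += x * x * p * (1.0 - p)
    return b * math.sqrt(info)

# The log-likelihood increases toward its supremum of 0 as b grows, so the
# likelihood-ratio statistic for H0: b = 0 stays finite.
lr_stat = 2.0 * (0.0 - loglik(0.0))
lr_pvalue = math.erfc(math.sqrt(lr_stat / 2.0))  # chi-square(1) upper tail

for b in (1.0, 5.0, 10.0):
    print(f"b={b:5.1f}  loglik={loglik(b):10.6f}  Wald z={wald_z(b):.4f}")
print(f"LR chi-square = {lr_stat:.4f}, p = {lr_pvalue:.4f}")
```

In this toy case the Wald z shrinks as the estimate diverges, because the standard error grows faster than the estimate, while the likelihood-ratio statistic remains well defined.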

The same statement is also made here :
Usage Note 22599: Understanding and correcting complete or quasi-complete separation problems

https://support.sas.com/kb/22/599.html

Regards,

Koen

SimoneStefano96 · Posted 01-03-2022 02:56 AM

Hello @sbxkoenk ,

Thanks for your reply, and sorry for the late response. I was figuring out how to tackle this issue.

Why do you state that GENMOD would not be a good option in this case? I'm just curious about it.

What I am actually doing is running PROC MI on datasets produced via simulation. For each multiply imputed dataset, my aim is to obtain the p-value associated with each parameter. Across all datasets, I then obtain a distribution of p-values for each parameter.

NOTE : PROC MI required augmentation, as foreseen.

NOTE: I used PROC MIANALYZE for the PROC LOGISTIC results, but for PROC GENMOD with the LRT the only thing I can do is combine the LRT chi-square values with the COMBCHI macro by Allison (https://www.sas.upenn.edu/~allison/combchi.sas).

My aim is to get the p-values, as I am working on Delta-Adjustment.

So, my question is the following:

Would you recommend using LRT p-values for the other variables too, the ones not showing quasi-complete separation? As far as I understand, the two tests give fairly similar results, and Wald is an approximation of LRT in some cases.

Thanks,

Simone

sbxkoenk · Posted 01-03-2022 11:45 AM

Hello @SimoneStefano96 ,

I said PROC LOGISTIC is a better option than PROC GENMOD in case of separation because PROC LOGISTIC gives a clear warning message for both complete separation and quasi-complete separation.

PROC GENMOD, on the other hand, cannot detect complete separation (no warning at all) and gives only an ambiguous warning for quasi-complete separation.

For your final question :
The separation may be due to a single variable, but the whole maximum likelihood fitting of the model suffers from it. So I think you have to assess all variables in the same way (LRT chi-square instead of Wald chi-square).

But after reading the answer from @StatDave , I now doubt whether it makes sense to do any inference at all.

@StatDave is certainly more knowledgeable about this topic than I am.

Thanks,

Koen

StatDave · Posted 12-30-2021 04:24 PM

Large parameter estimates (as seen in the GENMOD output) and large standard errors (as seen in the LOGISTIC output) both indicate that the fitting algorithm is having trouble finding a proper maximum likelihood solution, often because of separation, as you note. Whenever you see such indications in any model fit by maximum likelihood, it is a good idea to check the gradient values by adding the ITPRINT option to the MODEL statement. When a proper solution is achieved, all gradient values should be very close to zero. That will probably not be the case here.

GENMOD has no check for separation in a binary response model, since the procedure models many types of responses, not just binary ones. The iterations therefore continue even as some parameters head toward infinity, which can eventually produce errors such as overflows or lost degrees of freedom; the latter is seen here. In any case, if a proper solution with finite estimates cannot be achieved, the results are not useful for inference purposes.
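
As a rough illustration of what an iteration history looks like under complete separation, here is a minimal Python sketch. It assumes a hypothetical four-observation dataset and a one-parameter logistic model without intercept, fitted by plain Newton-Raphson; it is not claimed to match the internals of GENMOD or LOGISTIC, and all names are made up for the example.

```python
import math

# Hypothetical, completely separated toy data: y = 1 exactly when x > 0.
X = [-2.0, -1.0, 1.0, 2.0]
Y = [0, 0, 1, 1]

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def gradient(b):
    """First derivative of the log-likelihood with respect to b."""
    return sum((y - sigmoid(b * x)) * x for x, y in zip(X, Y))

def information(b):
    """Negative second derivative of the log-likelihood."""
    return sum(x * x * sigmoid(b * x) * (1.0 - sigmoid(b * x)) for x in X)

def newton_history(n_iter=8, b=0.0):
    """Record (iteration, estimate, gradient) before each Newton step."""
    history = []
    for i in range(n_iter):
        g = gradient(b)
        history.append((i, b, g))
        b += g / information(b)  # Newton-Raphson update
    return history

history = newton_history()
for i, b, g in history:
    print(f"iter {i}: estimate = {b:8.4f}, gradient = {g:.6f}")
```

In this toy case the estimate grows at every iteration and the gradient shrinks without ever reaching zero, so the iterations never settle on a finite solution.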
