07-23-2017 04:22 AM
I have an ordinal independent variable and an ordinal response variable. I used PROC LOGISTIC and checked the score test for the proportional odds assumption. The assumption did not hold, so I switched to a generalized logit model. But it gives me the following warning:
The validity of the model fit is questionable.
And the log says the following:
There is possibly a quasi-complete separation of data points. The maximum likelihood
estimate may not exist.
WARNING: The LOGISTIC procedure continues in spite of the above warning. Results shown are based
on the last maximum likelihood iteration. The validity of the model fit is questionable.
What would be an appropriate way to get a valid result? Or is there any way I can eliminate the above warnings?
07-23-2017 08:33 AM
For background info on (quasi-)complete separation, see:
Usage Note 22599: Understanding and correcting complete or quasi-complete separation problems
But even when you have a separation condition, the resulting model can be quite good at classifying observations. Check this on a holdout dataset! A holdout dataset consists of independent observations with known outcomes that the model never saw during training.
However, when you have a separation condition, the resulting model cannot be interpreted. Avoid inference about regression coefficients and odds ratios, because the maximum likelihood estimates of at least some model parameters do not exist (they are infinite). Simply treat the model as if it were produced by an uninterpretable machine learning algorithm (such as a neural network).
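To make the holdout check concrete, here is a minimal sketch in Python rather than SAS. Everything in it is an assumption for illustration: the data are made up, `holdout_split`, `fit_majority` and `accuracy` are hypothetical helper names, and the majority-vote classifier is only a stand-in for whatever model you actually fit with PROC LOGISTIC.

```python
import random
from collections import Counter, defaultdict

def holdout_split(rows, holdout_fraction=0.3, seed=0):
    """Shuffle rows reproducibly and split them into (train, holdout)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_holdout = int(round(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]

def fit_majority(rows):
    """Stand-in model: predict the most frequent training response
    for each predictor level (overall mode for unseen levels)."""
    by_level = defaultdict(Counter)
    overall = Counter()
    for x, y in rows:
        by_level[x][y] += 1
        overall[y] += 1
    fallback = overall.most_common(1)[0][0]

    def predict(x):
        if x in by_level:
            return by_level[x].most_common(1)[0][0]
        return fallback

    return predict

def accuracy(predict, rows):
    """Share of rows whose predicted response equals the observed one."""
    hits = sum(1 for x, y in rows if predict(x) == y)
    return hits / len(rows)

# Made-up (predictor level, response level) pairs, for illustration only.
data = [(1, "low"), (1, "low"), (2, "low"), (2, "mid"), (3, "mid"),
        (3, "mid"), (4, "high"), (4, "high"), (4, "mid"), (1, "low")]

train, holdout = holdout_split(data, holdout_fraction=0.3, seed=1)
model = fit_majority(train)          # the model never sees the holdout rows
holdout_accuracy = accuracy(model, holdout)
print(f"holdout accuracy: {holdout_accuracy:.2f}")
```

The point of the design is only that the holdout rows are set aside before fitting and used once, for scoring. With real data you would use far more observations and probably a stratified split.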
What can you do to avoid the separation condition?
Collapsing levels of categorical variables and binning interval variables are commonly used techniques to deal with a separation condition.
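As a rough illustration of collapsing levels, the sketch below (in Python rather than SAS) cross-tabulates a categorical predictor against the response, lists the empty cells, and then merges adjacent predictor levels. The data, the level mapping, and the helper names `empty_cells` and `collapse` are all hypothetical. An empty cell in this table is a common sign of quasi-complete separation when the predictor is treated as categorical, but it is a hint to investigate, not a formal test.

```python
from collections import Counter

def empty_cells(rows):
    """Return (predictor level, response level) pairs that never occur."""
    counts = Counter(rows)
    x_levels = sorted({x for x, _ in rows})
    y_levels = sorted({y for _, y in rows})
    return [(x, y) for x in x_levels for y in y_levels
            if counts[(x, y)] == 0]

def collapse(rows, mapping):
    """Recode predictor levels according to mapping (unmapped levels are kept)."""
    return [(mapping.get(x, x), y) for x, y in rows]

# Made-up data: predictor levels 1-4, response levels 1-3.
data = [(1, 1), (1, 1), (1, 2),
        (2, 1), (2, 2), (2, 3),
        (3, 1), (3, 2), (3, 3), (3, 3),
        (4, 2), (4, 3), (4, 3)]

# Level 1 never has response 3, and level 4 never has response 1.
print(empty_cells(data))  # [(1, 3), (4, 1)]

# Hypothetical recoding: merge adjacent ordinal levels into two groups.
mapping = {1: "1-2", 2: "1-2", 3: "3-4", 4: "3-4"}
collapsed = collapse(data, mapping)

# After collapsing, every predictor group contains every response level.
print(empty_cells(collapsed))  # []
```

Which levels to merge is a judgment call. For an ordinal variable it usually makes sense to merge only neighbouring levels, as the mapping above does, so the ordering keeps its meaning.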