lionking19063
Fluorite | Level 6

Hi,

I am constructing a binary logistic regression model using 'proc logistic' in SAS, and the output appears to be error-free with no warning messages. However, when I replicate the same dataset and variables in Python using statsmodels, I encounter an error message indicating possible quasi-complete separation: 'A fraction of 0.20 of observations can be perfectly predicted. This might indicate quasi-separation, and in such cases, some parameters may not be identified.' In light of this discrepancy, I am uncertain about which set of results to trust—those from SAS or from Python. Thank you for any guidance you can provide.
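
To make the message concrete: with a single categorical predictor, quasi-complete separation shows up as a level in which every observation has the same outcome. That is a zero cell in the predictor-by-outcome cross-tabulation. The sketch below uses a hypothetical helper, `separated_levels`, and only the Python standard library. It screens for that one pattern and is not a general separation test, because separation can also come from combinations of predictors. The toy data are made up.

```python
from collections import defaultdict

def separated_levels(x, y):
    """Return levels of categorical predictor x whose 0/1 outcomes y are all 0 or all 1.

    Hypothetical screening helper: a level with a zero cell in the x-by-y
    cross-tabulation pushes its logistic coefficient toward +/- infinity,
    which is the quasi-complete separation pattern for a single predictor.
    """
    counts = defaultdict(lambda: [0, 0])  # level -> [count of y == 0, count of y == 1]
    for level, outcome in zip(x, y):
        counts[level][outcome] += 1  # assumes outcomes are coded as integers 0/1
    return sorted(
        level for level, (n0, n1) in counts.items() if n0 == 0 or n1 == 0
    )

# Made-up toy data: every "C" observation has y == 1, so level "C" is separated.
x = ["A", "A", "A", "B", "B", "B", "C", "C"]
y = [0, 1, 1, 0, 0, 1, 1, 1]
print(separated_levels(x, y))  # ['C']
```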

ballardw
Super User

We would need the data and the SAS code to say anything about the SAS side of things.

At a minimum, the Python code would be needed as well.

Any two programs are likely to differ in one or more defaults or options that affect this kind of interpretation, and without sufficient detail we would only be guessing.

Is that "Possible quasi-complete separation" message actually an error? It sounds more like a WARNING, which means you should check your data for accuracy and your code for correct use of that data. I would expect an ERROR to produce no output.
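
For what it's worth, in recent statsmodels versions this text is, as far as I can tell from the source, a note appended to the `summary()` table of a fitted binary model rather than a raised exception. It counts the observations whose fitted probability lies within a small tolerance of the observed 0/1 outcome and adds the note when that fraction is large. The sketch below reproduces that kind of check in plain Python. The tolerance of 1e-4 and the 0.1 threshold reflect my reading of the statsmodels source and may differ between versions, the function names are hypothetical, and the fitted probabilities are made up. Because it is a threshold heuristic on fitted probabilities, it is not necessarily the same test PROC LOGISTIC uses to report separation, so the two programs need not agree.

```python
def perfect_prediction_fraction(y, p_hat, tol=1e-4):
    """Fraction of observations whose fitted probability matches the 0/1 outcome within tol."""
    if len(y) != len(p_hat) or not y:
        raise ValueError("y and p_hat must be non-empty and the same length")
    close = sum(1 for obs, p in zip(y, p_hat) if abs(obs - p) < tol)
    return close / len(y)

def separation_note(y, p_hat, tol=1e-4, threshold=0.1):
    """Return a note in the spirit of the statsmodels summary text, or None.

    tol and threshold are assumed values, not a documented statsmodels API.
    """
    frac = perfect_prediction_fraction(y, p_hat, tol)
    if frac == 1.0:
        return "Complete separation: every observation is perfectly predicted."
    if frac > threshold:
        return (f"Possible quasi-separation: a fraction {frac:.2f} "
                "of observations can be perfectly predicted.")
    return None

# Made-up example: 2 of 10 fitted probabilities are essentially 0 or 1 and match y.
y = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
p_hat = [0.99999, 0.00001, 0.7, 0.4, 0.6, 0.3, 0.8, 0.2, 0.55, 0.45]
print(separation_note(y, p_hat))
```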

Reeza
Super User

How do the parameter estimates compare? I'm not sure this applies here, but some Python packages standardize the data before fitting the regression, and SAS does not. Also, which parameterization method is being used, if categorical variables are involved?
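
The parameterization question matters because PROC LOGISTIC's CLASS statement uses effect coding by default (PARAM=EFFECT, with the last level as the reference), while a statsmodels formula such as `y ~ C(x)` uses treatment (reference) coding through patsy, with the first level as the reference. The sketch below, with hypothetical helper names and only the standard library, prints the design-matrix row each scheme produces for a three-level variable. It also shows how a slope fitted on a standardized predictor maps back to the original scale: divide by the standard deviation used for standardizing, which is passed in explicitly because packages differ on whether they use the sample or population SD.

```python
def effect_coding(level, levels):
    """Effect (deviation) coding: one column per non-reference level, reference = last level."""
    ref = levels[-1]
    if level == ref:
        return [-1] * (len(levels) - 1)  # reference level gets -1 in every column
    return [1 if level == lv else 0 for lv in levels[:-1]]

def reference_coding(level, levels):
    """Treatment (reference) coding: one column per non-reference level, reference = first level."""
    return [1 if level == lv else 0 for lv in levels[1:]]

def unstandardize_slope(beta_std, sd):
    """Convert a slope fitted on z = (x - mean) / sd back to the original scale of x."""
    return beta_std / sd

levels = ["A", "B", "C"]
for lv in levels:
    print(lv, effect_coding(lv, levels), reference_coding(lv, levels))
# A [1, 0] [0, 0]
# B [0, 1] [1, 0]
# C [-1, -1] [0, 1]
```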

There's likely a reason this is occurring, but there isn't enough information to say why.

Just running 'logistic regression' in each application does not necessarily mean you are running equivalent models.
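
As an illustration of how the printed parameters can differ even when the fitted models agree: under effect coding the omitted (last) level's effect is minus the sum of the others, and each level's log-odds is the intercept plus its effect. Converting to reference coding with the first level as the reference therefore changes every printed number without changing any fitted log-odds. A second default to check is the response ordering: unless EVENT= or DESCENDING is specified, PROC LOGISTIC models the probability of the first ordered response level (0 for a 0/1 outcome), whereas statsmodels' `Logit` models P(y = 1), which negates every coefficient. The function names and estimates below are hypothetical and made up for illustration.

```python
def effect_to_reference(intercept_eff, effects, levels):
    """Convert effect-coded estimates (last level omitted) to reference coding (first level as reference).

    effects: estimates for levels[:-1]; the last level's effect is -sum(effects).
    Returns (intercept_ref, {level: slope}) for levels[1:].
    """
    full = list(effects) + [-sum(effects)]  # effects for every level, summing to 0
    log_odds = {lv: intercept_eff + e for lv, e in zip(levels, full)}
    intercept_ref = log_odds[levels[0]]  # reference level's log-odds becomes the intercept
    slopes = {lv: log_odds[lv] - intercept_ref for lv in levels[1:]}
    return intercept_ref, slopes

def flip_event(coefs):
    """Modeling P(y=0) instead of P(y=1) negates every coefficient, intercept included."""
    return [-c for c in coefs]

levels = ["A", "B", "C"]
# Made-up effect-coded estimates of the kind PARAM=EFFECT would report.
intercept_ref, slopes = effect_to_reference(-0.5, [0.3, -0.1], levels)
print(intercept_ref, slopes)
```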

