I am trying to predict a binary outcome using logistic regression, but I keep getting this warning: 'There is a complete separation of data points. The maximum likelihood does not exist'. I tried the FIRTH option and the EXACT statement to solve this issue, but the warning is still the same. Is there another way to solve this issue? Thanks
Maybe a demonstration:
data example;
   input result demo $;
datalines;
1 M
1 M
1 M
0 F
0 F
0 F
;

proc logistic data=example;
   class demo;
   model result = demo;
run;
shows this for the Log:
NOTE: PROC LOGISTIC is modeling the probability that result=0. One way to change this to model the probability that result=1 is to specify the response variable option EVENT='1'.
WARNING: There is a complete separation of data points. The maximum likelihood estimate does not exist.
WARNING: The LOGISTIC procedure continues in spite of the above warning. Results shown are based on the last maximum likelihood iteration. Validity of the model fit is questionable.
NOTE: There were 6 observations read from the data set WORK.EXAMPLE.
NOTE: PROCEDURE LOGISTIC used (Total process time):
      real time 0.03 seconds
      cpu time 0.03 seconds
In this case it is easy to see that every observation with result=1 has demo=M and every observation with result=0 has demo=F, so the independent variable predicts the outcome perfectly. More generally, if some combinations of the independent variables yield only one outcome and the remaining combinations yield only the other outcome, you have "separation".
Fix it?
It may mean reducing the number of independent variables or collecting more data.
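For reference, since the original question mentions the FIRTH option, here is a minimal sketch of how it is typically specified on the example data above. This is an illustration only, not a guaranteed fix: the poster reports that it did not resolve the problem on their data, and the CLPARM=PL and PARAM=REF choices are assumptions added here, not something stated in the thread.

```sas
/* Rebuild the six-row example data set used in the demonstration */
data example;
   input result demo $;
   datalines;
1 M
1 M
1 M
0 F
0 F
0 F
;
run;

/* Request Firth's penalized likelihood instead of ordinary maximum
   likelihood. CLPARM=PL asks for profile-likelihood confidence limits
   (an assumption here, not from the thread). */
proc logistic data=example;
   class demo / param=ref;
   model result(event='1') = demo / firth clparm=pl;
run;
```

If a penalized fit is not acceptable for your analysis, the options named above (dropping or combining the separating variable, or collecting more data) still apply.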
Hi @Mariloud
Your statement is quite ambiguous; I am not able to understand exactly what you want to say. Apart from that, your post relates to SAS/STAT procedures, so consider posting in this forum next time:
https://communities.sas.com/t5/Statistical-Procedures/bd-p/statistical_procedures
Very helpful, thank you!
This means that there is a single variable (or a combination of variables) that uniquely identifies your outcome.
You can figure this out by using PROC FREQ against your variables to see which offer a complete separation.
This also happens if you accidentally include the outcome or a similar variable in your model.
proc freq data=have;
table outcome * (variables in model);
run;
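As a concrete sketch of that check, here is the same PROC FREQ idea applied to the six-row example data set from the demonstration earlier in the thread. The data set name example and the variables result and demo come from that demonstration; the NOROW, NOCOL and NOPERCENT options are added here only to keep the table down to raw counts.

```sas
/* Rebuild the six-row example data set used in the demonstration */
data example;
   input result demo $;
   datalines;
1 M
1 M
1 M
0 F
0 F
0 F
;
run;

/* Cross-tabulate the outcome against the predictor.
   Zero-count cells point to separation. */
proc freq data=example;
   tables result*demo / norow nocol nopercent;
run;
```

In this table the result=0, demo=M cell and the result=1, demo=F cell are both empty, which is exactly the pattern that triggers the complete separation warning. With several predictors, repeat the TABLES request for each one, or for combinations of them.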
I see, thank you!