Mariloud
Obsidian | Level 7

I am trying to predict a binary outcome using logistic regression, but I keep getting this warning: 'There is a complete separation of data points. The maximum likelihood estimate does not exist.' I tried the FIRTH option and the EXACT statement, but the warning persists. Is there another way to solve this issue? Thanks

 

 

ACCEPTED SOLUTION
ballardw
Super User

Maybe a demonstration:

data example;
  input result demo $;
datalines;
1 M
1 M
1 M
0 F
0 F
0 F
;

proc logistic data=example;
   class demo;
   model result = demo;
run;

shows this in the log:

NOTE: PROC LOGISTIC is modeling the probability that result=0. One way to change this to model
      the probability that result=1 is to specify the response variable option EVENT='1'.
WARNING: There is a complete separation of data points. The maximum likelihood estimate does not
         exist.
WARNING: The LOGISTIC procedure continues in spite of the above warning. Results shown are based
         on the last maximum likelihood iteration. Validity of the model fit is questionable.
NOTE: There were 6 observations read from the data set WORK.EXAMPLE.
NOTE: PROCEDURE LOGISTIC used (Total process time):
      real time           0.03 seconds
      cpu time            0.03 seconds

In this case it is easy to see that each outcome value comes from only one value of the independent variable: every M record has result=1 and every F record has result=0. More generally, if some combinations of the independent variables' values yield only one outcome and the remaining combinations yield only the other outcome, you have "separation".

Fix it?

It may mean reducing the number of independent variables or collecting more data.
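
As a rough sketch of the "collecting more data" route, the step below extends the demonstration with two made-up observations (one M with result=0 and one F with result=1), so that each level of demo now has both outcomes. The data set name example2 and the two extra rows are hypothetical and only for illustration. Because neither level of demo predicts the outcome perfectly any more, the data are no longer separated and the warning should not appear.

```sas
/* Hypothetical extension of the demonstration data: two made-up rows
   (0 M and 1 F) are added so each level of demo has both outcomes. */
data example2;
  input result demo $;
datalines;
1 M
1 M
1 M
0 M
0 F
0 F
0 F
1 F
;

/* Same model as before; EVENT='1' models the probability that result=1,
   as suggested by the NOTE in the log. */
proc logistic data=example2;
   class demo;
   model result(event='1') = demo;
run;
```

With only eight rows the estimates are not meaningful; the point is just that overlap between the groups is what makes the maximum likelihood estimate exist.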


5 REPLIES
Phil_NZ
Barite | Level 11

Hi @Mariloud 

Your question is quite ambiguous; I am not able to understand exactly what you are asking. Apart from that, your post is about a SAS/STAT procedure, so consider posting in the Statistical Procedures board next time:

https://communities.sas.com/t5/Statistical-Procedures/bd-p/statistical_procedures

 


Mariloud
Obsidian | Level 7

Very helpful, thank you!

Reeza
Super User

This means that a single variable (or a combination of variables) uniquely identifies your outcome.

You can figure this out by running PROC FREQ on your variables to see which ones completely separate the outcome.

This also happens if you accidentally include the outcome or a similar variable in your model.

 

proc freq data=have;
   /* "variables in model" is a placeholder: list your model's predictors inside the parentheses */
   tables outcome * (variables in model);
run;
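
For a concrete version of this check, here is a minimal sketch that runs it on the six-row demonstration data from this thread (the data step is repeated so the snippet stands alone). The NOROW, NOCOL and NOPERCENT options are my own addition to keep only the cell counts.

```sas
/* The six-row demonstration data set from this thread */
data example;
  input result demo $;
datalines;
1 M
1 M
1 M
0 F
0 F
0 F
;

/* Cross-tabulate the outcome against the predictor, counts only */
proc freq data=example;
   tables result*demo / norow nocol nopercent;
run;
```

In the resulting table, the result=0 row has all three of its observations under F and the result=1 row has all three under M, so two of the four cells are zero. A predictor whose crosstab against the outcome looks like this is the one causing the complete separation.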


Mariloud
Obsidian | Level 7

I see, thank you!
