bollibompa
Quartz | Level 8

Hi,

I have a strange issue with quasi-separation of data points in PROC LOGISTIC.

 

In my dataset I have an outcome variable Y and a predictor X. When I run the model Y=X, SAS tells me "Quasi-complete separation of data points detected". I am not surprised, since the pattern in the dataset looks like this:

Y=0 X=1 (n=13)

Y=0 X=0 (n=288)

Y=1 X=0 (n=106)

 

Now to the issue:

If I change one value in the dataset so that it looks like this (pattern unchanged):

Y=0 X=1 (n=12)

Y=0 X=0 (n=289)

Y=1 X=0 (n=106)

 

Now SAS doesn't give me any warning about quasi-separation.

Does anyone have any idea why this happens? I think I still have quasi-separation.

 

I attach a text file with 3 variables: Y, X1 (before changing the value) and X2 (after changing the value).

 

Thanks in advance!

 

Thomas

FreelanceReinh
Jade | Level 19

Hi Thomas,

 

First of all, there is a minor discrepancy between the attached data and the frequency counts you provide: only after deleting observations 14, 15 and 16 (which look a bit misplaced) do my frequency counts match yours.

 

But this doesn't change the obvious fact that there is, in fact, a quasi-complete separation of data points in each of the four cases (models "Y=X1" and "Y=X2" with or without the above data correction): The minimum value of X in the subset of data points (X, Y) with Y=0 equals the maximum value of X in the subset of data points (X, Y) with Y=1. Both are zero.
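
The check described above can be sketched as follows. This is only a minimal illustration and assumes the attached data have been read into a dataset named test with variables Y, X1 and X2:

```sas
/* Crosstabulate the outcome against each predictor: */
/* an empty cell (here Y=1, X=1) points to separation. */
proc freq data=test;
  tables y*x1 y*x2 / norow nocol nopercent;
run;

/* Compare the range of each predictor within the two outcome groups. */
proc means data=test min max;
  class y;
  var x1 x2;
run;
```

If the minimum of X among the Y=0 observations equals the maximum of X among the Y=1 observations, as it does here (both are zero), the two groups touch at that value without overlapping, which is the quasi-complete separation pattern.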

 

The difference between the two scenarios (X1 vs. X2) is just that for X2 the iterative process used to compute the maximum likelihood estimates appears to converge: the convergence criterion is met in spite of the quasi-complete separation. This is documented in the output where it says (under "Model Convergence Status"): "Convergence criterion (GCONV=1E-8) satisfied."

That 1E-8 is the default setting of the GCONV= option. If you tighten the convergence criterion only slightly (to GCONV=0.92E-8 or less in this example), it will no longer be met and you'll get the familiar warning about quasi-complete separation for X2 as well:

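/* DESC models the probability of Y=1; GCONV= tightens the relative gradient convergence criterion */
proc logistic data=test desc;
  model y=x2 / gconv=0.92e-8;
run;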

With or without that warning, the "telltale signs of quasi-complete separation" (Paul D. Allison: Logistic Regression Using SAS. SAS Institute Inc. 1999, p. 44), namely a large (absolute) estimate and a large standard error (and p-value), are present anyway and indicate that the affected independent variable may be problematic.
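
To look at those signs without the attachment, one could rebuild the data from the frequency counts in the question. The following is only a sketch: the dataset name counts and the count variable n are made up for illustration, and it assumes that exactly one observation with Y=0 was changed from X1=1 to X2=0, as described in the question.

```sas
/* Grouped data rebuilt from the counts in the question (hypothetical names). */
data counts;
  input y x1 x2 n;
  datalines;
0 1 1 12
0 1 0 1
0 0 0 288
1 0 0 106
;
run;

/* Fit both models with the default convergence settings; */
/* the FREQ statement treats n as the number of observations per row. */
proc logistic data=counts desc;
  freq n;
  model y=x1;
run;

proc logistic data=counts desc;
  freq n;
  model y=x2;
run;
```

In each run, the "Analysis of Maximum Likelihood Estimates" table is the place to compare the estimate and standard error for X1 and X2.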

bollibompa
Quartz | Level 8
Many thanks for your detailed description! It helped me a lot!

/Thomas
