11-13-2015 08:31 AM
I have a strange issue with quasi-complete separation of data points in PROC LOGISTIC.
In my dataset I have an outcome variable Y and a predictor X. When I run the model Y=X, SAS tells me "Quasi-complete separation of data points detected". I am not surprised, since the pattern in the dataset looks like this:
Y=0 X=1 (n=13)
Y=0 X=0 (n=288)
Y=1 X=0 (n=106)
Now to the issue:
If I change one value in the dataset so that it looks like this (pattern not changed):
Y=0 X=1 (n=12)
Y=0 X=0 (n=289)
Y=1 X=0 (n=106)
Now SAS no longer gives me any warning about quasi-complete separation.
Does anyone have any idea why this is? I think I still have quasi-complete separation.
I attach a txt file with 3 variables: Y, X1 (before changing the value) and X2 (after changing the value).
Thanks in advance!
11-14-2015 02:44 PM
First of all, there is a minor discrepancy between the attached data and the frequency counts you provide: my frequency counts match yours only after deleting observations 14, 15 and 16 (which look a bit misplaced).
But this doesn't change the fact that there is a quasi-complete separation of data points in each of the four cases (models "Y=X1" and "Y=X2", with or without the above data correction): The minimum value of X in the subset of data points (X, Y) with Y=0 equals the maximum value of X in the subset of data points (X, Y) with Y=1. Both are zero.
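The min/max comparison above can be checked directly. The following is only a sketch: the data set sep_counts, the frequency variable n and the label variable version are hypothetical names, and the rows are rebuilt from the frequency counts quoted in the question rather than from the attached file.

```sas
/* Hypothetical data set rebuilt from the frequency counts in the question.
   version = 'x1' is the pattern before the change, 'x2' after it;
   n is the number of observations with that (y, x) combination. */
data sep_counts;
  input y x n version $;
  datalines;
0 1 13 x1
0 0 288 x1
1 0 106 x1
0 1 12 x2
0 0 289 x2
1 0 106 x2
;
run;

/* Range of x within each outcome level, separately for each version.
   FREQ n makes each row count n times. */
proc means data=sep_counts min max maxdec=0;
  class version y;
  var x;
  freq n;
run;
```

For both versions, the minimum of x in the Y=0 rows and the maximum of x in the Y=1 rows are both 0, which is the condition described above.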
The difference between the two scenarios (X1 vs. X2) is just that for X2 the iterative process used to compute the maximum likelihood estimates appears to converge: The convergence criterion is met in spite of the quasi-complete separation. Under quasi-complete separation a finite maximum likelihood estimate for X does not exist; the estimate keeps drifting while the log likelihood flattens out, so the gradient can become small enough to pass the check. This is documented in the output where it says (under "Model Convergence Status"): "Convergence criterion (GCONV=1E-8) satisfied."
That 1E-8 is the default setting of the GCONV= option. If you tighten the convergence criterion only a little bit (to GCONV=0.92E-8 or less in this example), it will no longer be met and you'll get the familiar warning about quasi-complete separation also for X2:
proc logistic data=test desc;
  model y=x2 / gconv=0.92e-8;
run;
With or without that warning, the "telltale signs of quasi-complete separation" (Paul D. Allison: Logistic Regression Using SAS. SAS Institute Inc. 1999, p. 44) are present anyway: a large estimate (in absolute value), a large standard error and a large p-value. They indicate that the affected independent variable may be problematic.
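To look at these signs side by side, one option is to fit the X2 model twice and request the iteration history. This is a sketch only, assuming the attached file has been read into a data set named test with variables y and x2, as in the call above. ITPRINT is added here purely for illustration, and the exact numbers will depend on the data actually read in.

```sas
/* Run 1: default convergence criterion (GCONV=1E-8).
   ITPRINT prints the iteration history, so the drift of the
   x2 estimate from one iteration to the next is visible. */
proc logistic data=test descending;
  model y = x2 / itprint;
run;

/* Run 2: slightly tighter criterion, as suggested above. */
proc logistic data=test descending;
  model y = x2 / itprint gconv=0.92e-8;
run;
```

In both runs, compare the "Model Convergence Status" section with the estimate and standard error for x2 in the parameter estimates table.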