Strange with quasi-separation of data points

bollibompa — Fri, 13 Nov 2015 13:31:49 GMT

Hi,

I have a strange issue with quasi-separation of data points in PROC LOGISTIC.

In my dataset I have the variables Y (outcome) and X. When I run the model Y=X, SAS tells me "Quasi-complete separation of data points detected". I am not surprised, since the pattern in the dataset looks like this:

Y=0 X=1 (n=13)

Y=0 X=0 (n=288)

Y=1 X=0 (n=106)

Now to the issue:

If I change one value so that the dataset looks like this (pattern unchanged):

Y=0 X=1 (n=12)

Y=0 X=0 (n=289)

Y=1 X=0 (n=106)

Now SAS no longer gives me any warning about quasi-separation.

Does anyone have any idea why this happens? I think I still have quasi-separation.

I attach a txt file with 3 variables: Y, X1 (before changing the value) and X2 (after changing the value).

Thanks in advance!

Thomas

Re: Strange with quasi-separation of data points

FreelanceReinh — Sat, 14 Nov 2015 19:44:38 GMT

Hi Thomas,

First of all, there is a minor discrepancy between the attached data and the frequency counts you provide: only after deleting observations 14, 15 and 16 (which look a bit misplaced) do my frequency counts match yours.

But this doesn't change the fact that there is a quasi-complete separation of data points in each of the four cases (models "Y=X1" and "Y=X2", with or without the above data correction): the minimum value of X in the subset of data points (X, Y) with Y=0 equals the maximum value of X in the subset of data points (X, Y) with Y=1. Both are zero.
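The min/max condition above can be checked directly. The following is only a sketch: since the attachment is not reproduced here, it rebuilds an aggregated version of the data from the frequency counts quoted in the question, and the count variable n is my own addition (it is not in the original file).

```sas
/* Aggregated reconstruction from the counts in the question.
   n is a hypothetical count variable, not part of the attachment. */
data test;
   input y x1 x2 n;
   datalines;
0 1 1 12
0 1 0 1
0 0 0 288
1 0 0 106
;
run;

/* Minimum and maximum of X1 and X2 within each level of Y.
   FREQ n makes each row count n times. */
proc means data=test min max;
   class y;
   var x1 x2;
   freq n;
run;
```

For both X1 and X2, the minimum in the Y=0 group and the maximum in the Y=1 group should come out as 0, which is the condition described above. To fit a model on this aggregated form, PROC LOGISTIC would need a FREQ n; statement as well. I have not checked that the aggregated form reproduces the convergence behaviour of the original file step for step.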

The difference between the two scenarios (X1 vs. X2) is just that for X2 the iterative process used to compute the maximum likelihood estimates appears to converge: The convergence criterion is met -- in spite of the quasi-complete separation. This is documented in the output where it says (under "Model Convergence Status"): "Convergence criterion (GCONV=1E-8) satisfied."

That 1E-8 is the default setting of the GCONV= option. If you tighten the convergence criterion only slightly (to GCONV=0.92E-8 or less in this example), it will no longer be met, and you'll get the familiar warning about quasi-complete separation for X2 as well:

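proc logistic data=test descending;   /* DESCENDING: model the probability of Y=1 */
   /* slightly tighter than the default GCONV=1E-8 */
   model y=x2 / gconv=0.92e-8;
run;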

With or without that warning, the "telltale signs of quasi-complete separation" (Paul D. Allison: Logistic Regression using SAS. SAS Institute Inc. 1999, p. 44) are present anyway: a large (absolute) estimate, a large standard error and a large p-value. They indicate that the affected independent variable may be problematic.
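If you want to see these signs develop, the ITPRINT option of the MODEL statement prints the iteration history. This is a minimal sketch that assumes the data set test with the variables y and x2, as in the code above, and keeps the default GCONV=1E-8:

```sas
/* Print the iteration history for the X2 model (default GCONV=1E-8). */
proc logistic data=test descending;
   model y=x2 / itprint;
run;
```

With quasi-complete separation I would expect the estimate for x2 to keep growing in absolute value from one iteration to the next while -2 Log L hardly changes. The standard error in the "Analysis of Maximum Likelihood Estimates" table should then look very large compared with the one for the intercept.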

Re: Strange with quasi-separation of data points

bollibompa — Thu, 19 Nov 2015 09:30:14 GMT

Many thanks for your detailed description! It helped me a lot!

/Thomas