Programming the statistical procedures from SAS

Strange with quasi-separation of data points

Hi,

I have a strange issue with quasi-separation of data points in PROC LOGISTIC.

 

In my dataset I have an outcome variable Y and a predictor X. When I run the model Y=X, SAS tells me "Quasi-complete separation of data points detected". I am not surprised, since the pattern in the dataset looks like this:

Y=0 X=1 (n=13)

Y=0 X=0 (n=288)

Y=1 X=0 (n=106)

 

Now to the issue:

If I change one value so that the dataset looks like this (the pattern is unchanged):

Y=0 X=1 (n=12)

Y=0 X=0 (n=289)

Y=1 X=0 (n=106)

 

Now SAS no longer gives me any warning about quasi-separation.

Does anyone have any idea why this is? I think I still have quasi-separation.

 

I attach a text file with 3 variables: Y, X1 (before changing the value) and X2 (after changing the value).

 

Thanks in advance!

 

Thomas



All Replies
Trusted Advisor
Posts: 1,115

Re: Strange with quasi-separation of data points

Hi Thomas,

 

First of all, there is a minor discrepancy between the attached data and the frequency counts you provide: my frequency counts match yours only after deleting observations 14, 15 and 16 (which look a bit misplaced).

 

But this doesn't change the fact that there is a quasi-complete separation of data points in each of the four cases (models "Y=X1" and "Y=X2", with or without the above data correction): the minimum value of X among the data points with Y=0 equals the maximum value of X among the data points with Y=1. Both are zero, so the two outcome groups overlap only at X=0.
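The min/max condition above can be checked directly. The following is only a sketch: the dataset name sep_demo and the count variable n are made up for illustration, and the rows are the frequency counts quoted in the question for the X2 scenario rather than the attached file.

```sas
/* Hypothetical dataset rebuilt from the frequency counts in the question */
/* (X2 scenario); sep_demo and n are illustrative names, not from the thread */
data sep_demo;
  input y x n;
  datalines;
0 1 12
0 0 289
1 0 106
;
run;

/* Minimum and maximum of X within each outcome group; */
/* n is used as a frequency count for each row */
proc means data=sep_demo min max;
  class y;
  var x;
  freq n;
run;
```

If the minimum of X for Y=0 equals the maximum of X for Y=1, the two groups touch at a single value of X, which is the situation described above.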

 

The difference between the two scenarios (X1 vs. X2) is just that for X2 the iterative process used to compute the maximum likelihood estimates appears to converge: The convergence criterion is met -- in spite of the quasi-complete separation. This is documented in the output where it says (under "Model Convergence Status"): "Convergence criterion (GCONV=1E-8) satisfied."

That 1E-8 is the default setting of the GCONV= option. If you tighten the convergence criterion only slightly (to GCONV=0.92E-8 or less in this example), it will no longer be met and you'll get the familiar warning about quasi-complete separation for X2 as well:

/* Tighten the gradient convergence criterion below its default of 1E-8 */
proc logistic data=test desc;
  model y=x2 / gconv=0.92e-8;
run;
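One way to see why a gradient-based criterion can be satisfied even though no finite maximum likelihood estimate exists is to write out the log-likelihood. This is a sketch under assumptions: it uses the frequency counts quoted in the question for X2 (not the attached file), and the symbols \beta_0, \beta_1, \pi_0, \pi_1 are introduced here for illustration.

```latex
% Model: logit P(Y=1 | X=x) = \beta_0 + \beta_1 x
% \pi_0 = P(Y=1 | X=0), \pi_1 = P(Y=1 | X=1)
\begin{align*}
\pi_0 &= \frac{1}{1+e^{-\beta_0}}, \qquad
\pi_1 = \frac{1}{1+e^{-(\beta_0+\beta_1)}} \\
\ell(\beta_0,\beta_1) &= 106\log\pi_0 + 289\log(1-\pi_0) + 12\log(1-\pi_1) \\
\frac{\partial \ell}{\partial \beta_0} &= 106 - 395\,\pi_0 - 12\,\pi_1, \qquad
\frac{\partial \ell}{\partial \beta_1} = -12\,\pi_1
\end{align*}
% As \beta_1 \to -\infty we have \pi_1 \to 0, so \ell keeps increasing and
% both derivatives tend to 0 when \pi_0 = 106/395.
```

The term 12 log(1 - \pi_1) increases towards 0 as \beta_1 decreases, so the supremum is never attained at a finite \beta_1. At the same time the gradient becomes arbitrarily small along the way, so a criterion based on the gradient can be met after finitely many iterations, depending on the data and the threshold.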

With or without that warning, the "telltale signs of quasi-complete separation" (Paul D. Allison: Logistic Regression Using SAS. SAS Institute Inc. 1999, p. 44), namely a large (absolute) estimate and standard error (and a large p-value), are present anyway and indicate that the affected independent variable may be problematic.
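To look at those signs directly, one possible approach is to print the iteration history and capture the estimates and the convergence status as datasets. This is a sketch, assuming the thread's dataset test with variables y and x2; the output dataset names pe and cs are made up for illustration.

```sas
/* Capture the parameter estimates and the convergence status as datasets; */
/* pe and cs are illustrative names */
ods output ParameterEstimates=pe ConvergenceStatus=cs;

/* ITPRINT shows the iteration history, so the drift of the x2 estimate */
/* across iterations is visible */
proc logistic data=test desc;
  model y = x2 / itprint;
run;

/* Inspect the convergence status and the estimate, standard error and */
/* p-value for x2 */
proc print data=cs; run;
proc print data=pe; run;
```

An estimate for x2 that keeps growing in absolute value across iterations, together with a very large standard error, points to separation regardless of whether the warning is printed.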

Solution
‎12-08-2015 02:59 AM
Contributor
Posts: 73

Re: Strange with quasi-separation of data points

Many thanks for your detailed description! It helped me a lot!

/Thomas