Strange with quasi-separation of data points

11-13-2015 08:31 AM

Hi,

I have a strange issue with quasi-separation of data points in PROC LOGISTIC.

In my dataset I have an outcome variable Y and a predictor X. When I run the model Y=X, SAS tells me "Quasi-complete separation of data points detected". I am not surprised, since the pattern in the dataset looks like this:

Y=0 X=1 (n=13)

Y=0 X=0 (n=288)

Y=1 X=0 (n=106)

Now to the issue:

If I change one value in the dataset so that it looks like this (the pattern is unchanged):

Y=0 X=1 (n=12)

Y=0 X=0 (n=289)

Y=1 X=0 (n=106)

Now SAS no longer gives me any warning about quasi-separation.

Does anyone have an idea why this happens? I think I still have quasi-separation.

I attach a text file with three variables: Y, X1 (before changing the value) and X2 (after changing the value).

Thanks in advance!

Thomas

Accepted Solutions

Solution

12-08-2015 02:59 AM

11-19-2015 04:30 AM

Many thanks for your detailed description! It helped me a lot!

/Thomas


All Replies

11-14-2015 02:44 PM

Hi Thomas,

First of all, there is a minor discrepancy between the attached data and the frequency counts you provide: my frequency counts match yours only after deleting observations 14, 15 and 16 (which look a bit misplaced).

But this doesn't change the fact that there is quasi-complete separation of data points in each of the four cases (models "Y=X1" and "Y=X2", with or without the above data correction): the minimum value of X among the data points (X, Y) with Y=0 equals the maximum value of X among the data points (X, Y) with Y=1. Both are zero.
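
This criterion can be checked directly. The following is only a sketch: it rebuilds the data in aggregated form from the counts stated in the question (not from the attached file, so the misplaced observations are not included), and the frequency variable `n` is a name assumed here for illustration.

```sas
/* Aggregated data reconstructed from the counts in the question.   */
/* n is a hypothetical frequency variable, not in the attached file. */
data test;
  input y x1 x2 n;
  datalines;
0 1 1 12
0 1 0 1
0 0 0 288
1 0 0 106
;
run;

/* Range of each predictor within each outcome level:               */
/* compare the minimum for y=0 with the maximum for y=1.            */
proc means data=test min max;
  class y;
  var x1 x2;
  freq n;
run;
```

For both X1 and X2, the minimum in the Y=0 group and the maximum in the Y=1 group are zero, which is the overlap at a single value that characterizes quasi-complete separation.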

The difference between the two scenarios (X1 vs. X2) is just that for X2 the iterative process used to compute the maximum likelihood estimates appears to converge: The convergence criterion is met -- in spite of the quasi-complete separation. This is documented in the output where it says (under "Model Convergence Status"): "Convergence criterion (GCONV=1E-8) satisfied."
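
A short derivation may make this plausible. It is written for the X1 counts from the question and assumes the model is $\operatorname{logit} P(Y=1 \mid X) = \beta_0 + \beta_1 X$; the symbols $p_0$, $p_1$ and $\ell$ are notation introduced here.

$$
\ell(\beta_0, \beta_1) = 106 \log p_0 + 288 \log(1 - p_0) + 13 \log(1 - p_1),
\qquad
p_0 = \frac{1}{1 + e^{-\beta_0}},
\quad
p_1 = \frac{1}{1 + e^{-(\beta_0 + \beta_1)}}
$$

$$
\frac{\partial \ell}{\partial \beta_1} = -13\, p_1 < 0 \ \text{ for all finite } \beta_1,
\qquad
\frac{\partial \ell}{\partial \beta_1} \to 0 \ \text{ as } \beta_1 \to -\infty
$$

The log-likelihood therefore keeps increasing as $\beta_1$ decreases and has no finite maximizer, yet its gradient becomes arbitrarily small along the way. A gradient-based criterion such as GCONV= can thus be satisfied after finitely many iterations even though no finite estimate exists. For X2 the same expressions hold with 289 in place of 288 and 12 in place of 13.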

That 1E-8 is the default setting of the GCONV= option. If you tighten the convergence criterion only slightly (to GCONV=0.92E-8 or less in this example), it will no longer be met, and you'll get the familiar warning about quasi-complete separation for X2 as well:

```
/* Slightly tighter than the default GCONV=1E-8 */
proc logistic data=test desc;
  model y=x2 / gconv=0.92e-8;
run;
```

With or without that warning, the "telltale signs of quasi-complete separation" (Paul D. Allison: Logistic Regression Using SAS. SAS Institute Inc. 1999, p. 44), namely a large (absolute) estimate, a large standard error and a large p-value, are present anyway and indicate that the affected independent variable may be problematic.
