Why is SAS providing a coefficient estimate when a...

4 weeks ago - last edited 4 weeks ago

I'm running a model similar to the following:

`proc logistic data=table; model Y = X1 X2 X1*X2 X3 X4 X5; run;`

In this model, Y equals 0 or 1, X1 and X2 are indicator variables (equal to 0 or 1), and X3, X4, and X5 are continuous. In this sample, Y = 0 for every observation where X1*X2 = 1, so the coefficient on X1*X2 should not be estimable. However, SAS still reports a point estimate and a statistically significant p-value for X1*X2 __without displaying any error or warning__ in the log, such as a warning about separation of data points. As far as SAS is concerned, "convergence criterion (GCONV=1E-8) satisfied" and all is dandy in the world.

Why? What is going on? Surely SAS shouldn't be behaving this way? When I run the same model on the same sample in Stata, Stata appropriately drops X1*X2 from the estimation.

Any insights on this would be great.
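
For reference, the reason the interaction coefficient should not be estimable can be sketched as follows. The notation is an assumption for illustration and not part of the model statement above: $\beta_{12}$ is the coefficient on X1*X2, $\eta_i$ is the linear predictor for observation $i$ without the interaction term, $S$ is the set of observations with X1*X2 = 1 (all of which have Y = 0), and the event being modeled is taken to be Y = 1. Observations outside $S$ do not involve $\beta_{12}$ at all, so

$$
\ell(\beta) = \sum_{i \notin S} \ell_i(\beta_{-12}) \;-\; \sum_{i \in S} \log\!\left(1 + e^{\eta_i + \beta_{12}}\right),
\qquad
\frac{\partial \ell}{\partial \beta_{12}} = -\sum_{i \in S} \frac{e^{\eta_i + \beta_{12}}}{1 + e^{\eta_i + \beta_{12}}} < 0 .
$$

The log-likelihood therefore keeps increasing as $\beta_{12} \to -\infty$, and no finite maximum likelihood estimate exists for it. If the event is Y = 0 instead, the same argument applies with the sign reversed.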

Posted in reply to BobSmith

4 weeks ago

If X1 and X2 are binary variables, you should not treat them as regression variables. Put a CLASS statement above your MODEL statement, like this:

`class X1 X2;`
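
In context, the full step might look like the sketch below. The dataset name `table` and the variable names come from the original post. The `param=ref ref=first` options and `event='1'` are assumptions added for illustration: PROC LOGISTIC uses effect coding for CLASS variables by default, so reference coding with 0 as the reference level is what keeps the coefficients comparable to the 0/1 indicator version.

```
proc logistic data=table;
   /* Assumption: reference coding with level 0 as the reference */
   class X1 X2 / param=ref ref=first;
   /* Assumption: model the probability that Y = 1 */
   model Y(event='1') = X1 X2 X1*X2 X3 X4 X5;
run;
```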

Posted in reply to BobSmith

4 weeks ago

Looks to me like X2 is an excellent predictor for Y. Collinearity is a problem when it occurs between predictors, in which case it is sometimes better to drop one of the culprits. But one does expect some sort of relationship between the dependent variable and its predictors. Issuing a note when that relationship is a little too perfect might be a good idea though.

PG

Posted in reply to PGStats

4 weeks ago - last edited 4 weeks ago

I edited my original post to clarify the model. However, the original point still stands. You should not be able to obtain a maximum likelihood point estimate for a variable in a logistic model if Y does not vary across the observations where that variable equals 1. For example, see http://support.sas.com/rnd/app/stat/papers/logistic.pdf or https://www.statalist.org/forums/forum/general-stata-discussion/general/1357105-stata-omits-variable... or page 5 of https://www.stata.com/manuals13/rlogit.pdf.

I would expect SAS to at least throw a warning or an error when this happens. It should not be providing a point estimate with p-values and pretending that nothing is wrong. Does anyone know why SAS is behaving this way?
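
One way to look into this is sketched below, assuming the model is exactly as posted. The ITPRINT option on the MODEL statement prints the iteration history, which shows whether the X1*X2 estimate is still growing in magnitude at the point where the convergence criterion is declared satisfied. The `event='1'` specification is an assumption added so the direction of the estimate is unambiguous.

```
proc logistic data=table;
   /* ITPRINT: display parameter estimates at each iteration */
   model Y(event='1') = X1 X2 X1*X2 X3 X4 X5 / itprint;
run;
```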

Posted in reply to BobSmith

4 weeks ago

You haven't provided data, so there is not a lot we can say. Issues like this usually require looking at the data.

I can say that when I try to reproduce your claim by using a simulation, SAS reports the warnings that you are expecting. Try running the code below. Do you see these warnings? If so, maybe your data are not what you believe them to be.

**SAS Log:**

WARNING: There is possibly a quasi-complete separation of data points. The maximum likelihood estimate may not exist.

WARNING: The LOGISTIC procedure continues in spite of the above warning. Results shown are based on the last maximum likelihood iteration. Validity of the model fit is questionable.

**SAS Output:**

| Model Convergence Status |
|---|
| Quasi-complete separation of data points detected. |

Warning: The maximum likelihood estimate may not exist.

```
/* Simulate data in which Y = 1 whenever x1*x2 = 1 (quasi-complete separation) */
data Have;
   call streaminit(1234);              /* fix the seed for reproducibility */
   do i = 1 to 200;
      x1 = rand("Bernoulli", 0.7);
      x2 = rand("Bernoulli", 0.5);
      x3 = rand("Normal", 2, 3);
      x4 = rand("Normal", 0, 1);
      x5 = rand("Normal", -1, 2);
      eta = x1 - x2 + 0.5*x1*x2 + x3 - 2*x4 + 3*x5;   /* linear predictor */
      if x1*x2 = 1 then
         Y = 1;                        /* no variation in Y when x1*x2 = 1 */
      else
         Y = rand("Bernoulli", logistic(eta));
      output;
   end;
run;

proc logistic data=Have;
   class x1 x2;
   model Y(event='1') = X1 X2 X1*X2 X3 X4 X5;   /* quasi-separation */
   *model Y = X1 X2 X3 X4 X5;                   /* model OK */
run;
```
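
A quick way to check whether the real data have the pattern described is a cross-tabulation restricted to the observations the procedure actually uses. This is only a sketch: the dataset name `table` and the variable names are taken from the original post, and the `where` clause assumes all six variables are numeric. PROC LOGISTIC excludes observations with a missing value in the response or any explanatory variable, so the estimation sample can differ from the full dataset.

```
proc freq data=table;
   /* Assumption: all model variables are numeric; keep complete cases only */
   where nmiss(Y, X1, X2, X3, X4, X5) = 0;
   /* One X2-by-Y table for each level of X1, counts only */
   tables X1*X2*Y / nopercent norow nocol;
run;
```

If the X1 = 1 table shows a zero count in the X2 = 1, Y = 1 cell, the estimation sample matches the description in the original post.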

Posted in reply to Rick_SAS

4 weeks ago - last edited 4 weeks ago

I can't provide the data on a public forum. However, I know that a warning message is usually displayed. I've seen complete or quasi-complete separation warnings before. (I get the quasi-complete separation warning when running your code.) In my case, however, no warning is being displayed. I assure you my data is as described. Plus, Stata behaves exactly as expected by dropping the variable so...

Maybe I could privately share the dataset with someone at SAS who can diagnose? This may be a rare edge case. SAS has been known to provide misleading coefficients before without appropriate warning messages (https://pdfs.semanticscholar.org/4f17/1322108dff719da6aa0d354d5f73c9c474de.pdf).

Posted in reply to BobSmith

4 weeks ago

SAS Technical Support is always happy to help.