Why is SAS providing a coefficient estimate when a variable predicts failure perfectly?

BobSmith · Posted 06-26-2018 12:22 AM

I'm running a model similar to the following:

proc logistic data=table;
  model Y = X1 X2 X1*X2 X3 X4 X5;
run;

In this model, Y equals 0 or 1, X1 and X2 are indicator variables (equal to 0 or 1), and X3, X4, and X5 are continuous. In this sample, Y = 0 for all observations where X1*X2 = 1. Thus, the coefficient on X1*X2 should not be estimable. However, SAS still provides a point estimate and a statistically significant p-value for X1*X2 without displaying any error or warning in the log, such as a message about separation of data points. As far as SAS is concerned, "convergence criterion (GCONV=1E-8) satisfied" and all is dandy in the world.
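
One way to confirm the pattern described above is to list the observed combinations of X1, X2, and Y. This is only a minimal sketch, assuming the data set name (table) and variable names from the model statement above:

```sas
/* Sketch: list every observed combination of X1, X2, and Y with its count. */
/* If Y = 0 for all rows where X1 = 1 and X2 = 1, then the combination      */
/* X1=1, X2=1, Y=1 should not appear in the list at all.                    */
/* MISSING treats missing values as a valid level, so they show up as well. */
proc freq data=table;
  tables X1*X2*Y / list missing;
run;
```

If the X1=1, X2=1, Y=1 row does show up, the sample is not separated on the interaction after all.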

Why? What is going on? Surely SAS shouldn't be behaving this way? When running this same model on the same sample in Stata, Stata appropriately drops X1*X2 when estimating this model.

Any insights on this would be great.

PeterClemmensen · Posted 06-26-2018 12:41 AM

If X1 and X2 are binary variables, you should not treat them as continuous regressors. Put a CLASS statement above your MODEL statement, like this:

class X1 X2;
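
Placed in the original call, the suggestion would look roughly like the sketch below. The PARAM=REF and REF=FIRST options and the EVENT='1' response option are assumptions added here for illustration, not part of the suggestion above. They request reference-cell coding with 0 as the baseline (the CLASS default in PROC LOGISTIC is effect coding) and model the probability that Y = 1 (by default the procedure models the lower-ordered response value).

```sas
/* Sketch only: data set and variable names are taken from the original post. */
proc logistic data=table;
  class X1 X2 / param=ref ref=first;          /* assumed: reference coding, 0 = baseline */
  model Y(event='1') = X1 X2 X1*X2 X3 X4 X5;  /* assumed: model Pr(Y = 1)                */
run;
```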

PGStats · Posted 06-26-2018 01:02 AM

Looks to me like X2 is an excellent predictor for Y. Collinearity is a problem when it occurs between predictors, in which case it is sometimes better to drop one of the culprits. But one does expect some sort of relationship between the dependent variable and its predictors. Issuing a note when that relationship is a little too perfect might be a good idea though.

PG

BobSmith · Posted 06-26-2018 01:13 AM

Edited my original post to clarify the model. However, the original point still stands. You should not be able to obtain a point estimate for a variable in a logistic model via maximum likelihood if Y shows no variation among the observations where that variable equals 1. For example, see http://support.sas.com/rnd/app/stat/papers/logistic.pdf or https://www.statalist.org/forums/forum/general-stata-discussion/general/1357105-stata-omits-variable... or page 5 of https://www.stata.com/manuals13/rlogit.pdf.
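
To spell out why no finite estimate should exist, here is a short sketch of the argument. The notation is my own and not taken from the linked papers: \beta_{12} is the coefficient on X1*X2, the model is for Pr(Y = 1), and a_i collects every other term of the linear predictor for observation i.

```latex
% Observations with x_{1i} x_{2i} = 0 do not involve \beta_{12} at all.
% Observations with x_{1i} x_{2i} = 1 all have y_i = 0, so each contributes
% \log(1 - p_i) = -\log(1 + e^{a_i + \beta_{12}}) to the log-likelihood.
\[
\ell(\beta)
  = \sum_{i:\,x_{1i}x_{2i}=0} \ell_i(\beta_{-12})
  \;-\; \sum_{i:\,x_{1i}x_{2i}=1} \log\!\bigl(1 + e^{a_i + \beta_{12}}\bigr),
\qquad
a_i = \beta_0 + \beta_1 + \beta_2 + \beta_3 x_{3i} + \beta_4 x_{4i} + \beta_5 x_{5i}.
\]
% The derivative with respect to \beta_{12} is strictly negative everywhere:
\[
\frac{\partial \ell}{\partial \beta_{12}}
  = -\sum_{i:\,x_{1i}x_{2i}=1} \frac{e^{a_i + \beta_{12}}}{1 + e^{a_i + \beta_{12}}} \;<\; 0 ,
\]
% so \ell keeps increasing as \beta_{12} \to -\infty and has no finite maximizer.
```

Any finite value reported for that coefficient would therefore just be where the iterations stopped, not a true maximum.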

I would expect SAS to at least throw a warning or an error when this happens. It should not be providing a point estimate with p values and pretending like nothing is wrong. Does anyone know why SAS is behaving this way?

Rick_SAS · Posted 06-26-2018 08:30 AM

You haven't provided data, so there is not a lot we can say. Issues like this usually require looking at the data.

I can say that when I try to reproduce your situation by using a simulation, SAS issues the warnings that you are expecting. Try running the code below. Do you see these warnings? If so, maybe your data are not what you believe them to be.

SAS Log:

WARNING: There is possibly a quasi-complete separation of data points.
The maximum likelihood estimate may not exist.
WARNING: The LOGISTIC procedure continues in spite of the above warning.
Results shown are based on the last maximum likelihood
iteration. Validity of the model fit is questionable.

SAS Output:

Model Convergence Status
Quasi-complete separation of data points detected.

Warning: The maximum likelihood estimate may not exist.

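data Have;
call streaminit(1234);                 /* fix the seed so the simulation is reproducible */
do i = 1 to 200;
   x1 = rand("Bernoulli", 0.7);        /* binary indicators */
   x2 = rand("Bernoulli", 0.5);
   x3 = rand("Normal", 2, 3);          /* continuous covariates */
   x4 = rand("Normal", 0, 1);
   x5 = rand("Normal", -1, 2);
   eta = x1 - x2 + 0.5*x1*x2 + x3 - 2*x4 + 3*x5;   /* linear predictor */
   if x1*x2=1 then
      Y = 1;                           /* Y is constant whenever x1*x2 = 1 */
   else
      Y = rand("Bernoulli", logistic(eta));
   output;
end;
run;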

proc logistic data=Have;
 class x1 x2;
 model Y(event='1') = X1 X2 X1*X2 X3 X4 X5;  /* quasi-separation */
 *model Y = X1 X2 X3 X4 X5;  /* model OK */
run;

BobSmith · Posted 06-26-2018 01:50 PM

I can't provide the data on a public forum. However, I know that a warning message is usually displayed. I've seen complete or quasi-complete separation warning messages before. (I get the quasi-separation of data points warning when running your code.) In my case, however, no warning is being displayed. I assure you my data is as described. Plus, Stata behaves exactly as expected by dropping the variable so...

Maybe I could privately share the dataset with someone at SAS who can diagnose? This may be a rare edge case. SAS has been known to provide misleading coefficients before without appropriate warning messages (https://pdfs.semanticscholar.org/4f17/1322108dff719da6aa0d354d5f73c9c474de.pdf).
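
In the meantime, one diagnostic that might help narrow this down is the iteration history. The sketch below assumes the data set and variable names from the original post and adds the ITPRINT option on the MODEL statement. EVENT='1' is also an assumption, since the original call did not specify which response level is modeled.

```sas
/* Sketch: print the parameter estimates at each iteration.               */
/* With separation on X1*X2, its estimate would be expected to keep       */
/* growing in magnitude from one iteration to the next rather than        */
/* settling down, even if the convergence criterion is eventually met.    */
proc logistic data=table;
  model Y(event='1') = X1 X2 X1*X2 X3 X4 X5 / itprint;
run;
```

The "Number of Observations Read" and "Number of Observations Used" lines in the output are also worth a look, since they show how many rows actually entered the fit.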

Rick_SAS · Posted 06-26-2018 03:21 PM

SAS Technical Support is always happy to help.
