BobSmith

I'm running a model similar to the following:

 

proc logistic data=table;
  model Y = X1 X2 X1*X2 X3 X4 X5;
run; 

In this model, Y equals 0 or 1, X1 and X2 are indicator variables (equal to 0 or 1), and X3, X4, and X5 are continuous. In this sample, Y = 0 for all observations where X1*X2 = 1, so the coefficient of X1*X2 should not be estimable. However, SAS still reports a point estimate and a statistically significant p-value for X1*X2, and the log shows no error or warning (for example, no warning about separation of data points). As far as SAS is concerned, "convergence criterion (GCONV=1E-8) satisfied" and all is dandy in the world.
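One way to double-check the claim that Y is constant whenever X1*X2 = 1 is a three-way frequency listing. This is a sketch that assumes the data set and variable names from the model above (table, Y, X1 to X5). The WHERE statement keeps only complete cases, because PROC LOGISTIC drops any observation with a missing value in a model variable, so the listing reflects the observations the model actually uses.

```sas
/* Sketch: list every observed combination of X1, X2, and Y,       */
/* restricted to the complete cases that PROC LOGISTIC would use.  */
proc freq data=table;
  where nmiss(Y, X1, X2, X3, X4, X5) = 0;  /* complete cases only */
  tables X1*X2*Y / list;                   /* one row per combination */
run;
```

If the listing has a row for X1=1, X2=1, Y=0 but none for X1=1, X2=1, Y=1, then Y is indeed constant in that cell for the data the model sees.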

 

Why? What is going on? Surely SAS shouldn't be behaving this way? When I run the same model on the same sample in Stata, Stata appropriately drops X1*X2.

 

Any insights on this would be great.

PeterClemmensen

If X1 and X2 are binary variables, you should not treat them as continuous regression variables. Put a CLASS statement above your MODEL statement, like this:

 

class X1 X2;
PGStats

Looks to me like X2 is an excellent predictor for Y. Collinearity is a problem when it occurs between predictors, in which case it is sometimes better to drop one of the culprits. But one does expect some sort of relationship between the dependent variable and its predictors. Issuing a note when that relationship is a little too perfect might be a good idea, though.

PG
BobSmith

I've edited my original post to clarify the model. However, my original point still stands: you should not be able to obtain a maximum likelihood point estimate for a variable in a logistic model if Y does not vary among the observations where that variable equals 1. For example, see http://support.sas.com/rnd/app/stat/papers/logistic.pdf or https://www.statalist.org/forums/forum/general-stata-discussion/general/1357105-stata-omits-variable... or page 5 of https://www.stata.com/manuals13/rlogit.pdf.
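A short derivation shows why no finite estimate exists. It assumes X1 and X2 enter as numeric 0/1 variables (no CLASS statement) and that the model is written for Pr(Y = 1); by default PROC LOGISTIC models the lower ordered value (Y = 0), in which case the sign of the interaction coefficient flips and it diverges toward plus infinity instead. Let S be the set of observations with X1*X2 = 1, all of which have y = 0:

```latex
\eta_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_{12}\, x_{1i} x_{2i}
       + \beta_3 x_{3i} + \beta_4 x_{4i} + \beta_5 x_{5i},
\qquad p_i = \Pr(Y_i = 1) = \frac{1}{1 + e^{-\eta_i}}

\ell(\beta) = \sum_i \left[ y_i \eta_i - \log\!\left(1 + e^{\eta_i}\right) \right]

\frac{\partial \ell}{\partial \beta_{12}}
  = \sum_i (y_i - p_i)\, x_{1i} x_{2i}
  = -\sum_{i \in S} p_i \; < \; 0,
\qquad S = \{\, i : x_{1i} x_{2i} = 1 \,\}
```

Because every p_i lies strictly between 0 and 1, this derivative is negative at every finite parameter value, so the log-likelihood keeps increasing as the interaction coefficient goes to minus infinity and the score equation has no finite solution. As the coefficient grows large and negative, the gradient shrinks toward zero, so a gradient-based stopping rule such as GCONV can be met even though the reported estimate is just the point where the iterations stopped.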

 

I would expect SAS to at least issue a warning or an error when this happens. It should not report a point estimate and p-values as if nothing were wrong. Does anyone know why SAS is behaving this way?

Rick_SAS

You haven't provided data, so there is not a lot we can say. Issues like this usually require looking at the data.

 

I can say that when I try to reproduce your claim with a simulation, SAS reports the warnings you are expecting. Try running the code below. Do you see these warnings? If so, maybe your data are not what you believe them to be.

 

SAS Log:

WARNING: There is possibly a quasi-complete separation of data points.
The maximum likelihood estimate may not exist.
WARNING: The LOGISTIC procedure continues in spite of the above warning.
Results shown are based on the last maximum likelihood
iteration. Validity of the model fit is questionable.

 

SAS Output:

Model Convergence Status
Quasi-complete separation of data points detected.

Warning: The maximum likelihood estimate may not exist.
 

data Have;
call streaminit(1234);
do i = 1 to 200;
   x1 = rand("Bernoulli", 0.7);
   x2 = rand("Bernoulli", 0.5);
   x3 = rand("Normal", 2, 3);
   x4 = rand("Normal", 0, 1);
   x5 = rand("Normal", -1, 2);
   eta = x1 - x2 + 0.5*x1*x2 + x3 - 2*x4 + 3*x5;
   if x1*x2=1 then 
      Y = 1;
   else
      Y = rand("Bernoulli", logistic(eta));
   output;
end;
run;

proc logistic data=Have;
 class x1 x2;
 model Y(event='1') = X1 X2 X1*X2 X3 X4 X5;  /* quasi-separation */
 *model Y = X1 X2 X3 X4 X5;  /* model OK */
run;
BobSmith

I can't share the data on a public forum. However, I know that a warning message is usually displayed; I've seen complete and quasi-complete separation warnings before. (I do get the quasi-complete separation warning when running your code.) In my case, however, no warning is displayed. I assure you my data are as described. Plus, Stata behaves exactly as expected by dropping the variable, so...

 

Could I share the dataset privately with someone at SAS who can diagnose the problem? This may be a rare edge case. SAS has been known to report misleading coefficients without appropriate warning messages before (https://pdfs.semanticscholar.org/4f17/1322108dff719da6aa0d354d5f73c9c474de.pdf).
