Marcusliat

Hi there,

I ran a logistic regression with a binary outcome (0 and 1) and obtained the confusion matrix. However, no observation is predicted as 1: every observation has a predicted value of 0. Looking at the predicted probabilities, P(Y=1) is smaller than P(Y=0) for every observation. Does anyone know why this happens and how to fix it?

Thanks.

Rick_SAS
I don't know what you mean by "fixing the problem." You have data and you specified a model. According to the specified model, P(Y=1) < 0.5 for all observations.
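
For reference, here is the arithmetic behind that statement. It assumes the confusion matrix classifies each observation by comparing P(Y=1) with P(Y=0), which is the usual 0.5 cutoff (the thread does not say which cutoff was used). The notation below is introduced for illustration: $\hat{\beta}_0$ is the intercept and $x_{i1}, \dots, x_{ik}$ are the predictor values for observation $i$.

$$
\hat{p}_i = \frac{1}{1 + \exp(-\hat{\eta}_i)}, \qquad \hat{\eta}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \cdots + \hat{\beta}_k x_{ik}
$$

$$
\hat{p}_i > 1 - \hat{p}_i \iff \hat{p}_i > \tfrac{1}{2} \iff \exp(-\hat{\eta}_i) < 1 \iff \hat{\eta}_i > 0
$$

So if the fitted linear predictor $\hat{\eta}_i$ is negative for every observation, every observation is classified as 0, regardless of how many Y=1 cases the data contain.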

You can try changing the model (easy) or gathering more data (harder), especially for cases where Y=1.

Are there any warnings in the SAS log? If you are getting warnings about "quasi-complete separation," you might want to read the paper "Convergence Failures in Logistic Regression" by Paul Allison (2008): http://www2.sas.com/proceedings/forum2008/360-2008.pdf
Marcusliat

Thank you for taking the question.

You are right, it can't be fixed. The data contain 3 million observations with 70,000 missing values (about 2%), which SAS ignores as usual.

My question is why this happens even though the data definitely contain observations with Y=1. Does it have to do with the predictors?

Thanks again

Marcusliat

I forgot to mention that there was no convergence problem. The log did not display any warnings.

Rick_SAS

If I had to guess, I would say that the predictors have a very small effect relative to the constant (intercept) term in the model. Study the following simulated data. The explanatory variable makes a relatively small contribution to the linear predictor. Even though the x variable is significant (small p-value), it just doesn't have much of an effect. The predicted probabilities are all less than 0.5.

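/* Simulate 1000 observations from a logistic model in which the
   intercept (-1) dominates the small slope (0.15) */
data a;
call streaminit(1234);                      /* seed for reproducibility */
do i = 1 to 1000;
   x = rand("normal");                      /* x ~ N(0,1) */
   eta = -1 + 0.15*x;                       /* linear predictor */
   y = rand("bernoulli", logistic(eta));    /* P(Y=1) = 1/(1+exp(-eta)) */
   output;
end;
run;

/* Fit the model and plot predicted probabilities versus x */
proc logistic data=a plots(only)=effect;
model y(event='1') = x;
run;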
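
To see why the predicted probabilities stay below 0.5 here, plug the generating coefficients (-1 and 0.15) into the classification rule. This is a sketch using the true simulation values; the coefficients that PROC LOGISTIC estimates from the sample will differ somewhat.

$$
p(x) = \frac{1}{1 + \exp\bigl(-(-1 + 0.15x)\bigr)} \ge \tfrac{1}{2} \iff -1 + 0.15x \ge 0 \iff x \ge \frac{1}{0.15} \approx 6.67
$$

Because x is drawn from a standard normal distribution, a value of 6.67 or more essentially never occurs in 1000 draws. For x between -3 and 3, which covers almost all of the draws, the linear predictor lies between -1.45 and -0.55, so p(x) stays roughly between 0.19 and 0.37.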
