logistic regression: confusion matrix

Contributor
Posts: 20

Hi there,

I ran a logistic regression with a binary outcome (0 and 1) and obtained the confusion matrix. However, no observation is predicted as 1: every observation has a predicted value of 0. Looking at the predicted probabilities, P(Y=1) is smaller than P(Y=0) for every observation. Does anyone know the reason and how to fix this problem?

Thanks.

SAS Super FREQ
Posts: 3,306

Re: logistic regression: confusion matrix

I don't know what you mean by "fixing the problem." You have data and you specified a model. According to the specified model, P(Y=1) < 0.5 for all observations.
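
To see why every observation lands in the predicted-0 column, here is a minimal Python sketch of how a confusion matrix is built from predicted probabilities. It assumes the usual rule of predicting 1 when P(Y=1) >= 0.5 (the cutoff is a parameter), and the function name `confusion_matrix` is hypothetical, not a SAS feature. If every P(Y=1) is below the cutoff, the predicted-1 column is empty no matter how many actual 1s are in the data.

```python
def confusion_matrix(y_true, p_hat, cutoff=0.5):
    """Count (actual, predicted) pairs, predicting 1 when P(Y=1) >= cutoff."""
    counts = {(actual, pred): 0 for actual in (0, 1) for pred in (0, 1)}
    for y, p in zip(y_true, p_hat):
        pred = 1 if p >= cutoff else 0  # classification rule
        counts[(y, pred)] += 1
    return counts


if __name__ == "__main__":
    # Two actual 1s, but every P(Y=1) is below 0.5, so both land in the predicted-0 column.
    y = [0, 1, 0, 1]
    p = [0.10, 0.40, 0.20, 0.45]
    print(confusion_matrix(y, p))
    # {(0, 0): 2, (0, 1): 0, (1, 0): 2, (1, 1): 0}
```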

You can try changing the model (easy) or gathering more data (harder), especially for cases where Y=1.

Are there any warnings in the SAS log? If you are getting warnings about "quasi-complete separation," you might want to read the paper "Convergence Failures in Logistic Regression" by Paul Allison (2008): http://www2.sas.com/proceedings/forum2008/360-2008.pdf

Contributor
Posts: 20

Re: logistic regression: confusion matrix

Thank you for taking the question.

You are right; it can't be fixed. The data contain 3 million observations, about 70,000 of which (about 2%) have missing values that SAS ignores as usual.

My question is why this would happen even though the data definitely contain observations with Y=1. Does it have to do with the predictors?

Thanks again

Contributor
Posts: 20

Re: logistic regression: confusion matrix

I forgot to mention that there was no convergence problem. The log did not display any warnings.

SAS Super FREQ
Posts: 3,306

Re: logistic regression: confusion matrix

If I were to guess, I would say that the predictors have a very small effect relative to the constant (intercept) term in the model. Study the following simulated data. The explanatory variable makes a relatively small contribution to the linear predictor. Even though x is significant (small p-value), it just doesn't have much of an effect. The predicted probabilities are all less than 0.5.

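/* Simulate 1000 observations in which x has a small effect relative to the intercept */
data a;
call streaminit(1234);
do i = 1 to 1000;
   x = rand("normal");
   eta = -1 + 0.15*x;                     /* linear predictor: the intercept dominates */
   y = rand("bernoulli", logistic(eta));  /* P(Y=1) = logistic(eta) */
   output;
end;
run;

/* Fit the model; OUTPUT saves the predicted P(Y=1) for each observation */
proc logistic data=a plots(only)=effect;
   model y(event='1') = x;
   output out=pred p=phat;
run;

/* Check the range of phat: if max(phat) < 0.5, every observation is classified as 0 at a 0.5 cutoff */
proc means data=pred min max;
   var phat;
run;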
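
For readers without SAS, the following is a rough Python re-creation of the simulation above. It is a sketch under stated assumptions, not a description of what PROC LOGISTIC does internally: Python's `random` module stands in for SAS's RAND function (so the numbers will not match SAS output), the sample size is raised to 5,000 (a hypothetical choice that makes the outcome less sensitive to sampling noise), the model is fit with a hand-written Newton-Raphson loop (`fit_logistic` and `run_simulation` are illustrative names), and observations are classified with a 0.5 cutoff. Under the true model, P(Y=1) >= 0.5 requires -1 + 0.15x >= 0, that is, x >= 1/0.15 ≈ 6.67, which a standard normal variate essentially never reaches. Because the fitted coefficients are close to the true ones, the predicted-1 column stays empty even though many observations have Y=1.

```python
import math
import random


def logistic(eta):
    """Inverse logit: maps the linear predictor to P(Y=1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def fit_logistic(xs, ys, max_iter=50, tol=1e-10):
    """Maximum-likelihood fit of P(Y=1) = logistic(b0 + b1*x) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = logistic(b0 + b1 * x)
            w = p * (1.0 - p)      # observation weight in X'WX
            g0 += y - p            # gradient X'(y - p)
            g1 += (y - p) * x
            h00 += w               # information matrix X'WX
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det  # Newton step (X'WX)^{-1} X'(y - p)
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def run_simulation(n=5000, seed=1234, cutoff=0.5):
    """Simulate data, fit the model, and tabulate (actual, predicted) counts."""
    random.seed(seed)
    xs, ys = [], []
    for _ in range(n):
        x = random.gauss(0.0, 1.0)
        eta = -1.0 + 0.15 * x  # same linear predictor as the SAS DATA step
        xs.append(x)
        ys.append(1 if random.random() < logistic(eta) else 0)  # Bernoulli draw
    b0, b1 = fit_logistic(xs, ys)
    phat = [logistic(b0 + b1 * x) for x in xs]
    cm = {(a, p): 0 for a in (0, 1) for p in (0, 1)}
    for y, p in zip(ys, phat):
        cm[(y, 1 if p >= cutoff else 0)] += 1
    return b0, b1, phat, cm


if __name__ == "__main__":
    b0, b1, phat, cm = run_simulation()
    print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, max phat = {max(phat):.3f}")
    print("(actual, predicted) counts:", cm)
```

Running it prints the fitted coefficients, the largest predicted probability, and the confusion-matrix counts; the actual 1s all appear under predicted 0.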