Hi,
I ran a logistic regression on my dataset and got the following error:
ERROR: All observations have the same response. No statistics are computed.
Could this be because I am modeling rare events (about 175 events out of 200,000+ observations)?
Thanks for any advice
Yes, but more specifically I'd guess that there is some classification variable for which all the Y=1 belong to a single category. For example, if you are using CLASS GENDER, maybe all of the Y=1 values are male.
Try using PROC FREQ to cross-tabulate Y with your classification variables. You might see an empty cell. That would indicate that you cannot use that classification variable as an explanatory variable.
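A minimal sketch of that cross-tabulation, assuming the data set is called MYDATA and using the hypothetical GENDER and Y names from the example above (substitute your own data set and variable names):

```sas
/* MYDATA, GENDER, and Y are placeholder names, not from the poster's data. */
/* The table has one row per GENDER level and one column per Y value;       */
/* a zero count in the Y=1 column marks an empty cell.                      */
proc freq data=mydata;
  tables gender*y / norow nocol nopercent;  /* show raw counts only */
run;
```

The NOROW, NOCOL, and NOPERCENT options only suppress the percentages so the cell counts are easier to scan.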
Hi Rick,
Thanks for the answer. I did not use any classification variables; I just ran PROC LOGISTIC on my whole dataset. So there should not be a situation where the observations are all Y=1 or all Y=0. That is also why I was confused by the error.
Do you have a WHERE clause or other DATA= option that is filtering data? Or a format that coalesces values of an explanatory variable?
What does your MODEL statement look like?
Hi Rick,
My model is really simple. My code is:
proc sort data=fun;
by id year quarter;
run;
proc logistic data=fun descending;
model flag= var1 var2 var3 var4 var5 var6 var7 var8 var9 var10 var11 var12;
output out=propensity_scores pred=prob_flag;
run;
I'm not sure where any filtering or coalescing may have occurred.
Hmm, interesting. And what do you get from running the following?
proc freq data=fun; tables flag; run;
Hi Rick,
I get the following output:
                                 Cumulative    Cumulative
Flag    Frequency     Percent     Frequency      Percent
---------------------------------------------------------
   0       252784       99.93        252784        99.93
   1          174        0.07        252958       100.00
My guess is that there is a linear combination of your continuous variables that perfectly explains your response. For example, if you are modeling "had a heart attack," it might be that "blood pressure," "cholesterol," and "stress level" do not individually predict the response, but when you include all of those variables in the model you discover that all of the heart attacks in the data can be predicted by a linear combination of those factors.
This message should only be printed if FLAG takes a single value. Do your predictors have any missing values? If so, those observations are dropped from the model fitting, and if all observations with FLAG=1 are removed the message is displayed.
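One way to check this, sketched with the data set and variable names from the code posted earlier in the thread (FUN, FLAG, VAR1-VAR12). The work data set _CHECK and the indicator COMPLETE are names made up for this illustration:

```sas
/* Count missing values per predictor. */
proc means data=fun n nmiss;
  var var1-var12;
run;

/* Mark each observation as complete (1) or not (0).                */
/* CMISS returns the number of missing values among its arguments.  */
data _check;                      /* _CHECK is a hypothetical name  */
  set fun;
  complete = (cmiss(of var1-var12) = 0);
run;

/* If the FLAG=1 row has a zero count under COMPLETE=1, no events   */
/* remain in the data that the model actually uses.                 */
proc freq data=_check;
  tables flag*complete / norow nocol nopercent;
run;
```

If the cross-tabulation shows no complete observations with FLAG=1, dropping or imputing the predictors with the most missing values should bring the events back into the analysis.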
Hi Bobderr,
Thanks for your advice. It seems that this is indeed the reason. When I removed some of the independent variables, the regression worked!
