Hello,
I need help with choosing a goodness of fit test for binary logistic regression.
I have one independent variable, which is also categorical (binary) in nature. I got a not-so-fascinating c statistic of 0.54 and an ROC curve that rises only slightly above the diagonal.
I used the code below and obtained the result below:
proc logistic data=done.input_survey_v3 plots(maxpoints=none)=effect;
model Q6(event='1')=Q7 / aggregate scale=none;
run;
What does the AGGREGATE option do? When must it be used?
Is the Hosmer and Lemeshow GOF test meant for binary response data? I tried that as well and got the same result: blanks.
Thank you in advance for any help.
Regards,
MS
A lot is going to depend on the data you are fitting. Can you share that dataset?
SteveDenham
Hello @SteveDenham,
Attached is a sample of my data. Q6 is my dependent var. and Q7 my independent.
Thank you for taking the time to look into my data.
Regards,
MS
Since both variables are binary, you really have just a 2x2 table that defines two proportions. There is no need to fit a model for such data. You can simply compare those two proportions using the CHISQ option in PROC FREQ.
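As a rough sketch of that comparison, assuming the same data set and variable names as in the question (done.input_survey_v3, Q6, Q7). The RELRISK option is an optional addition here, not something the original post used; it adds the odds ratio for the 2x2 table.

```sas
/* Cross-tabulate the predictor (rows) against the response (columns). */
/* CHISQ requests the chi-square tests comparing the two proportions.  */
/* RELRISK (optional, added for illustration) reports the odds ratio.  */
proc freq data=done.input_survey_v3;
  tables Q7*Q6 / chisq relrisk;
run;
```

With Q7 as the row variable, the row percentages in the output are the two proportions of Q6 being compared.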
Concerning PROC LOGISTIC, the goodness of fit statistics have zero degrees of freedom because, with just a single binary predictor, the model is saturated and there are no degrees of freedom remaining. The AGGREGATE option is used in more complex models if the data is collected in subpopulations defined more precisely than by the covariates in the model as further described in this note.
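To make the degrees of freedom concrete: for a binary response, the Pearson and Deviance tests have degrees of freedom equal to the number of subpopulations minus the number of model parameters. With Q7 alone there are 2 subpopulations and 2 parameters (intercept and slope), so 2 - 2 = 0. The sketch below shows the AGGREGATE=(variable-list) form under the assumption that the data set also contains a second binary variable; Q8 is a hypothetical name used only for illustration and is not in the original data.

```sas
/* Q8 is a HYPOTHETICAL second binary variable, used only to illustrate. */
/* AGGREGATE=(Q7 Q8) defines the subpopulations by the Q7 x Q8 cells     */
/* rather than by the model covariate Q7 alone. If all four cells occur, */
/* the Pearson and Deviance tests have 4 - 2 = 2 degrees of freedom.     */
proc logistic data=done.input_survey_v3;
  model Q6(event='1') = Q7 / aggregate=(Q7 Q8) scale=none;
run;
```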
Hello @StatDave,
Thank you for the explanation. It just slipped my mind that I could run a simple test of association.
Having said that, I have seen several courses show fitting a model to such data (2 binary variables). Wouldn't the R-square always be very small in such cases? Obviously, the R-square value can't be improved, as there are no other independent variables to add.
I do look forward to your opinion.
Regards,
Mari
Unfortunately, R-square is not uniquely defined in generalized linear models such as logistic models. Many R-square statistics have been devised, but none of them really has the clear "variance explained" interpretation that R-square has in ordinary least-squares regression, as fit by PROC REG and PROC GLM. Several R-square statistics are available with the GOF option in PROC LOGISTIC. Given this, and since with only one or two binary predictors there are only two or four populations, R-square is probably not a very useful statistic for assessing goodness of fit. If there is sufficient replication within the populations, then the Pearson and Deviance statistics are reasonable fit tests. In the case of two binary predictors, the model is not saturated if you don't include their interaction, so these tests and the additional tests provided by the GOF option can be useful.
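A minimal sketch of the two-predictor case, again assuming a second binary variable exists in the data; Q8 is a hypothetical name, not a variable from the original post. The GOF option is available only in more recent SAS/STAT releases, so whether this runs as written depends on the installed version.

```sas
/* Q8 is a HYPOTHETICAL second binary predictor, used only to illustrate. */
/* Main effects only (no Q7*Q8 interaction): 4 populations, 3 parameters, */
/* so the Pearson and Deviance tests have 1 degree of freedom.            */
/* AGGREGATE SCALE=NONE requests the Pearson and Deviance tests;          */
/* GOF requests the additional fit tests and R-square statistics.         */
proc logistic data=done.input_survey_v3;
  model Q6(event='1') = Q7 Q8 / aggregate scale=none gof;
run;
```

Adding the Q7*Q8 interaction would use the remaining degree of freedom and make the model saturated again.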
Thank you, @StatDave. I really appreciate it.