🔒 This topic is solved and locked.
mszommer
Obsidian | Level 7

Hello,

I need help choosing a goodness-of-fit test for binary logistic regression.

I have one independent variable, which, like the response, is binary. I got an unimpressive c statistic of 0.54 and an ROC curve that rises only slightly above the diagonal.

I used the code below and obtained the following result:

[Screenshot of the output: sas_image.png]

proc logistic data=done.input_survey_v3 plots(maxpoints=none)=effect;
   model Q6(event='1')=Q7 / aggregate scale=none;
run;

 

What does the AGGREGATE option do? When should it be used?

Is the Hosmer-Lemeshow goodness-of-fit test meant for binary response data? I tried it too and got the same result: blank output.

Thanks in advance for any help.

Regards,

MS

6 replies
SteveDenham
Jade | Level 19

A lot will depend on the data you are fitting. Can you share the dataset?

SteveDenham

mszommer
Obsidian | Level 7

Hello @SteveDenham,

Attached is a sample of my data. Q6 is my dependent variable and Q7 my independent variable.

Thank you for taking the time to look into my data.

Regards,

MS

StatDave
SAS Super FREQ

Since both variables are binary, you really have just a 2x2 table that defines two proportions. There is no need to fit a model for such data. You can simply compare those two proportions using the CHISQ option in PROC FREQ.
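A minimal sketch of that PROC FREQ comparison, reusing the data set and variable names from the original post. The RELRISK option is an addition not mentioned in the thread; it prints the odds ratio for the 2x2 table, which, depending on how the rows and columns are ordered, matches or is the reciprocal of the odds ratio PROC LOGISTIC reports for Q7.

```sas
/* 2x2 table: predictor Q7 in the rows, response Q6 in the columns.   */
/* CHISQ requests the Pearson and likelihood-ratio chi-square tests;  */
/* for a 2x2 table it also reports Fisher's exact test.               */
/* RELRISK (an addition, not from the thread) adds the odds ratio and */
/* relative risks, with a confidence interval for each.               */
proc freq data=done.input_survey_v3;
   tables Q7*Q6 / chisq relrisk;
run;
```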

 

Concerning PROC LOGISTIC, the goodness-of-fit statistics have zero degrees of freedom because, with just a single binary predictor, the model is saturated and there are no degrees of freedom remaining. The AGGREGATE option is used in more complex models when the data are collected in subpopulations defined more precisely than by the covariates in the model, as further described in this note.
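A short worked check of that statement, in notation introduced here for illustration: x ∈ {0, 1} is the value of Q7, p_x is the modeled probability that Q6 = 1, and π̂_x is the observed proportion of Q6 = 1 among observations with Q7 = x. Two parameters for two populations means the maximum likelihood fit reproduces both observed proportions exactly (assuming neither group is all 0s or all 1s), so the fit statistics are zero with no degrees of freedom left:

```latex
\begin{aligned}
\operatorname{logit}(p_x) &= \beta_0 + \beta_1 x, \qquad x \in \{0, 1\} \\
\hat{p}_x &= \hat{\pi}_x
  \;\Rightarrow\;
  \hat{\beta}_0 = \operatorname{logit}(\hat{\pi}_0), \quad
  \hat{\beta}_1 = \operatorname{logit}(\hat{\pi}_1) - \operatorname{logit}(\hat{\pi}_0) \\
\text{Pearson } \chi^2 &= \text{Deviance} = 0 \\
\text{df} &= (\text{populations}) - (\text{parameters}) = 2 - 2 = 0
\end{aligned}
```

The Hosmer-Lemeshow test (the LACKFIT option) runs into the same limit: it groups observations by predicted probability and has g − 2 degrees of freedom for g groups, and a single binary predictor yields only two distinct predicted probabilities. Assuming tied predicted probabilities land in the same group, that leaves at most two groups and no degrees of freedom, which is consistent with the blank output in the original question.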

mszommer
Obsidian | Level 7

Hello @StatDave,

Thank you for the explanation. It just slipped my mind that I could run a simple test of association.

That said, I have seen several courses fit a model to such data (two binary variables). Wouldn't the R-square always be too small in such cases? Obviously, the R-square can't be improved, since there are no other independent variables to add.

 

I do look forward to your opinion.

 

Regards,

Mari

StatDave
SAS Super FREQ

Unfortunately, R-square is not uniquely defined in generalized linear models like logistic models. Many R-square statistics have been devised, but they don't really have the clear "variance explained" interpretation of R-square in ordinary least-squares regression as produced by PROC REG and PROC GLM. Several R-square statistics are available with the GOF option in PROC LOGISTIC. Because of this, and because with only one or two binary predictors there are only two or four populations, R-square is probably not a very useful statistic for assessing goodness of fit. If there is sufficient replication within the populations, then the Pearson and deviance statistics are reasonable fit tests. In the case of two binary predictors, the model is not saturated if you don't include their interaction, so these tests and the additional tests provided by the GOF option can be useful.
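A minimal sketch of the two-predictor case described above, assuming a second binary predictor named Q8 (hypothetical; it is not in the posted data) and a SAS/STAT release recent enough to support the GOF option in the MODEL statement. With main effects only, four populations and three parameters leave one degree of freedom for the Pearson and deviance tests.

```sas
/* Q8 is a HYPOTHETICAL second binary predictor, not in the original data. */
/* Main effects only: 4 populations - 3 parameters (intercept, Q7, Q8)     */
/* = 1 df, so the Pearson and deviance tests requested by AGGREGATE and    */
/* SCALE=NONE are no longer empty. AGGREGATE with no variable list forms   */
/* the subpopulations from the MODEL statement variables (Q7 and Q8).      */
/* GOF adds further fit tests and several R-square statistics; it exists   */
/* only in newer SAS/STAT releases (older ones offer RSQUARE instead).     */
proc logistic data=done.input_survey_v3;
   model Q6(event='1') = Q7 Q8 / aggregate scale=none gof;
run;
```

Adding the Q7*Q8 interaction to the MODEL statement would make the model saturated again (four parameters for four populations), and the Pearson and deviance tests would drop back to zero degrees of freedom.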

mszommer
Obsidian | Level 7

Thank you, @StatDave. I really appreciate it.
