MikeTurner
Calcite | Level 5

I ran a logistic regression on my data. The Hosmer and Lemeshow Goodness-of-Fit Test, the Global Null Hypothesis tests, and the Analysis of Parameter Estimates are all significant, but the value of R-square is only 0.176. What does that mean? Is my regression model valid?

Thanks.

1 ACCEPTED SOLUTION

plf515
Lapis Lazuli | Level 10

The H-L goodness-of-fit test tests something different from the overall model fit test. You want the H-L test to be non-significant; more precisely, you want the test statistic to be small. A large H-L statistic (and hence a small p-value) indicates a problem with your model's fit. SAS prints a table with details.

The overall model test says whether your null hypothesis can be rejected. But be careful: statistical significance does NOT mean what many think it means. It is NOT the probability that the parameters are 0; it is the probability of getting results as extreme as, or more extreme than, the ones you got in a sample of your size drawn from a population where the parameter is 0. This is rarely a useful question.

Whether a pseudo R2 of .18 is "large" depends on the field. In social sciences, it is pretty darn good. In physics, it would be lousy.

All of which illustrates the point that it is hard to answer a question like this sensibly without context.


6 REPLIES
DrAnnmaria
Fluorite | Level 6

The null hypothesis you are testing is that the population parameter = 0. That is all statistical significance means: if the population value is 0, you would expect to get results at least this extreme less than 5% of the time. Significance is a function of both your sample size and variance. In brief, if you have a large number of people or a small population variance, your obtained value can be very close to zero and still be statistically significant.

So, significant does not mean large, it just means probably not zero.
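To make that concrete, here is a rough sketch with made-up numbers (they are not from the original poster's output). The Wald chi-square that SAS reports for each parameter is the squared ratio of the estimate to its standard error, and the standard error shrinks roughly like 1/sqrt(n), so the same small estimate can move from clearly non-significant to significant purely because the sample grows:

```latex
\begin{align*}
z &= \frac{\hat\beta}{\mathrm{SE}(\hat\beta)}, \qquad \text{Wald } \chi^2 = z^2 \\[4pt]
% hypothetical values, chosen only for illustration
\hat\beta = 0.05,\ \mathrm{SE} = 0.50 &\;\Rightarrow\; z = 0.1,\ \text{two-sided } p \approx 0.92 \\
\hat\beta = 0.05,\ \mathrm{SE} = 0.02 &\;\Rightarrow\; z = 2.5,\ \text{two-sided } p \approx 0.012
\end{align*}
```

The estimate is identical in both rows; only the precision differs.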

I'm also interested that you consider .176 a small value for explained variance. How many variables do you have in your equation? What is your dependent variable? For most things in life, if I could explain 18% of the variance through a few variables I'd be so happy I would be tap-dancing. In reality, what four variables predicted your decision to post on this forum (oh, excuse me, community) or my decision to answer it?

There is certainly nothing to say that a model cannot have a pseudo-R2 of .176 and significant goodness of fit tests.

Ksharp
Super User

In my opinion, R-square means nothing for a logistic model, because R-square is calculated based on the normal distribution, whereas the logistic model uses the logistic distribution.

Also, you can't run some of the regression tests that you can with linear regression.

Ksharp

DrAnnmaria
Fluorite | Level 6

While it is strictly true that logistic regression does not give you an R-squared calculated the same way as in ordinary least squares regression, you can get a pseudo-R2 using proc logistic. See the example below, and the page it comes from, for a good explanation.

SAS gives the likelihood-based pseudo R-square measure and its rescaled measure. Categorical Data Analysis Using The SAS System, by M. Stokes, C. Davis and G. Koch offers more details on how the generalized R-square measures that you can request are constructed and how to interpret them.

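proc logistic data = hsb2;
  /* treat prog as categorical, with level 1 as the reference category */
  class prog(ref='1') / param = ref;
  /* rsq prints the pseudo R-square measures; lackfit runs the Hosmer-Lemeshow test */
  model hiwrite(event='1') = female prog read math / rsq lackfit;
run;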

from http://www.ats.ucla.edu/stat/sas/seminars/sas_logistic/logistic1.htm
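For reference, the two measures printed by the rsq option can be written out as follows. This is a sketch of the standard definitions (the generalized R-square of Cox and Snell, and Nagelkerke's rescaling of it), with notation chosen here for illustration: L(0) is the likelihood of the intercept-only model, L(beta-hat) is the likelihood of the fitted model, and n is the sample size.

```latex
\begin{align*}
R^2 &= 1 - \left\{ \frac{L(0)}{L(\hat\beta)} \right\}^{2/n}
  && \text{(printed as ``R-Square'')} \\
R^2_{\max} &= 1 - \left\{ L(0) \right\}^{2/n}
  && \text{(largest value } R^2 \text{ can attain)} \\
\tilde{R}^2 &= \frac{R^2}{R^2_{\max}}
  && \text{(printed as ``Max-rescaled R-Square'')}
\end{align*}
```

Because the maximum is below 1 for a binary outcome, the unrescaled value can never reach 1, which is why the rescaled version is always the larger of the two.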

Ksharp
Super User

Thank you, DrAnnmaria.

MikeTurner
Calcite | Level 5

Thank you all.

Is the "R-Square 0.1239    Max-rescaled R-Square 0.1654" in my results the pseudo-R2 you mentioned here?

