Programming the statistical procedures from SAS

Solved
Contributor
Posts: 49

I ran a logistic regression on my data. The Hosmer and Lemeshow Goodness-of-Fit Test, the Global Null Hypothesis test, and the Analysis of Parameter Estimates are all significant, but the R-square value is only 0.176. What does that mean? Is my regression model valid?

Thanks.

Accepted Solutions
Solution
‎07-05-2017 03:02 PM
Frequent Contributor
Posts: 140

The H-L goodness-of-fit test tests something different from the overall model fit test. You want the H-L test to be non-significant; more precisely, you want the H-L chi-square statistic to be small, which means its p-value is large. A large H-L statistic indicates a problem with your model. SAS prints a table with details.
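To make the link between the size of the H-L statistic and its p-value concrete, here is a minimal sketch. The two statistic values are made up for illustration, and the 8 degrees of freedom assume the usual 10 groups (df = groups - 2); neither comes from the model discussed in this thread.

```sas
/* Hypothetical H-L chi-square values; df = groups - 2, assuming 10 groups. */
data hl_pvalue;
  df = 8;
  do hl_chisq = 5, 20;
    p = 1 - probchi(hl_chisq, df);   /* upper-tail chi-square probability */
    output;
  end;
run;

proc print data=hl_pvalue noobs;
run;
```

With these inputs, the statistic of 5 gives a p-value well above 0.05 (no evidence of lack of fit), while the statistic of 20 gives a p-value of roughly 0.01.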

The overall model test says whether your null can be rejected. But be careful: statistical significance does NOT mean what many think it means. The p-value is NOT the probability that the parameters are 0; it is the probability of getting results as extreme as, or more extreme than, the ones you got, in a sample of your size drawn from a population where the parameter is 0. This is rarely a useful question.

Whether a pseudo R2 of .18 is "large" depends on the field. In social sciences, it is pretty darn good. In physics, it would be lousy.

All of which illustrates the point that it is hard to answer a question like this sensibly without context.

All Replies
Occasional Contributor
Posts: 6

The null hypothesis you are testing is that the parameter = 0 in the population. That is all statistical significance means: if the population value is 0, you would expect to get results like these less than 5% of the time. Significance is a function of both your sample size and variance. In brief, if you have a large number of people or a small population variance, your obtained value can be very close to zero and still be statistically significant.

So significant does not mean large; it just means probably not zero.
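A small sketch of that point, using a one-sample z-test as a stand-in for the parameter tests in the logistic output. The effect size, standard deviation, and sample sizes are all assumed values chosen for illustration, and the sample estimate is taken to equal the assumed effect exactly.

```sas
/* Hypothetical illustration: the same small effect at growing sample sizes. */
data sig_vs_n;
  effect = 0.05;   /* assumed estimate, very close to zero */
  sd     = 1;      /* assumed population standard deviation */
  do n = 100, 1000, 10000;
    se = sd / sqrt(n);                 /* standard error shrinks as n grows */
    z  = effect / se;                  /* test statistic under H0: value = 0 */
    p  = 2 * (1 - probnorm(abs(z)));   /* two-sided p-value */
    output;
  end;
run;

proc print data=sig_vs_n noobs;
  var n se z p;
  format p pvalue6.4;
run;
```

The estimate never changes, yet the p-value falls from roughly 0.6 at n = 100 to far below 0.05 at n = 10000.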

I'm also interested that you consider .176 a small value for explained variance. How many variables do you have in your equation? What is your dependent variable? For most things in life, if I could explain 18% of the variance through a few variables I'd be so happy I would be tap-dancing. In reality, what four variables predicted your decision to post on this forum (oh, excuse me, community) or my decision to answer it?

There is certainly nothing to say that a model cannot have a pseudo-R2 of .176 and significant goodness of fit tests.

Super User
Posts: 10,194

In my opinion, R-square means nothing for a logistic model, because R-square is calculated based on the normal distribution, whereas a logistic model uses the logistic distribution.

Also, you can't do some of the regression tests that you can with linear regression.

Ksharp

Occasional Contributor
Posts: 6

While it is strictly true that logistic regression does not give you an R-squared calculated the same way as in ordinary least squares regression, you can get a pseudo-R2 using proc logistic. See here for an example and a good explanation.

SAS gives the likelihood-based pseudo R-square measure and its rescaled measure. Categorical Data Analysis Using The SAS System, by M. Stokes, C. Davis and G. Koch offers more details on how the generalized R-square measures that you can request are constructed and how to interpret them.

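```sas
proc logistic data = hsb2;
  class prog(ref='1') / param = ref;
  /* rsq requests the generalized R-square measures; lackfit requests the Hosmer-Lemeshow test */
  model hiwrite(event='1') = female prog read math / rsq lackfit;
run;
```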
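For reference, here is a sketch of how the two measures requested by `rsq` are usually defined. The notation is assumed for this note rather than taken from the thread: L(0) is the likelihood of the intercept-only model, L(β̂) is the likelihood of the fitted model, and n is the sample size.

```latex
R^2 = 1 - \left\{ \frac{L(0)}{L(\hat{\beta})} \right\}^{2/n},
\qquad
R^2_{\max} = 1 - \left\{ L(0) \right\}^{2/n},
\qquad
\tilde{R}^2 = \frac{R^2}{R^2_{\max}}
```

Because the maximum attainable value is below 1 for a discrete outcome, the rescaled value is always at least as large as the unrescaled one.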

Super User
Posts: 10,194

Thank you.  DrAnnmaria

Contributor
Posts: 49

Thank you all.

Is "R-Square 0.1239    Max-rescaled R-Square 0.1654" in my results the pseudo-R2 you mentioned here?

