I used a multinomial logistic regression to predict whether people are confident about a certain issue.
The dependent variable has four categories:
| Code | Category |
|---|---|
| 1 | Not confident |
| 2 | Neutral |
| 3 | Confident |
| 4 | Unknown |
The independent variables include identity, age, gender, educational attainment, employment status, whether the respondent was born in a certain place, community type (urban or rural), and interaction terms.
The model output follows:
Model Fit Statistics
| Criterion | Intercept Only | Intercept and Covariates |
|---|---|---|
| AIC | 11678.775 | 11421.602 |
| SC | 11698.052 | 12019.195 |
| -2 Log L | 11672.775 | 11235.602 |
Testing Global Null Hypothesis: BETA=0
| Test | Chi-Square | DF | Pr > ChiSq |
|---|---|---|---|
| Likelihood Ratio | 437.1734 | 90 | <.0001 |
| Score | 456.9761 | 90 | <.0001 |
| Wald | 396.1174 | 90 | <.0001 |
Deviance and Pearson Goodness-of-Fit Statistics
| Criterion | Value | DF | Value/DF | Pr > ChiSq |
|---|---|---|---|---|
| Deviance | 4083.7242 | 4359 | 0.9368 | 0.9987 |
| Pearson | 4494.1027 | 4359 | 1.0310 | 0.0750 |
The Deviance and Pearson Goodness-of-Fit Statistics show that the P-value for the deviance is high (0.9987). The P-value for the Pearson statistic is much lower (0.0750), although it is still greater than 0.05. Can I conclude that the model fits the data well?
You didn't provide your PROC LOGISTIC statements or indicate whether your multinomial response is ordinal or nominal, so I have to assume that you treated it as nominal and therefore used the LINK=GLOGIT option. I also assume that you used the AGGREGATE option along with the SCALE=NONE option to get these tests and, since their DF are so large, that some of your predictors are continuous. As noted in the "Details: Goodness of fit" section of the LOGISTIC documentation, the Pearson and deviance statistics require sufficient replication within the populations in order to be valid, and a substantial difference between the two is an indication that neither can be used. With one or more continuous predictors there is usually very little, if any, replication within the populations (the populations are defined by the unique settings of the predictors). See note 22630 (https://support.sas.com/kb/22/630.html), which goes further into assessing goodness of fit. As suggested there, you could use the Hosmer-Lemeshow test to assess fit.
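As a rough check on the replication point, the posted statistics can be used to back out approximately how many observations and populations are behind these tests. The sketch below is in Python rather than SAS, and it rests on assumptions that are not stated in the thread: that AIC = -2 Log L + 2p and SC = -2 Log L + p ln(n), and that the goodness-of-fit DF equals m(k - 1) - p for m populations, k response levels, and p estimated parameters (worth verifying against the LOGISTIC documentation). All variable names are illustrative.

```python
import math

# Values copied from the output posted in the question
neg2logl_null, neg2logl_full = 11672.775, 11235.602
aic_null, aic_full = 11678.775, 11421.602
sc_null = 11698.052
gof_df = 4359
response_levels = 4

# Assumption: AIC = -2 Log L + 2p, so p = (AIC - (-2 Log L)) / 2
p_null = (aic_null - neg2logl_null) / 2
p_full = (aic_full - neg2logl_full) / 2

# Assumption: SC = -2 Log L + p * ln(n), solved for n using the intercept-only column
n_obs = math.exp((sc_null - neg2logl_null) / p_null)

# Assumption: DF = m * (response_levels - 1) - p, solved for the number of populations m
m_populations = (gof_df + p_full) / (response_levels - 1)

# The likelihood ratio chi-square should equal the drop in -2 Log L
lr_chi_square = neg2logl_null - neg2logl_full

print(f"parameters (intercept only / full): {p_null:.0f} / {p_full:.0f}")
print(f"approximate observations: {n_obs:.0f}")
print(f"implied populations: {m_populations:.0f}")
print(f"observations per population: {n_obs / m_populations:.2f}")
print(f"likelihood ratio chi-square: {lr_chi_square:.3f}")
```

If those formulas hold, the output implies roughly three observations per population on average, spread over four response levels, which is consistent with the sparse replication described above and with treating neither statistic as a reliable fit test here.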
