02-27-2017 10:06 PM
Hello Good People!
I am analysing students' academic performance based on the number of years spent in high school. My main predictor variable is years, coded 1 for 5 years in high school and 2 for 4 years. The response variable is students' performance in national exams, categorised by their success at entering university. It is coded 1-5, with 5 being the best grade, corresponding to complete success in university admission. Because the response is ordinal, I am using ordinal logistic regression. I also have other variables that may predict performance: school type (top-tier or lower-tier), gender, and the location of school districts.
After running the logistic regression, I find that the Score Test for the Proportional Odds Assumption rejects the assumption. I did a further check (empirical plots) of the parallelism of all predictors with the response variable, and the curves look very parallel. So I am relying on these plots to accept that the proportional odds assumption is met.
Score Test for the Proportional Odds Assumption
Chi-Square | DF | Pr > ChiSq |
3384.2110 | 1 | <.0001 |
However, the Deviance and Pearson p-values are both significant. See below:
Deviance and Pearson Goodness-of-Fit Statistics
Criterion | Value | DF | Value/DF | Pr > ChiSq |
Deviance | 3485.9780 | 3 | 1161.993 | <.0001 |
Pearson | 3376.3413 | 3 | 1125.447 | <.0001 |
My question is: is there anything I can do to continue with this analysis? Can I just continue with the analysis and ignore the significant Deviance and Pearson p-values?
I will appreciate your help.
Francios
02-27-2017 10:21 PM
No, you should not.
The Deviance and Pearson Goodness-of-Fit Statistics indicate that your model does not fit well.
Value/DF should be near 1 if the model fits the data well.
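To make the Value/DF check concrete, here is a minimal Python sketch that recomputes the ratio from the two rows of the table in the original post. The dictionary and function names are only illustrative and are not part of any SAS output.

```python
# Values and degrees of freedom copied from the
# "Deviance and Pearson Goodness-of-Fit Statistics" table in the question.
stats = {
    "Deviance": (3485.9780, 3),
    "Pearson": (3376.3413, 3),
}


def value_per_df(value, df):
    """Return the statistic divided by its degrees of freedom."""
    return value / df


# A ratio near 1 suggests an adequate fit; these are far above 1.
ratios = {name: value_per_df(v, df) for name, (v, df) in stats.items()}

for name, ratio in ratios.items():
    print(f"{name}: Value/DF = {ratio:.3f}")
```

Both ratios come out above 1000, which matches the Value/DF column in the table.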
02-28-2017 05:31 AM
How many observations are in your data? For very large data sets, the goodness-of-fit statistics will almost always reject the null hypothesis, because even a trivial lack of fit becomes statistically detectable. You can use other statistics (ROC curves, accuracy of predictions on a hold-out sample, ...) to assess the fit.
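A small Python sketch of why sample size matters here. The category proportions are hypothetical (not taken from the poster's data), and the only point is that a fixed, small discrepancy between observed and predicted proportions gives a Pearson chi-square that grows in proportion to n.

```python
def pearson_chi_square(observed_props, expected_props, n):
    """Pearson chi-square for n observations, given observed and
    model-expected category proportions."""
    return sum(
        (n * o - n * e) ** 2 / (n * e)
        for o, e in zip(observed_props, expected_props)
    )


# Hypothetical proportions over the five grade categories.
expected = [0.20, 0.20, 0.20, 0.20, 0.20]  # what a model might predict
observed = [0.21, 0.19, 0.20, 0.21, 0.19]  # what the data might show

# Upper 5% critical value of the chi-square distribution with 4 DF.
CRITICAL_05_DF4 = 9.488

for n in (500, 5_000, 50_000, 500_000):
    stat = pearson_chi_square(observed, expected, n)
    verdict = "reject" if stat > CRITICAL_05_DF4 else "do not reject"
    print(f"n={n:>7}: chi-square={stat:10.2f} -> {verdict}")
```

The misfit is the same one percentage point at every n, yet the test moves from not rejecting to rejecting as n grows. That is why a hold-out or ROC-based assessment can be more informative for large data sets.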
02-28-2017 07:47 PM
02-28-2017 10:38 AM - edited 02-28-2017 10:40 AM
The question of sample size here is important. As discussed in this note, the test for proportional odds is known to be liberal with small sample sizes. Your graphical assessment might be more important. Also, as discussed in this note and in the "Details: Overdispersion: Rescaling the Covariance Matrix" section of the LOGISTIC documentation, the Pearson and deviance statistics require replication within the subpopulations in order to be valid. If there is suitable replication, then the similarity of the two statistics suggests they are providing a reasonable test of fit and their significance could be due to overdispersion or an incorrectly specified model. You might want to try adding complexity to the model (interactions, quadratic terms, splines, etc.) as seems reasonable to try to achieve a correctly specified model. If these statistics are still significant, then you might have a problem with overdispersion. The second note mentioned above discusses this.
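As a rough illustration of what rescaling for overdispersion does, here is a Python sketch. It assumes the usual Pearson-based adjustment, where the dispersion estimate is the Pearson chi-square divided by its DF and each standard error is multiplied by the square root of that estimate (in PROC LOGISTIC this is, as far as I know, what the SCALE= option on the MODEL statement requests). The standard error below is a made-up number for illustration only.

```python
import math

# Pearson statistic and DF from the table in the original post.
pearson_value, pearson_df = 3376.3413, 3

# Dispersion (heterogeneity) estimate: Pearson chi-square / DF.
dispersion = pearson_value / pearson_df


def rescale_standard_error(se, dispersion):
    """Inflate a standard error by the square root of the dispersion."""
    return se * math.sqrt(dispersion)


# Hypothetical unadjusted standard error of one coefficient.
hypothetical_se = 0.02
adjusted_se = rescale_standard_error(hypothetical_se, dispersion)

print(f"dispersion estimate = {dispersion:.3f}")
print(f"SE inflation factor = {math.sqrt(dispersion):.3f}")
print(f"adjusted SE         = {adjusted_se:.3f}")
```

The adjustment only widens the standard errors and leaves the coefficient estimates unchanged, so it does not repair a misspecified model. That is why trying a richer model first is the suggested order.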
02-28-2017 07:50 PM