Francios
Calcite | Level 5

Hello Good People!

 

I am analysing students' academic performance based on the number of years spent in high school. My main predictor is years in high school, coded 1 for five years and 2 for four years. The response variable is the student's performance in the national exams, categorised by success at gaining university admission. It is coded 1-5, with 5 being the best grade (complete success at university admission). Because the response is ordinal, I am using ordinal logistic regression. Other possible predictors are school type (top-tier or lower-tier), gender, and the location of the school district.

 

After running the logistic regression, I find that the Score Test for the Proportional Odds Assumption is significant, so the assumption does not appear to hold. I then produced empirical plots to check the parallelism of each predictor with the response variable, and the curves look very parallel. So I think I can use this visual check to accept that the proportional odds assumption is met.

Score Test for the Proportional Odds Assumption

Chi-Square    DF    Pr > ChiSq
3384.2110     1     <.0001

 

 

However, the Deviance and Pearson p-values are both significant. See below:

 

Deviance and Pearson Goodness-of-Fit Statistics

Criterion    Value        DF    Value/DF    Pr > ChiSq
Deviance     3485.9780    3     1161.993    <.0001
Pearson      3376.3413    3     1125.447    <.0001

 

 

My question is: is there anything I can do to continue with this analysis? Can I just proceed and ignore the significant Deviance and Pearson p-values?

 

I will appreciate your help.

 

Francios

 

5 REPLIES
Ksharp
Super User

No, you should not. The Deviance and Pearson Goodness-of-Fit Statistics table says your model does not fit well. Value/DF should be near 1 if the model fits the data well.
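
As a quick check of the arithmetic behind Value/DF, here is a minimal Python sketch that recomputes the ratios from the values reported in the question. The function name `value_per_df` is made up for illustration, and this is not SAS output.

```python
def value_per_df(value: float, df: int) -> float:
    """Return a goodness-of-fit statistic divided by its degrees of freedom."""
    if df <= 0:
        raise ValueError("degrees of freedom must be positive")
    return value / df


# Values copied from the Deviance and Pearson table in the question.
deviance_ratio = value_per_df(3485.9780, 3)
pearson_ratio = value_per_df(3376.3413, 3)

print(f"Deviance Value/DF: {deviance_ratio:.3f}")
print(f"Pearson  Value/DF: {pearson_ratio:.3f}")
```

Both ratios come out above 1100, which is very far from the target of roughly 1.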

Rick_SAS
SAS Super FREQ

How many observations are in your data? For very large data sets, the goodness-of-fit statistics tend to reject the null hypothesis even when the lack of fit is small. You can use other statistics (ROC curves, accuracy of predictions on a hold-out sample, ...) to assess the fit.
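
To illustrate the sample-size point, here is a rough Python sketch using a one-sample Pearson chi-square. It is not the poster's model. The category proportions are hypothetical, chosen so that the observed and expected proportions never differ by more than one percentage point.

```python
def pearson_chi_square(n, observed_props, expected_props):
    """Pearson X^2 for n observations, given observed and expected proportions."""
    return sum(
        (n * o - n * e) ** 2 / (n * e)
        for o, e in zip(observed_props, expected_props)
    )


# Hypothetical five-category response with a small, fixed discrepancy.
expected = [0.20, 0.20, 0.20, 0.20, 0.20]
observed = [0.21, 0.19, 0.20, 0.21, 0.19]

# Upper 5% critical value of the chi-square distribution with 4 DF.
CRITICAL_4DF_05 = 9.488

for n in (770, 77_000):
    x2 = pearson_chi_square(n, observed, expected)
    verdict = "reject" if x2 > CRITICAL_4DF_05 else "do not reject"
    print(f"n = {n:>6}: X^2 = {x2:8.2f} -> {verdict}")
```

With the discrepancy held fixed, the statistic grows in proportion to n, so the same small misfit that passes at n = 770 is rejected at n = 77,000.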

Francios
Calcite | Level 5
Hi Rick,

Thank you for your comment. I have a very large dataset of about 77,000 observations. I will try the ROC curve and see what I get. I will get back to you if I need further assistance.

Thank you very much!

Best,

Francois
StatDave
SAS Super FREQ

The question of sample size here is important.  As discussed in this note, the test for proportional odds is known to be liberal with small sample sizes. Your graphical assessment might be more important.  Also, as discussed in this note and in the "Details: Overdispersion: Rescaling the Covariance Matrix" section of the LOGISTIC documentation, the Pearson and deviance statistics require replication within the subpopulations in order to be valid. If there is suitable replication, then the similarity of the two statistics suggests they are providing a reasonable test of fit and their significance could be due to overdispersion or an incorrectly specified model.  You might want to try adding complexity to the model (interactions, quadratic terms, splines, etc.) as seems reasonable to try to achieve a correctly specified model. If these statistics are still significant, then you might have a problem with overdispersion.  The second note mentioned above discusses this. 
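
To make the rescaling idea concrete: a common overdispersion correction estimates a dispersion parameter as the Pearson chi-square divided by its DF and multiplies the covariance matrix by it, so standard errors grow by the square root of that ratio. The Python sketch below assumes that convention. The unscaled standard error of 0.015 is a made-up value for illustration, not taken from the poster's output, and the correction is only meaningful when the Pearson statistic itself is valid (that is, when there is replication within subpopulations).

```python
import math


def rescaled_standard_error(se: float, chi_square: float, df: int) -> float:
    """Inflate a standard error by sqrt(chi_square / df), the estimated dispersion."""
    dispersion = chi_square / df
    return se * math.sqrt(dispersion)


# Pearson statistic and DF as reported in the question.
pearson, df = 3376.3413, 3

# Hypothetical unscaled standard error, for illustration only.
unscaled_se = 0.015

inflation = math.sqrt(pearson / df)
scaled_se = rescaled_standard_error(unscaled_se, pearson, df)

print(f"Inflation factor:  {inflation:.3f}")
print(f"Rescaled std. err: {scaled_se:.4f}")
```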

Francios
Calcite | Level 5
Hello

Thank you very much for the detailed comments. I will look into this, and if I have questions, I will get back to you.

Best,

Francios!

