Francios
 

Hello

I am trying to predict high school students' success in being admitted to university based on the number of years spent in high school (either three or four). Number of years in high school is dichotomized: 1 = three years and 2 = four years.

 

The response variable is ordered from 1 to 5, with 5 being the highest level of success in being admitted to a university.

 

I also have another variable, School Type attended, which is dichotomized as top tier = 1 and lower tier = 2. I also have gender (male and female). The sample size is over 56,000 students from 24 schools, which were randomly selected from 400 schools. This is what I have:

 

Predictor = number of years (coded 1 and 0)

Outcome = Admissibility to university (coded 1 – 5)

Gender = female and male (coded 1 and 2)

School Type = A and B (coded 1 and 2)

 

I ran an ordinal logistic regression; see the SAS code below (output attached):

 

ODS LISTING CLOSE;

  ods graphics on;

  ODS RTF FILE = '\\Client\C$\SHS_DATA_CURRENT\LOGIT_YEARS.RTF';

   proc logistic data=SHS plots(only)=(effect(polybar) oddsratio(range=clip)) DESCENDING;

      class YEARS(param=ref ref="4YEARS");

           WHERE YEARS NE ('3N4YEARS');

      model ACCEPT=YEARS / SCALE=NONE AGGREGATE covb;

      oddsratio YEARS ;

           /* ODDSRATIO ACCEPT; removed: ACCEPT is the response, and ODDSRATIO only accepts effects in the model */

           OUTPUT OUT=PREDICTED2 PRED=PRED;

      title PREDICTING STUDENT ADMISSIBILITY TO UNIV. BASED ON YEARS IN HIGH SCHOOL;

   run;

   ods rtf close;

   ods graphics off;

   ods listing;

 

The problem I am having is that the proportional odds assumption does not hold, and the Deviance and Pearson goodness-of-fit statistics are both significant (please see my printout). I have also tried other suggested techniques, such as an empirical check of parallelism for my variables; see the sample code below and the output in the attachment. Since the empirical plot lines suggest parallelism, should I continue with the analysis? Is there anything anyone would suggest I do to make sure that I am doing the right thing?

 

proc freq data=SHS;

     table ACCEPT*YEARS / out=os;

WHERE NOT MISSING(ACCEPT);

     run;

 

 PROC SORT DATA = OS; 

BY YEARS;

RUN;

   proc transpose data=os(WHERE=(YEARS NE '3N4YEARS')) out=tran;

     by YEARS; var count;

     run;

 

   data a; set tran;

     const=0;

     c1=log((sum(of col1-col1)+const)/(sum(of col2-col5)+const));

     c2=log((sum(of col1-col2)+const)/(sum(of col3-col5)+const));

     c3=log((sum(of col1-col3)+const)/(sum(of col4-col5)+const));

     c4=log((sum(of col1-col4)+const)/(sum(of col5-col5)+const));

   run;

 

   ODS RTF FILE = '\\Client\C$\SHS_DATA_CURRENT\LOGIT_YEARS.RTF';

   TITLE 'EMPIRICAL PLOTS OF ACCEPT ON YEARS';

   proc sgplot;

     series y=c1 x=YEARS; 

     series y=c2 x=YEARS;

     series y=c3 x=YEARS;

     series y=c4 x=YEARS;

     yaxis values=(-6 to 6);

     xaxis integer;

     run;

   ods rtf close;

 

My question is what to do next. Is this the end of my analysis? Should I continue interpreting my results on the basis that the lines are parallel?

 

Any help will be appreciated.

StatDave

Statistical tests become more powerful at detecting small effects as the sample size increases. So, with a sample this large, it is possible that these tests are detecting departures from fit that are trivially small for your practical purposes. You might want to assess how well the model does at the observation level by seeing how well it classifies observations. The graphical method you used, based on this note, is also a good way to assess the proportional odds assumption without using a test.
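
One way to look at observation-level classification is to have PROC LOGISTIC write the predicted probability of each response level to a data set and then cross-tabulate the observed level against the most probable predicted level. The following is only a minimal sketch, not code from this thread: the data set name CLASSIFIED is made up for illustration, and the model simply mirrors the one in the question.

```sas
/* Sketch: fit the same cumulative logit model and save per-level probabilities. */
/* CLASSIFIED is a hypothetical output data set name.                            */
proc logistic data=SHS descending;
   where YEARS ne '3N4YEARS';
   class YEARS(param=ref ref="4YEARS");
   model ACCEPT = YEARS;
   /* PREDPROBS=INDIVIDUAL adds IP_<level> probabilities, plus                   */
   /* _FROM_ (observed level) and _INTO_ (level with the largest probability).   */
   output out=CLASSIFIED predprobs=individual;
run;

/* Cross-tabulate observed against predicted levels. */
proc freq data=CLASSIFIED;
   tables _FROM_*_INTO_ / norow nocol;
run;
```

If YEARS has only two levels after the WHERE filter, the model yields just two distinct sets of predicted probabilities, so at most two ACCEPT levels can appear in _INTO_. The table is then best read as a rough check on practical fit, not as a formal test.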
