NMB82
Obsidian | Level 7

I'm building random intercept models (null, level-1, level-2, and combined), and the area under the ROC curve is ~.97 for all of them, even though the fit indices (AIC, -2LL) improve with each model. I'm using various patient- and encounter-level characteristics to predict restraint use in the emergency department. The outcome is binary and all predictors are binary or categorical. Hospital admissions (n ≈ 32,000) are nested within patients (n ≈ 19,000). Can someone help me understand why, for example, the ROC curve for the null model below is almost the same as the one for the full model following it? I'm using the predicted probabilities from the GLIMMIX output to get the ROC curve in PROC LOGISTIC, as covered here. Most of the predictors are statistically significant.

 

/* Null model: random patient intercept only */
proc glimmix data=data method=quad empirical=classical;
   nloptions gconv=0 tech=nrridg;
   class PatID;
   model Restraint(descending) = / cl dist=binary link=logit solution;
   random intercept / subject=PatID;
   /* xbeta = linear predictor, predprob = predicted probability */
   output out=null_out pred=xbeta pred(ilink)=predprob;
run;

/* ROC curve from the saved predicted probabilities */
proc logistic data=null_out;
   model Restraint(descending) = predprob / nofit;
   roc 'GLIMMIX null model' pred=predprob;
run;

/* Full model: fixed effects plus random patient intercept */
/* PatID is not in CLASS here, so the data must be sorted by PatID */
proc glimmix data=data method=quad empirical=classical;
   nloptions gconv=0 tech=nrridg;
   class Var1 Var2 Var3 Var4 Var5;
   model Restraint(descending) = Var1 Var2 Var3 Var4 Var5
         / cl dist=binary link=logit solution oddsratio;
   random intercept / subject=PatID;
   output out=full_out pred=xbeta pred(ilink)=predprob;
run;

/* ROC curve for the full model */
proc logistic data=full_out;
   model Restraint(descending) = predprob / nofit;
   roc 'GLIMMIX full model' pred=predprob;
run;

PaigeMiller
Diamond | Level 26

I am struggling to understand how a null model with no predictors can produce an area under the curve of 0.97. I assume that, because you use a random intercept, the patient-specific intercepts are what produce such a high area under the curve.

And so the other models don't improve the area under the curve because patient ID is by far the best predictor, even though the additional variables might be statistically significant.

Either that, or only a very small percentage of the Restraint values are 1 and the vast majority are 0, in which case the null model ought to fit well.
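
One way to check the first explanation is to request two sets of predictions from the same model: one that includes the estimated patient intercepts and one that leaves them out. By default, the PRED keyword in the GLIMMIX OUTPUT statement uses the BLUPs of the random effects; the NOBLUP suboption drops them. The following is only a sketch built on the data set and variable names from the original post. The names full_out2 and predprob_marg are made up for illustration.

```sas
/* Sketch: predictions with and without the patient intercepts */
proc glimmix data=data method=quad empirical=classical;
   nloptions gconv=0 tech=nrridg;
   class PatID Var1 Var2 Var3 Var4 Var5;
   model Restraint(descending) = Var1 Var2 Var3 Var4 Var5
         / dist=binary link=logit solution;
   random intercept / subject=PatID;
   output out=full_out2
          pred(ilink)=predprob               /* includes patient intercept (BLUP is the default) */
          pred(noblup ilink)=predprob_marg;  /* fixed effects only */
run;

/* Compare the two ROC curves on the same observations */
proc logistic data=full_out2;
   model Restraint(descending) = predprob predprob_marg / nofit;
   roc 'With patient intercepts'     pred=predprob;
   roc 'Fixed effects only (NOBLUP)' pred=predprob_marg;
   roccontrast;
run;
```

If the NOBLUP curve has a much lower area than the other, the high AUC is coming mostly from the patient intercepts. With roughly 32,000 admissions for 19,000 patients, there are fewer than two admissions per patient on average, so each estimated intercept leans heavily on that patient's own outcomes.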

--
Paige Miller
NMB82
Obsidian | Level 7

Thank you. I was thinking the same about your first point: the biggest predictor of restraint use would simply be the patient. As for your second point, this is a somewhat rare event. The outcome is binary, with ~8% of encounters resulting in a restraint. That being the case, is the ROC curve even the best measure of predictive ability for such a model? Is there a better measure I should try?

STAT_Kathleen
SAS Employee

The following SGF paper has a brief discussion (pages 8-9) of ROC curves for binary GLIMMIX models that might be helpful to you.

 

https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2018/2179-2018.pdf

 

 

NMB82
Obsidian | Level 7

Thank you! Is there a guide for calculating the precision-recall curve? I don't see this available in PROC LOGISTIC. Or, at least, a confusion matrix?

StatDave
SAS Super FREQ

The precision (positive predictive value or PPV) and the recall (sensitivity) are both available from the CTABLE option in PROC LOGISTIC. You can save that table using an ODS OUTPUT statement and then create the precision-recall plot. Using the example for the GLIMMIX model in this note, the following statements create the plot. The reference line is at the overall observed event rate (155/503, or about 31%).

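/* Use predprob as the sole predictor and save the classification table */
proc logistic data=glmmout;
   model sideeffect/n = predprob / ctable;
   ods output classification=ctable;
run;

/* Plot precision (PPV) against recall (sensitivity), both in percent */
proc sgplot data=ctable noautolegend aspect=1;
   xaxis values=(0 to 100 by 25) grid offsetmin=.05 offsetmax=.05;
   yaxis values=(0 to 100 by 25) grid offsetmin=.05 offsetmax=.05;
   refline 31;   /* overall observed event rate, 155/503 */
   series y=ppv x=sensitivity;
   title "Precision-Recall Curve";
run;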
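
If a plain confusion matrix at a single cutoff is enough, it can be built directly from the saved predictions. This is a minimal sketch, not part of the note. It assumes the full_out data set from the original post, Restraint coded 0/1, and an arbitrary cutoff of 0.5. The names classified and predicted_event are made up.

```sas
/* Sketch: 2x2 confusion matrix at one illustrative cutoff */
data classified;
   set full_out;
   if missing(predprob) then delete;      /* skip rows without a prediction */
   predicted_event = (predprob >= 0.5);   /* 1 = classified as an event */
run;

proc freq data=classified;
   tables Restraint*predicted_event / norow nocol nopercent;
run;
```

With an event rate near 8%, a cutoff of 0.5 may classify very few encounters as events, so it is probably worth trying several cutoffs.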
NMB82
Obsidian | Level 7

Thank you for this! I'm unclear where the "(155/503)" comes from, as it's not in the link provided. I assume this is the event rate (events/total observations)? Also, the variable ppv is not in my ctable output. I believe it refers to positive predictive value, or precision. How would it be calculated from the variables in the ctable output shown below?

(Attachment: ctable.jpg)

StatDave
SAS Super FREQ

If you run the example in the note, you will see that the count of events is 155 and the total count is 503. So, the overall event rate is 155/503. The PPV is in the CTABLE option output and is the Percentage column labeled Pos Pred. If you don't see that column, then you probably need to upgrade your SAS release to the current SAS 9.4 TS1M7. 
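
For your own data, the reference value can be computed instead of typed in. A minimal sketch, assuming Restraint is coded 1 for an event and 0 otherwise in the full_out data set from the original post. The macro variable name evrate is made up.

```sas
/* Sketch: overall event rate in percent, stored in a macro variable */
proc sql noprint;
   select 100 * mean(Restraint) into :evrate trimmed
   from full_out;
quit;

%put Overall event rate (percent): &evrate;
```

The PROC SGPLOT step can then use `refline &evrate;` in place of the hard-coded `refline 31;`.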

NMB82
Obsidian | Level 7

Ok, that makes sense. I'm not seeing that variable, so I'll likely need to upgrade or figure out a different approach. It's a work computer, so I've gotta hassle IT for an upgrade...ugh.

StatDave
SAS Super FREQ

You can easily compute the PPV from two other columns in the table: ppv = (correct event) / (correct event + incorrect event).
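
As a rough sketch of that calculation in a DATA step: the variable names CorrectEvents and IncorrectEvents are assumptions about how the saved classification table names those columns, so check them with PROC CONTENTS first. The ratio is multiplied by 100 to put it on the same 0 to 100 scale as the axes in the plot above. The data set name ctable_ppv is made up.

```sas
/* Sketch: add PPV to a saved classification table that lacks it */
/* CorrectEvents and IncorrectEvents are assumed names; confirm them here */
proc contents data=ctable short;
run;

data ctable_ppv;
   set ctable;
   /* PPV is undefined when no observations are classified as events */
   if CorrectEvents + IncorrectEvents > 0 then
      ppv = 100 * CorrectEvents / (CorrectEvents + IncorrectEvents);
run;

proc sgplot data=ctable_ppv noautolegend aspect=1;
   xaxis values=(0 to 100 by 25) grid;
   yaxis values=(0 to 100 by 25) grid;
   series y=ppv x=sensitivity;
   title "Precision-Recall Curve";
run;
```

Rows where nothing is classified as an event are left with a missing ppv.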
