Solved: Re: ROC doesn't improve between GLIMMIX models, despite significant predictors & improving fit measures

NMB82 · Posted 04-18-2021 06:56 PM

I'm building random-intercept models (null, level-1, level-2, and combined), and the area under the ROC curve is ~.97 for all of them, even though the fit indices (AIC, -2LL) improve with each model. I'm using various patient- and encounter-level characteristics to predict restraint use in the emergency department. The outcome is binary and all predictors are binary or categorical. Hospital admissions (n ≈ 32,000) are nested within patients (n ≈ 19,000). Can someone help me understand why, for example, the ROC curve for the null model below is almost the same as that for the full model following it? I'm using the predicted probabilities from the model output to get the ROC curve in PROC LOGISTIC, as covered here. Most of the predictors are statistically significant.

proc glimmix data=data method=quad empirical=classical;
nloptions gconv=0 tech=nrridg;
 class PatID ;
model Restraint (descending)=  / CL DIST=BINARY LINK=LOGIT SOLUTION;
random intercept / subject=PatID;
output out=null_out pred=xbeta pred(ilink)=predprob; 
run;
/*****/
proc logistic data=null_out;
 model Restraint(descending)= predprob/ nofit; 
 roc 'GLIMMIX null model' pred=predprob ;
 run;

proc glimmix data=data  METHOD=quad empirical=classical ;
nloptions gconv=0 tech=nrridg;
CLASS 
    Var1 Var2 Var3 Var4 Var5 ;
model Restraint(descending)= 
   Var1 Var2 Var3 Var4 Var5
    / CL DIST=BINARY LINK=LOGIT SOLUTION ODDSRATIO ;
random intercept / subject=PatID ;
output out=full_out pred=xbeta pred(ilink)=predprob; 
run;

/* ROC*/
 proc logistic data=full_out;
 model Restraint(descending)= predprob/ nofit; 
 roc 'GLIMMIX full model' pred=predprob ;
 run;

NMB82 · Posted 04-19-2021 10:17 AM

PaigeMiller · Posted 04-19-2021 11:25 AM

I am struggling to understand how a null model with no predictors can reach an area under the curve of 0.97. I assume that, because you use a random intercept, the patient-specific intercepts themselves are producing such a high area under the curve.

So the other models don't improve the area under the curve because patient ID is by far the best predictor, even though the additional variables might be statistically significant.

Either that, or only a very small percentage of the values of Restraint are 1 and a very large percentage are 0, in which case the null model ought to fit well.

--
Paige Miller
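
One way to check the first explanation above, offered as a rough sketch and not as part of the original replies: the GLIMMIX OUTPUT statement can write predictions both with and without the estimated patient intercepts (the BLUP and NOBLUP suboptions). Comparing the two ROC curves shows how much of the ~.97 comes from the random intercept and how much from the predictors. The data set and model variables are taken from the original post; the names full_both, p_blup, and p_fixed are hypothetical.

```sas
/* Sketch: score the full model with and without the patient-level BLUPs. */
/* full_both, p_blup and p_fixed are hypothetical names.                  */
proc glimmix data=data method=quad;
   class PatID Var1 Var2 Var3 Var4 Var5;
   model Restraint(descending) = Var1 Var2 Var3 Var4 Var5
         / dist=binary link=logit solution;
   random intercept / subject=PatID;
   output out=full_both
          pred(blup   ilink)=p_blup    /* fixed effects + estimated patient intercept */
          pred(noblup ilink)=p_fixed;  /* fixed effects only                          */
run;

/* Compare the two sets of predictions on the same observations. */
proc logistic data=full_both;
   model Restraint(descending) = p_blup p_fixed / nofit;
   roc 'With patient intercepts' pred=p_blup;
   roc 'Fixed effects only'      pred=p_fixed;
   roccontrast;
run;
```

For the null model, the fixed-effects-only prediction is the same constant for every encounter, so its ROC curve would sit on the diagonal (area 0.5). If the fixed-effects-only area for the full model is well below .97, that would be consistent with the patient intercepts doing most of the discriminating.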

NMB82 · Posted 04-19-2021 11:55 AM

Thank you. I was thinking the same about the first part: that the biggest predictor of restraint use would simply be the patient. As for your second point, this is a somewhat rare event; the outcome is binary, with ~8% of encounters resulting in a restraint. That being the case, is the ROC curve even the best measure of predictive ability for such a model? Is there a better measure I should try?

STAT_Kathleen · Posted 04-19-2021 06:13 PM

The following SGF paper has a brief discussion (pages 8-9) of ROC curves for binary GLIMMIX models that might be helpful to you.

https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2018/2179-2018.pdf

NMB82 · Posted 04-20-2021 10:17 AM

Thank you! Is there a guide to calculate the precision-recall curve? I don't see this available in Proc Logistic. Or, at least, a confusion matrix?

StatDave · Posted 04-20-2021 11:24 AM

The precision (positive predictive value or PPV) and the recall (sensitivity) are both available from the CTABLE option in PROC LOGISTIC. You can save that table using an ODS OUTPUT statement and then create the precision-recall plot. Using the example for the GLIMMIX model in this note, the following statements create the plot. The reference line is at the overall observed event rate (155/503).

proc logistic data=glmmout;
   model sideeffect/n = predprob / ctable; 
   ods output classification=ctable;
   run;
proc sgplot data=ctable noautolegend aspect=1;
   xaxis values=(0 to 100 by 25) grid offsetmin=.05 offsetmax=.05; 
   yaxis values=(0 to 100 by 25) grid offsetmin=.05 offsetmax=.05;
   refline 31;
   series y=ppv x=sensitivity;
   title "Precision-Recall Curve";
   run;

NMB82 · Posted 04-20-2021 12:42 PM

Thank you for this! I'm unclear where the "(155/503)" comes from, as it's not in the link provided. I assume this is the event rate (events / total observations)? Also, the variable ppv is not in my ctable output. I believe it refers to the positive predictive value, or precision. How would that be calculated from the variables that are in the ctable output?

StatDave · Posted 04-20-2021 12:53 PM

If you run the example in the note, you will see that the count of events is 155 and the total count is 503. So, the overall event rate is 155/503. The PPV is in the CTABLE option output and is the Percentage column labeled Pos Pred. If you don't see that column, then you probably need to upgrade your SAS release to the current SAS 9.4 TS1M7.
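
As a sketch of the same calculation for other data: the reference line marks the observed event rate, which is the precision a classifier with no skill would achieve. In the note's example that is 155/503 ≈ 0.308, hence the reference line at 31 on the percent scale. Assuming the full_out data set and Restraint variable from the original post, PROC FREQ would report the corresponding percentage, which the poster describes as roughly 8%.

```sas
/* Observed event rate = baseline precision for the precision-recall plot. */
/* Assumes the full_out data set created in the original post.             */
proc freq data=full_out;
   tables Restraint / nocum;
run;
```

The percentage shown for the event level of Restraint would then replace 31 in the REFLINE statement.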

NMB82 · Posted 04-20-2021 01:00 PM

Ok, that makes sense. I'm not seeing that variable, so I'll likely need to upgrade or figure out a different approach. It's a work computer, so I've gotta hassle IT for an upgrade...ugh.

StatDave · Posted 04-20-2021 01:03 PM

You can easily compute the PPV from two other columns in the table: ppv = (correct events) / (correct events + incorrect events).
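
A minimal sketch of that calculation, assuming the ODS data set ctable saved earlier in the thread. The column names CorrectEvents and IncorrectEvents are an assumption about how the counts are stored and are worth confirming with PROC CONTENTS, since names can differ by release. The plot in the thread uses 0 to 100 axes, so the PPV is scaled to a percentage to match the Sensitivity column; ctable_ppv is a hypothetical name.

```sas
/* Confirm the actual column names before relying on them. */
proc contents data=ctable;
run;

/* Sketch: derive PPV when the Classification table has no PPV column. */
/* CorrectEvents / IncorrectEvents are assumed names.                  */
data ctable_ppv;
   set ctable;
   /* predicted events = correctly predicted + incorrectly predicted events */
   if (CorrectEvents + IncorrectEvents) > 0 then
      ppv = 100 * CorrectEvents / (CorrectEvents + IncorrectEvents);
run;

proc sgplot data=ctable_ppv noautolegend aspect=1;
   xaxis values=(0 to 100 by 25) grid offsetmin=.05 offsetmax=.05;
   yaxis values=(0 to 100 by 25) grid offsetmin=.05 offsetmax=.05;
   series y=ppv x=sensitivity;
   title "Precision-Recall Curve";
run;
```

Rows in which no observation is classified as an event are left with a missing PPV so that they are not plotted as zero.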
