Yasu
Fluorite | Level 6

Dear everyone,

I'm trying to estimate the area under the ROC curve (AUC) with PROC LOGISTIC.

However, the c statistic that PROC LOGISTIC reports without a ROC statement differs from the AUC it reports when I add a ROC statement.

Where does the difference come from?

Are the two estimates calculated using different techniques?

I'd really appreciate it if anyone would help me.

Thanks in advance.

Yasu

/*-Sample Code-*/

data roc;
   input alb tp totscore popind @@;
   totscore = 10 - totscore;
   datalines;
3.0 5.8 10 0   3.2 6.3  5 1   3.9 6.8  3 1   2.8 4.8  6 0  
3.2 5.8  3 1   0.9 4.0  5 0   2.5 5.7  8 0   1.6 5.6  5 1  
3.8 5.7  5 1   3.7 6.7  6 1   3.2 5.4  4 1   3.8 6.6  6 1  
4.1 6.6  5 1   3.6 5.7  5 1   4.3 7.0  4 1   3.6 6.7  4 0  
2.3 4.4  6 1   4.2 7.6  4 0   4.0 6.6  6 0   3.5 5.8  6 1  
3.8 6.8  7 1   3.0 4.7  8 0   4.5 7.4  5 1   3.7 7.4  5 1  
3.1 6.6  6 1   4.1 8.2  6 1   4.3 7.0  5 1   4.3 6.5  4 1  
3.2 5.1  5 1   2.6 4.7  6 1   3.3 6.8  6 0   1.7 4.0  7 0  
3.7 6.1  5 1   3.3 6.3  7 1   4.2 7.7  6 1   3.5 6.2  5 1  
2.9 5.7  9 0   2.1 4.8  7 1   2.8 6.2  8 0   4.0 7.0  7 1  
3.3 5.7  6 1   3.7 6.9  5 1   3.6 6.6  5 1  
;
run;

/*PRG1 (called PGM1 in the replies): no ROC statement; the c statistic is in the Association table*/
proc logistic data=roc;
   model popind(event='0') = alb tp;
run;

/*PRG2 (called PGM2 in the replies): same model plus a ROC statement, which reports the AUC*/
proc logistic data=roc;
   model popind(event='0') = alb tp;
   roc 'two' alb tp;
run;

Accepted Solution
Rick_SAS
SAS Super FREQ

I have been summoned....

Interesting. I wasn't aware of this behavior before, but from some experimentation I discovered that you can get PGM1 to give the same answer as PGM2 if you invoke ODS GRAPHICS ON and ask for PLOTS(ONLY)=ROCPLOT. This tells us that creating a ROC plot changes the AUC algorithm somehow, and that the ROC statement changes the algorithm in the same way. By scanning the LOGISTIC doc, I narrowed the cause down to either the ROCEPS= or the BINWIDTH= option.

Further experimentation reveals that it is the BINWIDTH= option that causes the difference between the two PROC runs. You can read about the reasons in the doc or in this USAGE NOTE, but I'll summarize what I think is happening in a few sentences: In the "old days" PROC LOGISTIC used an estimate of the AUC that corresponds to BINWIDTH=0.002, that is, the predicted probabilities are grouped into bins of that width before the pairs of observations are compared. When ODS graphics came around and the ROC statement was added, computational power had increased enough that it was feasible to compute an exact AUC, so BINWIDTH=0 is used for these new features. But because SAS doesn't like to make changes that might affect someone's benchmarks or existing programs, the default value of 0.002 was retained for the case when ODS graphics are off and you don't use the ROC statement.
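To make "exact" concrete: the exact AUC equals the c statistic taken over every pair made of one event and one nonevent, where a pair scores 1 if the event has the higher predicted probability, 1/2 if the two probabilities are tied, and 0 otherwise. The sketch below computes that proportion directly with PROC SQL. It illustrates the definition; it is not PROC LOGISTIC's internal algorithm, and the names PRED, PHAT, and C_EXACT are hypothetical. Because the model uses EVENT='0', PHAT is the predicted probability that POPIND=0, so observations with POPIND=0 are the events.

```sas
/* Hypothetical names: PRED (output data set), PHAT (predicted probability). */
/* With EVENT='0', PHAT is the predicted probability that POPIND=0.          */
proc logistic data=roc;
   model popind(event='0') = alb tp;
   output out=pred p=phat;
run;

/* Pair every event (POPIND=0) with every nonevent (POPIND=1):       */
/* concordant pairs score 1, tied pairs score 1/2, discordant 0.     */
/* The average score over all pairs is the c statistic (exact AUC).  */
proc sql;
   select (sum(e.phat > n.phat) + 0.5 * sum(e.phat = n.phat)) / count(*)
             as c_exact format=8.4
   from pred as e, pred as n
   where e.popind = 0 and n.popind = 1;
quit;
```

Under these assumptions the result should agree closely with the AUC from the ROC statement in PRG2; PROC LOGISTIC may treat nearly equal probabilities as ties, so very small differences are possible. The self-join forms all event-by-nonevent pairs, which is fine for a data set this small but grows quadratically with the sample size.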

So the answer is: PGM1 is an estimate, PGM2 is exact, and you can get PGM1 to give the exact answer by putting BINWIDTH=0 on the MODEL statement.
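Following that advice, PRG1 from the question needs only one added MODEL option. This is a minimal sketch; per the explanation above, the c statistic in the "Association of Predicted Probabilities and Observed Responses" table should then match the AUC that the ROC statement reports in PRG2, although defaults may differ between SAS releases.

```sas
/* PRG1 with BINWIDTH=0: the association statistics are computed without */
/* binning the predicted probabilities, so the c statistic should equal  */
/* the exact AUC reported by the ROC statement in PRG2.                  */
proc logistic data=roc;
   model popind(event='0') = alb tp / binwidth=0;
run;
```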

Rick

Message was edited by: Rick Wicklin


6 Replies
SteveDenham
Jade | Level 19

It is not just the AUC (the c statistic). All of the measures of association change when the ROC statement is included; all are slightly smaller in this case. I could not find a reference in the documentation that explains why. Hopefully someone (say, Rick) will drop in and give us some clues as to what is going on.

Steve Denham

SteveDenham
Jade | Level 19

Now that is what I call an answer! Thanks Rick. I hope the OP puts this in the Correct Answer category.

I had a dime on ROCEPS, and would have lost.

Any other PROCs affected by this change with ODS GRAPHICS ON?

Steve Denham

Rick_SAS
SAS Super FREQ

I was mistaken when I originally said you get different answers if you turn on ODS GRAPHICS. It's actually asking for the ROCPLOT that causes the different behavior. I have inserted the clause "and ask for PLOTS(ONLY)=ROCPLOT" into my original answer.
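The distinction Rick draws (ODS graphics alone versus an actual ROC plot request) can be checked with two runs. This is a sketch under the assumption that it behaves as described in this thread. It uses PLOTS=NONE and PLOTS(ONLY)=ROC, the ROC-curve request documented for PROC LOGISTIC; results may vary by SAS release.

```sas
ods graphics on;

/* ODS graphics on, but no ROC plot requested: per Rick's correction, */
/* the c statistic should still be the binned estimate (as in PRG1).  */
proc logistic data=roc plots=none;
   model popind(event='0') = alb tp;
run;

/* ROC curve requested: the c statistic should now match the exact */
/* AUC reported by the ROC statement in PRG2.                      */
proc logistic data=roc plots(only)=roc;
   model popind(event='0') = alb tp;
run;

ods graphics off;
```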

I don't know of any other procedures that give different answers when a graph is requested, but I didn't know about this one until today. You can check with Technical Support if you want an "official" answer to that question. My answers are all unofficial and shouldn't be given any more weight than you'd give to anyone else's.

Rick_SAS
SAS Super FREQ

By the way, I determined which PROC run was correct by writing the ROC curve to a data set with the OUTROC= option and then using the trapezoidal rule to integrate the area under it. You can read about this process in my article "A statistical application of numerical integration: The area under an ROC curve." You can also use SAS/IML to compute the ROC curve from basic principles. It's a good way to understand what PROC LOGISTIC is doing "under the hood."
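The OUTROC= check can be sketched roughly as follows. This is not the code from Rick's article; it is a minimal version that assumes the standard OUTROC= variables _1MSPEC_ (1 minus specificity) and _SENSIT_ (sensitivity), and the data set names ROCDATA and AUC are hypothetical. The points are sorted along the false-positive axis and the area is accumulated one trapezoid at a time, with the curve anchored at (0,0) and closed at (1,1) in case those endpoints are not in the output data set.

```sas
/* Write the ROC points to a data set (ROCDATA is a hypothetical name). */
proc logistic data=roc;
   model popind(event='0') = alb tp / outroc=rocdata;
run;

/* Order the points along the false-positive-rate axis. */
proc sort data=rocdata;
   by _1mspec_ _sensit_;
run;

/* Trapezoidal rule: add the area under each segment of the ROC curve. */
data auc;
   set rocdata end=last;
   retain prev_x 0 prev_y 0;              /* start the curve at (0,0) */
   auc + (_1mspec_ - prev_x) * (_sensit_ + prev_y) / 2;
   prev_x = _1mspec_;
   prev_y = _sensit_;
   if last then do;
      /* close the curve at (1,1); adds zero area if that point exists */
      auc + (1 - prev_x) * (1 + prev_y) / 2;
      output;
   end;
   keep auc;
run;

proc print data=auc noobs;
run;
```

Vertical segments (tied _1MSPEC_ values) contribute zero width, so tied cutpoints are handled by the sort alone.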

Yasu
Fluorite | Level 6

Dear Mr. Rick and Mr. Steve,

I really appreciate your kind and fast response!

I've also run some experiments and reached the same conclusion.

I'll check with SAS Technical Support about this issue.

When I get some information, I'll share it on this thread.

Thank you for your continued support.

Regards,

Yasu

