Programming the statistical procedures from SAS

Estimation of AUC with proc logistic

Contributor
Posts: 25

Estimation of AUC with proc logistic

Dear everyone,

I'm trying to estimate the area under the ROC curve (AUC) with PROC LOGISTIC.

However, the c statistic reported without the ROC statement differs from the AUC reported with the ROC statement.

Where does the difference come from?

Are the two estimates calculated using different techniques?

I'd really appreciate it if anyone would help me.

Thanks in advance.

Yasu

/*-Sample Code-*/

data roc;
   input alb tp totscore popind @@;
   totscore = 10 - totscore;
   datalines;
3.0 5.8 10 0   3.2 6.3  5 1   3.9 6.8  3 1   2.8 4.8  6 0  
3.2 5.8  3 1   0.9 4.0  5 0   2.5 5.7  8 0   1.6 5.6  5 1  
3.8 5.7  5 1   3.7 6.7  6 1   3.2 5.4  4 1   3.8 6.6  6 1  
4.1 6.6  5 1   3.6 5.7  5 1   4.3 7.0  4 1   3.6 6.7  4 0  
2.3 4.4  6 1   4.2 7.6  4 0   4.0 6.6  6 0   3.5 5.8  6 1  
3.8 6.8  7 1   3.0 4.7  8 0   4.5 7.4  5 1   3.7 7.4  5 1  
3.1 6.6  6 1   4.1 8.2  6 1   4.3 7.0  5 1   4.3 6.5  4 1  
3.2 5.1  5 1   2.6 4.7  6 1   3.3 6.8  6 0   1.7 4.0  7 0  
3.7 6.1  5 1   3.3 6.3  7 1   4.2 7.7  6 1   3.5 6.2  5 1  
2.9 5.7  9 0   2.1 4.8  7 1   2.8 6.2  8 0   4.0 7.0  7 1  
3.3 5.7  6 1   3.7 6.9  5 1   3.6 6.6  5 1  
;
run;

/*PRG1*/
proc logistic data=roc;
   model popind(event='0') = alb tp;
run;

/*PRG2*/
proc logistic data=roc;
   model popind(event='0') = alb tp;
   roc 'two' alb tp;
run;


Accepted Solutions
Solution
‎02-27-2012 03:02 PM
SAS Super FREQ
Posts: 3,546

Re: Estimation of AUC with proc logistic

I have been summoned....

Interesting. I wasn't aware of this behavior before, but from some experimentation I discovered that you can get PRG1 to give the same answer as PRG2 if you invoke ODS GRAPHICS ON and ask for PLOTS(ONLY)=ROCPLOT. This tells us that creating a ROC plot changes the AUC algorithm somehow, and that the ROC statement changes the algorithm in the same way. By scanning the LOGISTIC doc, I narrowed it down to either the ROCEPS= or the BINWIDTH= option as the cause of the change.

Further experimentation reveals that it is the BINWIDTH= option that causes the difference between the two PROC runs. You can read about the reasons in the doc or in this USAGE NOTE, but I'll summarize what I think is happening in a few sentences: In the "old days" PROC LOGISTIC used an estimate of the AUC that corresponds to BINWIDTH=0.002. When ODS graphics came around and the ROC statement was added, computational power had increased enough that it was feasible to compute an exact AUC, so BINWIDTH=0 is used for these new features. But because SAS doesn't like to make changes that might affect someone's benchmarks or existing programs, the default value of 0.002 was retained for the case when ODS graphics are off and you don't use the ROC statement.

So the answer is: PRG1 is an estimate, PRG2 is exact, and you can get PRG1 to give the exact answer by putting BINWIDTH=0 on the MODEL statement.
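
As a minimal sketch of that suggestion, reusing the `roc` data set and the model from the original post: the only change to PRG1 is the BINWIDTH=0 option after the slash on the MODEL statement. I would expect the c statistic in the "Association of Predicted Probabilities and Observed Responses" table to then agree with the AUC from PRG2, but please confirm that on your own SAS release.

```sas
/* PRG1 with BINWIDTH=0: association statistics (including c)  */
/* are computed from the unbinned predicted probabilities      */
/* instead of the default bins of width 0.002.                 */
proc logistic data=roc;
   model popind(event='0') = alb tp / binwidth=0;
run;
```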

Rick

Message was edited by: Rick Wicklin


All Replies
Respected Advisor
Posts: 2,655

Estimation of AUC with proc logistic

It is not just the ROC, or c value.  All of the measures of association change when the ROC statement is included--all are slightly smaller in this case.  I could not find a reference in the documentation to explain why this is the case.  Hopefully someone (say Rick) will drop in and give us some clues as to what is going on.

Steve Denham

Respected Advisor
Posts: 2,655

Estimation of AUC with proc logistic

Now that is what I call an answer! Thanks Rick. I hope the OP puts this in the Correct Answer category.

I had a dime on ROCEPS, and would have lost.

Any other PROCs affected by this change with ODS GRAPHICS ON?

Steve Denham

SAS Super FREQ
Posts: 3,546

Re: Estimation of AUC with proc logistic

I was mistaken when I originally said you get different answers if you turn on ODS GRAPHICS. It's actually asking for the ROCPLOT that causes the different behavior. I have inserted the clause "and ask for PLOTS(ONLY)=ROCPLOT" into my original answer.

I don't know of any other procedures that give different answers when a graph is requested, but I didn't know about this one until today. You can check with Technical Support if you want an "official" answer to that question. My answers are all unofficial and shouldn't be given any more weight than you'd give to anyone else.

SAS Super FREQ
Posts: 3,546

Estimation of AUC with proc logistic

By the way,  I determined which PROC run was correct by outputting the ROC curve by using the OUTROC= option and then using the trapezoidal rule to integrate the AUC.  You can read about this process in my article "A statistical application of numerical integration: The area under an ROC curve." You can also use SAS/IML to compute the ROC curve from basic principles.  It's a good way to understand what PROC LOGISTIC is doing "under the hood."
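
The check described above can be sketched with a DATA step rather than SAS/IML. This is not the code from the article, only an illustration under stated assumptions: the data set names `rocdata` and `auc` and the variables `prev_x`, `prev_y`, and `area` are made up for this example, while `_1MSPEC_` and `_SENSIT_` are the variables that the OUTROC= data set contains. The sketch also assumes the curve starts at (0,0), which is why the retained previous point is initialized to zero.

```sas
/* Write the ROC curve coordinates to a data set */
proc logistic data=roc;
   model popind(event='0') = alb tp / outroc=rocdata;
run;

/* Order the points from left to right along the curve */
proc sort data=rocdata;
   by _1mspec_ _sensit_;
run;

/* Trapezoidal rule: add the area of each trapezoid between  */
/* the previous point and the current one, starting at (0,0) */
data auc;
   set rocdata end=last;
   retain prev_x 0 prev_y 0;
   area + (_1mspec_ - prev_x) * (_sensit_ + prev_y) / 2;
   prev_x = _1mspec_;
   prev_y = _sensit_;
   if last then output;
   keep area;
run;

proc print data=auc noobs;
run;
```

The printed `area` can then be compared with the c statistic from each PROC run to see which one matches the integrated curve.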

Contributor
Posts: 25

Estimation of AUC with proc logistic

Dear Mr. Rick and Mr. Steve,

I really appreciate your kind and fast response!

I've also conducted some experiments and reached the same conclusion.

I'll check with SAS Technical Support about this issue.

When I get some information, I'll share it on this thread.

Thank you for your continued support.

Regards,

Yasu

🔒 This topic is solved and locked.
