Estimation of AUC with proc logistic

02-24-2012 05:23 AM

Dear everyone,

I'm trying to estimate the AUC of the ROC curve with PROC LOGISTIC.

However, the c-statistic estimated without the ROC statement differs from the AUC estimated with the ROC statement.

Where does the difference come from?

Are those estimates calculated using different techniques?

I'd really appreciate it if anyone would help me.

Thanks in advance.

Yasu

/*-Sample Code-*/

data roc;
input alb tp totscore popind @@;
totscore = 10 - totscore;
datalines;
3.0 5.8 10 0 3.2 6.3 5 1 3.9 6.8 3 1 2.8 4.8 6 0
3.2 5.8 3 1 0.9 4.0 5 0 2.5 5.7 8 0 1.6 5.6 5 1
3.8 5.7 5 1 3.7 6.7 6 1 3.2 5.4 4 1 3.8 6.6 6 1
4.1 6.6 5 1 3.6 5.7 5 1 4.3 7.0 4 1 3.6 6.7 4 0
2.3 4.4 6 1 4.2 7.6 4 0 4.0 6.6 6 0 3.5 5.8 6 1
3.8 6.8 7 1 3.0 4.7 8 0 4.5 7.4 5 1 3.7 7.4 5 1
3.1 6.6 6 1 4.1 8.2 6 1 4.3 7.0 5 1 4.3 6.5 4 1
3.2 5.1 5 1 2.6 4.7 6 1 3.3 6.8 6 0 1.7 4.0 7 0
3.7 6.1 5 1 3.3 6.3 7 1 4.2 7.7 6 1 3.5 6.2 5 1
2.9 5.7 9 0 2.1 4.8 7 1 2.8 6.2 8 0 4.0 7.0 7 1
3.3 5.7 6 1 3.7 6.9 5 1 3.6 6.6 5 1
;
run;

/*PRG1*/

proc logistic data=roc;
model popind(event='0') = alb tp;
run;

/*PRG2*/

proc logistic data=roc;
model popind(event='0') = alb tp;
roc 'two' alb tp;
run;

Accepted Solutions

Solution

02-27-2012 03:02 PM

I have been summoned....

Interesting. I wasn't aware of this behavior before, but from some experimentation I discovered that you can get PRG1 to give the same answer as PRG2 if you invoke ODS GRAPHICS ON and ask for PLOTS(ONLY)=ROCPLOT. This tells us that creating a ROC plot changes the AUC algorithm somehow, and that the ROC statement changes the algorithm in the same way. By scanning the LOGISTIC doc, I narrowed the cause down to either the ROCEPS= or the BINWIDTH= option.

Further experimentation reveals that it is the BINWIDTH= option that causes the difference between the two PROC runs. You can read about the reasons in the doc or in this USAGE NOTE, but I'll summarize what I think is happening in a few sentences: In the "old days" PROC LOGISTIC used an *estimate* of the AUC that corresponds to BINWIDTH=0.002. By the time ODS graphics came around and the ROC statement was added, computational power had increased enough that it was feasible to compute an exact AUC, so BINWIDTH=0 is used for these new features. But because SAS doesn't like to make changes that might affect someone's benchmarks or existing programs, the default value of 0.002 was retained for the case when ODS graphics are off and you don't use the ROC statement.

So the answer is: PRG1 is an estimate, PRG2 is exact, and you can get PRG1 to give the exact answer by putting BINWIDTH=0 on the MODEL statement.
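As a minimal sketch of that last point, the step below reruns the first program from the question with BINWIDTH=0 added as a MODEL statement option. It assumes the ROC data set from the original post already exists in the WORK library; nothing else is new. If the explanation above is right, the c statistic in the "Association of Predicted Probabilities and Observed Responses" table should then agree with the AUC reported when the ROC statement is used.

```sas
/* PRG1 with the binning turned off: BINWIDTH=0 requests the exact   */
/* computation of the association measures (including c) instead of  */
/* the binned estimate described above.                              */
/* Assumes the data set WORK.ROC from the question has been created. */
proc logistic data=roc;
   model popind(event='0') = alb tp / binwidth=0;
run;
```

Comparing this output side by side with the output of the unmodified first program is a quick way to see how much the binned estimate moves the measures for a given data set.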

Rick

Message was edited by: Rick Wicklin

All Replies

02-27-2012 12:57 PM

It is not just the ROC, or c value. All of the measures of association change when the ROC statement is included--all are slightly smaller in this case. I could not find a reference in the documentation to explain why this is the case. Hopefully someone (say Rick) will drop in and give us some clues as to what is going on.

Steve Denham

02-27-2012 03:47 PM

Now that is what I call an answer! Thanks Rick. I hope the OP puts this in the Correct Answer category.

I had a dime on ROCEPS, and would have lost.

Any other PROCs affected by this change with ODS GRAPHICS ON?

Steve Denham

02-27-2012 04:05 PM

I was mistaken when I originally said you get different answers if you turn on ODS GRAPHICS. It's actually asking for the ROCPLOT that causes the different behavior. I have inserted the clause "and ask for PLOTS(ONLY)=ROCPLOT" into my original answer.

I don't know of any other procedures that give different answers when a graph is requested, but I didn't know about this one until today. You can check with Technical Support if you want an "official" answer to that question. My answers are all unofficial and shouldn't be given any more weight than you'd give to anyone else.

02-27-2012 03:12 PM

By the way, I determined which PROC run was correct by outputting the ROC curve by using the OUTROC= option and then using the trapezoidal rule to integrate the AUC. You can read about this process in my article "A statistical application of numerical integration: The area under an ROC curve." You can also use SAS/IML to compute the ROC curve from basic principles. It's a good way to understand what PROC LOGISTIC is doing "under the hood."
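A rough sketch of that check, not the code from the article: write the ROC points out with OUTROC=, then integrate them with the trapezoidal rule in a DATA step. The data set names ROCDATA and AUC and the variables AREA, PREV_X, and PREV_Y are made up for illustration; _1MSPEC_ and _SENSIT_ are the false positive rate and sensitivity variables that OUTROC= writes. The step starts the curve at (0,0) and closes it at (1,1) in case those endpoints are not in the data set. It again assumes the ROC data set from the question exists.

```sas
/* Write the points of the ROC curve to WORK.ROCDATA (hypothetical name). */
proc logistic data=roc;
   model popind(event='0') = alb tp / outroc=rocdata;
run;

/* Order the points from left to right along the curve. */
proc sort data=rocdata;
   by _1mspec_ _sensit_;
run;

/* Trapezoidal rule: add up the area of each strip between consecutive points. */
data auc;
   set rocdata end=last;
   retain prev_x 0 prev_y 0;            /* curve starts at (0,0) */
   area + (_1mspec_ - prev_x) * (_sensit_ + prev_y) / 2;
   prev_x = _1mspec_;
   prev_y = _sensit_;
   if last then do;
      /* close the curve at (1,1); adds nothing if the last point is already (1,1) */
      area + (1 - prev_x) * (1 + prev_y) / 2;
      output;
   end;
   keep area;
run;

proc print data=auc;
run;
```

The points written by OUTROC= depend on the ROCEPS= setting, so the integrated value may differ slightly from the reported AUC if nearby predicted probabilities are merged into one point.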

02-29-2012 07:35 PM

Dear Mr. Rick and Mr. Steve,

I really appreciate your kind and fast response!

I've also conducted some experiments and reached the same conclusion.

I'll check with SAS Technical Support about this issue.

When I get some information, I'll share it on this thread.

Thank you for your continued support.

Regards,

Yasu