Compare ROC curves ignoring missing predictors


03-29-2016 10:12 AM

I'm trying to compare AUC for two ROC curves. But I have missing data for one of the predictors, and I want to ignore the missing values (instead of throwing out those records).

I know that if I put the predictors in the model, LOGISTIC will exclude the records with missing values. So I thought the ROC statement's PRED= specification might be my answer, but unfortunately it throws an error when it encounters a missing value:

```
data have;
input x1 x2 y;
cards;
1 1 0
2 2 1
3 . 0
4 2 0
5 1 1
;
run;
proc logistic data=have plots(only)=roc;
model Y(event='1') = ;
roc 'x1' pred=x1;
roc 'x2' pred=x2; *Throws error improper missing;
run;
```

Is there an easy way to get SAS to compare these two curves (other than running two PROCs, saving the output data, etc.)?

I had thought transforming the data might help:

```
data have;
input group x y;
cards;
1 1 0
1 2 1
1 3 0
1 4 0
1 5 1
2 1 0
2 2 1
2 2 0
2 1 1
;
run;
```
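A sketch of building that stacked layout from the wide data with a DATA step instead of typing it by hand. The data set name `wide` is a hypothetical name for the wide data from the first step; each predictor's missing values are dropped from its own group only, so group 1 keeps 5 rows and group 2 keeps 4:

```
/* Wide data from the first step, saved here as WIDE (hypothetical name) */
data wide;
   input x1 x2 y;
   cards;
1 1 0
2 2 1
3 . 0
4 2 0
5 1 1
;
run;

/* Stack into one row per predictor value: group 1 = x1, group 2 = x2 */
data have;
   set wide;
   array preds{2} x1 x2;
   do group = 1 to 2;
      x = preds{group};
      if not missing(x) then output;   /* skip a missing value for this predictor only */
   end;
   keep group x y;
run;

/* BY-group processing needs the rows sorted by GROUP */
proc sort data=have;
   by group;
run;
```

The result has the same rows as the hand-entered stacked data above.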

That would make it easy to get two ROC curves with a BY statement, but I still can't see a way to get one chart with both curves and an AUC comparison.

I realize simply ignoring missing values is not always the best approach, but I'm curious whether there is a way to do so here.

If not, I suppose I can run PROC LOGISTIC with a BY statement, output the statistics and other results, then plot the curves myself.
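A minimal sketch of that BY-group route, assuming the stacked layout (group x y) shown above. One model per group is fitted with `x` as its only effect, the ROC points are written with OUTROC= (the output name `rocpts` is hypothetical), and PROC SGPLOT overlays the two curves on one chart:

```
/* Stacked data: group 1 = x1 values, group 2 = non-missing x2 values */
data have;
   input group x y;
   cards;
1 1 0
1 2 1
1 3 0
1 4 0
1 5 1
2 1 0
2 2 1
2 2 0
2 1 1
;
run;

/* One model per predictor; each group uses all of its own rows */
proc logistic data=have plots=none;
   by group;
   model y(event='1') = x / outroc=rocpts;   /* ROC coordinates, one set per group */
run;

/* Overlay both curves on a single chart */
proc sgplot data=rocpts;
   series x=_1mspec_ y=_sensit_ / group=group markers;
   lineparm x=0 y=0 slope=1 / lineattrs=(pattern=dash);   /* chance diagonal */
   xaxis label='1 - Specificity' min=0 max=1;
   yaxis label='Sensitivity' min=0 max=1;
run;
```

With a single predictor the fitted model ranks observations by x, so each curve matches the ROC curve of x itself as long as the fitted slope is positive.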

Thanks.

Accepted Solutions

Solution

03-29-2016 02:20 PM


Posted in reply to Quentin

03-29-2016 02:05 PM

See Usage Note 45339: Comparing the areas under independent ROC curves

All Replies


Posted in reply to Quentin

03-29-2016 10:32 AM

If any observation has a missing value (printed as .) in a PRED= variable such as X1 or X2, PROC LOGISTIC halts and issues the message you got. In this case, adding a WHERE statement to filter out observations with missing values should allow the procedure to run. For example:

```
proc logistic data=have plots(only)=roc;
   model Y(event='1') = ;
   roc 'x1' pred=x1;
   roc 'x2' pred=x2;   *No error now: rows with missing X2 are filtered out;
   where not missing(x2);
run;
```


Posted in reply to cici0017

03-29-2016 10:47 AM

Thanks @cici0017, but my hope was to include all 5 records when generating the ROC curve for X1, and include 4 records when generating the ROC curve for x2.

So if it were a t-test, I want to do a two-sample t-test, not a paired t-test. I suppose I want a two-sample comparison of the two ROC curves.
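In symbols: with $\hat A_1, \hat A_2$ the estimated areas and $\widehat{\mathrm{SE}}_1, \widehat{\mathrm{SE}}_2$ their standard errors, the large-sample z statistic for the difference carries a covariance term when both curves come from the same subjects (paired); for independent samples (two-sample) that term is zero:

$$
z = \frac{\hat A_1 - \hat A_2}{\sqrt{\widehat{\mathrm{SE}}_1^{\,2} + \widehat{\mathrm{SE}}_2^{\,2} - 2\,\widehat{\mathrm{Cov}}(\hat A_1, \hat A_2)}},
\qquad
\widehat{\mathrm{Cov}}(\hat A_1, \hat A_2) = 0 \ \text{for independent samples.}
$$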


Posted in reply to Quentin

03-29-2016 11:47 AM

I don't know anything about this, but 2-sample implies to me that CLASS might be useful.


Posted in reply to Quentin

03-29-2016 12:35 PM

Do you want to fit two models to the same data set with different predictors and get a comparative ROC graph? You need to use the NOFIT option and list all the variables on the MODEL statement. For example:

```
proc logistic data=have plots(only)=roc rocoptions(id=prob);
   model Y(event='1') = x1 x2 / nofit outroc=roc;
   roc 'x1' x1;
   roc 'x2' x2;
run;

proc print data=roc;
run;
```

The ROC statements automatically produce overlaid ROC curves.
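If a complete-case (paired) comparison is acceptable, the same NOFIT setup can also test the difference in AUC with a ROCCONTRAST statement. A sketch, assuming the wide data (x1 x2 y) from the question; as with the example above, only the 4 complete observations are used:

```
/* Wide data from the question (x2 is missing in one row) */
data have;
   input x1 x2 y;
   cards;
1 1 0
2 2 1
3 . 0
4 2 0
5 1 1
;
run;

proc logistic data=have plots(only)=roc;
   model y(event='1') = x1 x2 / nofit;
   roc 'x1' x1;
   roc 'x2' x2;
   /* Paired test of the AUC difference, with the x1 curve as the reference */
   roccontrast reference('x1') / estimate e;
run;
```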


Posted in reply to cici0017

03-29-2016 01:14 PM

Yes @cici0017, that is the sort of chart I want. But note that for one record the value of X2 is missing.

The LOGISTIC output notes this:

- Number of Observations Read: 5
- Number of Observations Used: 4

As I understand it, that means only 4 obs were used for both the ROC curve of X1 and the ROC curve of X2.

My goal was to make the same plot you made (and ideally get a test of the difference in AUC), but have the ROC curve of X1 use all 5 obs and the ROC curve of X2 use the 4 obs that have data.


Posted in reply to cici0017

03-29-2016 02:24 PM

Thanks much @cici0017. That note is very helpful, and confirms that in order to compare two independent ROC curves I need to run PROC LOGISTIC twice, save the output data from each, and then overlay the charts myself (and compute the test statistic to compare them). Bummer, but not the end of the world. I guess it's the price I pay for missing data. : )
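A sketch of that last step, in the spirit of the usage note rather than its actual code: fit one model per predictor on the stacked data (all available rows for each), save each curve's area and standard error from the ROCAssociation ODS table, and form the independent-samples z statistic. The table and column names (`ROCAssociation`, `ROCModel`, `Area`, `StdErr`) are as I recall them from the PROC LOGISTIC documentation, so check them with PROC CONTENTS; `auc` and `auc_test` are hypothetical names:

```
/* Stacked data: group 1 = x1 (5 rows), group 2 = non-missing x2 (4 rows) */
data have;
   input group x y;
   cards;
1 1 0
1 2 1
1 3 0
1 4 0
1 5 1
2 1 0
2 2 1
2 2 0
2 1 1
;
run;

/* Save the area and standard error of each group's ROC curve */
ods output ROCAssociation=auc;
proc logistic data=have plots=none;
   by group;
   model y(event='1') = x;
   roc 'x' x;
run;

/* Independent samples: z = (A1 - A2) / sqrt(SE1**2 + SE2**2) */
data auc_test;
   set auc(where=(ROCModel = 'x')) end=eof;
   retain a1 se1 a2 se2;
   if group = 1 then do;
      a1 = Area;  se1 = StdErr;
   end;
   else if group = 2 then do;
      a2 = Area;  se2 = StdErr;
   end;
   if eof then do;
      z = (a1 - a2) / sqrt(se1**2 + se2**2);
      p = 2 * (1 - probnorm(abs(z)));   /* two-sided p-value */
      output;
   end;
   keep a1 se1 a2 se2 z p;
run;

proc print data=auc_test;
run;
```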