Compare ROC curves ignoring missing predictors

Posted 03-29-2016 10:12 AM
(3696 views)

I'm trying to compare AUC for two ROC curves. But I have missing data for one of the predictors, and I want to ignore the missing values (instead of throwing out those records).

I know that if I put the predictors in the model, PROC LOGISTIC will exclude the records with missing values. So I thought the PRED= option of the ROC statement might be my answer, but unfortunately it throws an error when it encounters a missing value:

```
data have;
input x1 x2 y;
cards;
1 1 0
2 2 1
3 . 0
4 2 0
5 1 1
;
run;
proc logistic data=have plots(only)=roc;
model Y(event='1') = ;
roc 'x1' pred=x1;
roc 'x2' pred=x2; *Throws error improper missing;
run;
```

Is there an easy way to get SAS to compare these two curves? (Other than running two PROCs and saving the output data etc).

I had thought transforming the data might help:

```
data have;
input group x y;
cards;
1 1 0
1 2 1
1 3 0
1 4 0
1 5 1
2 1 0
2 2 1
2 2 0
2 1 1
;
run;
```

That would make it easy to get two ROC curves with a BY-statement, but I still can't see a way to get one chart with both curves, and an AUC comparison.
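A minimal sketch of how the stacked layout above could be derived from the original wide data rather than typed by hand. It assumes the five-record data set with x1, x2, and y is still available under the name `have`; the data set name `long` is introduced here purely for illustration.

```sas
/* Stack x1 and x2 into one column, keeping only non-missing values. */
/* Assumes HAVE is the original wide data set (x1 x2 y).             */
data long;
  set have;
  group = 1; x = x1; if not missing(x) then output;
  group = 2; x = x2; if not missing(x) then output;
  keep group x y;
run;

/* BY-group processing requires the data to be sorted by GROUP. */
proc sort data=long;
  by group;
run;

/* One ROC analysis per group: 5 records for group 1, 4 for group 2. */
proc logistic data=long plots(only)=roc;
  by group;
  model y(event='1') = x / nofit;
  roc 'x' x;
run;
```

This still produces a separate plot for each BY group, so on its own it gives neither the single overlaid chart nor the AUC comparison.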

I realize simply ignoring missing values is not always the best approach, but curious if there is a way to do so here.

If not, I suppose I can run PROC LOGISTIC with a BY statement, output the statistics and other results, and then plot the curves myself.

Thanks.

BASUG is hosting ** free webinars ** Next up: ** Mark Keintz ** presenting History Carried Forward, Future Carried Back: Mixing Time Series of Differing Frequencies on May 8. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.

- Tags:
- missing_value
- roc

Accepted solution:

See Usage Note 45339: Comparing the areas under independent ROC curves

If any observation has a missing value (displayed as . when printed) in the X1 or X2 variable, PROC LOGISTIC halts immediately and issues the message that you got. In this case, adding a WHERE statement to filter out observations with missing values should allow the procedure to run. For example:

```
proc logistic data=have plots(only)=roc;
model Y(event='1') = ;
roc 'x1' pred=x1;
roc 'x2' pred=x2;
where x2 ~= .; * exclude observations with a missing x2;
run;
```

Thanks @cici0017, but my hope was to include all 5 records when generating the ROC curve for X1, and include 4 records when generating the ROC curve for x2.

So if it were a t-test, I want to do a two-sample t-test, not a paired t-test. I suppose I want a two-sample comparison of the two ROC curves.

I don't know anything about this, but 2-sample implies to me that CLASS might be useful.

Do you want to fit two models to the same data set with different predictors and get a comparative ROC graph? You need to use the NOFIT option and list all the variables in the MODEL statement. For example:

```
proc logistic data=have plots(only)=roc rocoptions(id=prob);
model Y(event='1') = x1 x2 / nofit outroc=roc;
roc 'x1' x1;
roc 'x2' x2;
run;

proc print data=roc;
run;
```

The ROC statements automatically generate overlaid ROC curves for you.

Yes @cici0017, that is the sort of chart I want. But note that for one record the value of X2 is missing.

The logistic output notes this:

Number of Observations Read: 5

Number of Observations Used: 4

As I understand it, that means only 4 obs were used for both the ROC curve of X1 and the ROC curve of X2.

My goal was to make the same plot you made (and ideally get a test of the difference in AUC), but have the ROC curve of X1 use all 5 obs and the ROC curve of X2 use the 4 obs that have data.

See Usage Note 45339: Comparing the areas under independent ROC curves

Thanks much @cici0017. That note is very helpful, and confirms that in order to compare two independent ROC curves I need to run PROC LOGISTIC twice, save the output data from each, and then overlay the charts myself (and compute the test statistic to compare them). Bummer, but not the end of the world. I guess it's the price I pay for missing data. : )
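A rough sketch of the two-run approach described above, not a copy of the usage note's code. It assumes the original wide data set `have` (x1, x2, y), and it assumes that the ODS table `ROCAssociation` carries the AUC and its standard error in columns named `Area` and `StdErr`; it is worth confirming those names with ODS TRACE on your release. The data set names `auc1`, `auc2`, and `compare` are hypothetical. The statistic is the usual large-sample comparison of two independent estimates, (A1 - A2)^2 / (s1^2 + s2^2), referred to a chi-square distribution with 1 degree of freedom.

```sas
/* Run 1: ROC for x1 uses all records with non-missing x1 (5 here). */
ods output ROCAssociation=auc1;
proc logistic data=have;
  model y(event='1') = x1 / nofit;
  roc 'x1' x1;
run;

/* Run 2: ROC for x2 drops only the record with missing x2 (4 used). */
ods output ROCAssociation=auc2;
proc logistic data=have;
  model y(event='1') = x2 / nofit;
  roc 'x2' x2;
run;

/* Combine the two one-row tables and compute the test statistic.   */
/* Assumes each table has exactly one row with Area and StdErr.     */
data compare;
  merge auc1(keep=Area StdErr rename=(Area=area1 StdErr=se1))
        auc2(keep=Area StdErr rename=(Area=area2 StdErr=se2));
  diff   = area1 - area2;
  chisq  = diff**2 / (se1**2 + se2**2);
  pvalue = 1 - probchi(chisq, 1);
run;

proc print data=compare;
run;
```

The test treats the two AUC estimates as independent. Here the two curves share most of their records, so the p-value should be read with that caveat in mind.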
