
I'm trying to compare AUC for two ROC curves.  But I have missing data for one of the predictors, and I want to ignore the missing values (instead of throwing out those records).

 

I know that if I put the predictors in the model, LOGISTIC will exclude the records with missing values.  So I thought the ROC statement's PRED= specification might be my answer, but unfortunately it throws an error when it encounters a missing value:

 

data have;
  input x1 x2 y;
  cards;
1 1 0
2 2 1
3 . 0
4 2 0
5 1 1
;
run;

proc logistic data=have plots(only)=roc;
  model Y(event='1') = ;
  roc 'x1' pred=x1; 
  roc 'x2' pred=x2; *Throws error improper missing;
run;

 

Is there an easy way to get SAS to compare these two curves?  (Other than running two PROCs and saving the output data etc).

 

I had thought transforming the data might help:

data have;
  input group x y;
  cards;
1 1 0
1 2 1
1 3 0
1 4 0
1 5 1
2 1 0
2 2 1
2 2 0
2 1 1
;
run;

 

That would make it easy to get two ROC curves with a BY-statement, but I still can't see a way to get one chart with both curves, and an AUC comparison.

 

I realize simply ignoring missing values is not always the best approach, but I'm curious whether there is a way to do so here.

 

If not, I suppose I can run PROC LOGISTIC with a BY-statement, output the statistics and other results, then plot the curves myself.
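
For what it's worth, here is a minimal sketch of that fallback using the long-format data above. It assumes one single-predictor model per BY group, so each saved curve is the ROC of the fitted probabilities (which should match the raw predictor's curve when the slope is positive, and be its mirror image otherwise). The data set names long and roc_pts are made up for illustration; _SENSIT_ and _1MSPEC_ are the variables PROC LOGISTIC writes to the OUTROC= data set.

```sas
/* Long-format data from the post: group 1 = x1 (5 records), group 2 = x2 (4 records) */
data long;
  input group x y;
  cards;
1 1 0
1 2 1
1 3 0
1 4 0
1 5 1
2 1 0
2 2 1
2 2 0
2 1 1
;
run;

/* One single-predictor model per BY group; OUTROC= saves the ROC coordinates.
   The input must be sorted by GROUP (it already is here). */
proc logistic data=long;
  by group;
  model y(event='1') = x / outroc=roc_pts;
run;

/* Overlay both curves on one chart, with a dashed chance line for reference */
proc sgplot data=roc_pts;
  series x=_1mspec_ y=_sensit_ / group=group;
  lineparm x=0 y=0 slope=1 / lineattrs=(color=gray pattern=dash);
  xaxis label='1 - Specificity' values=(0 to 1 by 0.2);
  yaxis label='Sensitivity' values=(0 to 1 by 0.2);
run;
```

This only gets the overlaid chart; it does not by itself test the difference in AUC.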

 

Thanks.

 

 

BASUG is hosting free webinars Next up: Mark Keintz presenting History Carried Forward, Future Carried Back: Mixing Time Series of Differing Frequencies on May 8. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
1 ACCEPTED SOLUTION

Accepted Solutions
cici0017
SAS Employee

See Usage Note 45339: Comparing the areas under independent ROC curves

http://support.sas.com/kb/45/339.html


7 REPLIES 7
cici0017
SAS Employee

If any observation has a missing value (appearing as . when printed) in the X1 or X2 variable, PROC LOGISTIC immediately halts and issues the message that you got.  In this case, adding a WHERE statement to filter out observations with missing values should allow the procedure to run. For example -

 

proc logistic data=have plots(only)=roc;
  model Y(event='1') = ;
  roc 'x1' pred=x1;
  roc 'x2' pred=x2;
  WHERE X2 ~=.; *Drops records with missing x2 before either ROC curve is computed;
run;

Quentin
Super User

Thanks @cici0017, but my hope was to include all 5 records when generating the ROC curve for X1, and include 4 records when generating the ROC curve for x2.   

 

So if it were a t-test, I want to do a two-sample t-test, not a paired t-test.  I suppose I want a two-sample comparison of the two ROC curves.

data_null__
Jade | Level 19

I don't know anything about this, but 2-sample implies to me that CLASS might be useful.

cici0017
SAS Employee

Do you want to fit two models to the same data set with different predictors and get a comparative ROC graph? You need to use the NOFIT option and list all the variables on the MODEL statement. For example -

 

proc logistic data=have plots(only)=roc rocoptions(id=prob);
  model Y(event='1') = x1 x2 / nofit outroc=roc;
  roc 'x1' x1;
  roc 'x2' x2;
run;

proc print data=roc;
run;

 

The ROC statements automatically generate overlaid ROC curves for you.

 

[Image: ROCOverlay22.png]

 

 
Quentin
Super User

Yes @cici0017 that is the sort of chart I want.  But note that for one record the value of X2 is missing.

 

The logistic output notes this:

 

Number of Observations Read 5
Number of Observations Used 4

 

As I understand it, that means only 4 obs were used for both the ROC curve of X1 and the ROC curve of X2.

 

My goal was to make the same plot you made (and ideally get a test on difference in AUC), but have the ROC curve of X2 use 4 obs and the ROC curve of X1 use all 5 obs that have data.

 

 

 

 

cici0017
SAS Employee

See Usage Note 45339: Comparing the areas under independent ROC curves

http://support.sas.com/kb/45/339.html

Quentin
Super User

Thanks much @cici0017.  That note is very helpful, and confirms that in order to compare two independent ROC curves I need to run PROC LOGISTIC twice, save the output data from each, and then overlay the charts myself (and compute the test statistic to compare them).  Bummer, but not the end of the world.  I guess it's the price I pay for missing data.  : )
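
In case it helps anyone who lands here later, this is a rough sketch of the two-run approach with the wide data from my original post. It is my own sketch, not the code from the usage note. It assumes the two AUC estimates can be treated as independent, that the ROC association table is available through ODS as ROCAssociation with columns Area and StdErr (worth confirming with PROC CONTENTS on your release), and it uses the usual large-sample chi-square with 1 df, (AUC1 - AUC2)**2 / (SE1**2 + SE2**2). The data set names auc_x1, auc_x2, and auc_test are just names I picked.

```sas
data have;
  input x1 x2 y;
  cards;
1 1 0
2 2 1
3 . 0
4 2 0
5 1 1
;
run;

/* Run 1: x1 only, so all 5 records are used */
ods output ROCAssociation=auc_x1;
proc logistic data=have;
  model y(event='1') = x1;
  roc 'x1' x1;
run;

/* Run 2: x2 only, so only the record with missing x2 is dropped */
ods output ROCAssociation=auc_x2;
proc logistic data=have;
  model y(event='1') = x2;
  roc 'x2' x2;
run;

/* One-to-one merge of the first row of each table.
   Each table holds a row for the fitted model and a row for the ROC
   statement; both describe the same single-predictor model here. */
data auc_test;
  merge auc_x1(obs=1 keep=Area StdErr rename=(Area=area_x1 StdErr=se_x1))
        auc_x2(obs=1 keep=Area StdErr rename=(Area=area_x2 StdErr=se_x2));
  diff  = area_x1 - area_x2;
  chisq = diff**2 / (se_x1**2 + se_x2**2);  /* assumes independent estimates */
  p     = 1 - probchi(chisq, 1);            /* upper-tail p-value, 1 df */
run;

proc print data=auc_test noobs;
run;
```

Adding OUTROC= to each MODEL statement and stacking the two output data sets would give the points needed to overlay the curves.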
