03-29-2016 10:12 AM
I'm trying to compare AUC for two ROC curves. But I have missing data for one of the predictors, and I want to ignore the missing values (instead of throwing out those records).
I know that if I put the predictors in the model, records with missing values will be excluded by LOGISTIC. So I thought the PRED= specification on the ROC statement might be my answer, but unfortunately it throws an error when it encounters a missing value:
data have;
  input x1 x2 y;
  cards;
1 1 0
2 2 1
3 . 0
4 2 0
5 1 1
;
run;

proc logistic data=have plots(only)=roc;
  model Y(event='1') = ;
  roc 'x1' pred=x1;
  roc 'x2' pred=x2; *Throws error improper missing;
run;
Is there an easy way to get SAS to compare these two curves? (Other than running two PROCs and saving the output data etc).
I had thought transforming the data might help:
data have;
  input group x y;
  cards;
1 1 0
1 2 1
1 3 0
1 4 0
1 5 1
2 1 0
2 2 1
2 2 0
2 1 1
;
run;
That would make it easy to get two ROC curves with a BY-statement, but I still can't see a way to get one chart with both curves, and an AUC comparison.
I realize simply ignoring missing values is not always the best approach, but I'm curious whether there is a way to do so here.
If not, I suppose I can run PROC LOGISTIC with a BY statement, output the statistics and other results, then plot the curves myself.
03-29-2016 10:32 AM
If any observation has a missing value (shown as . when printed) in the X1 or X2 variable, PROC LOGISTIC halts and issues the message that you got. In this case, adding a WHERE statement to filter out observations with missing values should allow the procedure to run. For example -
proc logistic data=have plots(only)=roc;
model Y(event='1') = ;
roc 'x1' pred=x1;
roc 'x2' pred=x2;
where x2 ~= .; *Excludes the record with missing x2;
run;
03-29-2016 10:47 AM
Thanks @cici0017, but my hope was to include all 5 records when generating the ROC curve for X1, and include 4 records when generating the ROC curve for x2.
So if it were a t-test, I want to do a two-sample t-test, not a paired t-test. I suppose I want a two-sample comparison of the two ROC curves.
03-29-2016 12:35 PM
Do you want to fit two models to the same data set with different predictors and get a comparative ROC graph? You need to use the NOFIT option and list all the variables on the MODEL statement. For example -
proc logistic data=have plots(only)=roc rocoptions(id=prob);
model Y(event='1') = x1 x2/nofit outroc=roc;
roc 'x1' x1 ;
roc 'x2' x2 ;
run;
proc print data=roc; run;
The ROC statements automatically generate overlaid ROC curves for you.
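If a formal comparison of the two AUCs on the complete cases is also wanted, a ROCCONTRAST statement can be added to the same step. This is a minimal sketch, assuming the wide-format have data set (x1, x2, y) from the first post; it is a paired comparison, so only records with both predictors present are used.

```sas
proc logistic data=have plots(only)=roc;
  /* NOFIT: skip fitting the full model, only evaluate the ROC models below */
  model y(event='1') = x1 x2 / nofit;
  roc 'x1' x1;
  roc 'x2' x2;
  /* Test the AUC of 'x2' against the reference curve 'x1' (paired test) */
  roccontrast reference('x1') / estimate e;
run;
```

With only a handful of records, as in the toy data, the test has very little power; the sketch only shows where the statement goes.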
03-29-2016 01:14 PM
Yes @cici0017, that is the sort of chart I want. But note that for one record the value of X2 is missing.
The logistic output notes this:
Number of Observations Read 5 Number of Observations Used 4
As I understand it, that means only 4 obs were used for both the ROC curve of X1 and the ROC curve of X2.
My goal was to make the same plot you made (and ideally get a test on the difference in AUC), but have the ROC curve of X2 use the 4 obs that have data and the ROC curve of X1 use all 5 obs.
03-29-2016 02:24 PM
Thanks much @cici0017. That note is very helpful, and confirms that in order to compare two independent ROC curves I need to run PROC LOGISTIC twice, save the output data from each, and then overlay the charts myself (and compute the test statistic to compare them). Bummer, but not the end of the world. I guess it's the price I pay for missing data. : )
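For anyone landing here later, a rough sketch of that workaround follows. Assumptions, none of them confirmed in this thread: the data are in the long format from my first post (group, x, y, one row per non-missing value); the two curves are treated as independent; the ODS table ROCAssociation exposes the columns ROCModel, Area and StdErr (worth checking with ODS TRACE in your release); and the data set names auc, roc and compare are just illustrative. A BY statement stands in for running the PROC twice. Each curve is that of the fitted one-predictor model, which matches the raw predictor's curve only when the fitted slope is positive.

```sas
proc sort data=have;
  by group;
run;

/* Capture AUC and its standard error for each BY group */
ods output ROCAssociation=auc;
proc logistic data=have;
  by group;
  model y(event='1') = x / outroc=roc;  /* ROC coordinates per group */
  roc 'x' x;                            /* needed to get the ROCAssociation table */
run;

/* Overlay both curves on one chart */
proc sgplot data=roc;
  series x=_1mspec_ y=_sensit_ / group=group;
  lineparm x=0 y=0 slope=1 / lineattrs=(pattern=dash);
  xaxis label='1 - Specificity';
  yaxis label='Sensitivity';
run;

/* Unpaired z-test: (AUC1 - AUC2) / sqrt(SE1**2 + SE2**2) */
data compare;
  set auc;
  where ROCModel = 'x';
  retain a1 s1;
  if group = 1 then do;
    a1 = Area;
    s1 = StdErr;
  end;
  if group = 2 then do;
    z = (a1 - Area) / sqrt(s1**2 + StdErr**2);
    p = 2 * (1 - probnorm(abs(z)));
    output;
  end;
  keep a1 s1 Area StdErr z p;
run;

proc print data=compare;
run;
```

The z-test ignores any correlation between the two curves, which is exactly the "two-sample" simplification I described above, so treat the p-value as approximate when the same subjects contribute to both groups.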