Greetings,
I am comparing the ROC-AUC of two algorithms in one dataset. I obtained the first ROC-AUC value (0.7003) after running the proc logistic statement using the base model of the algorithm. I output the estimated probabilities in a new dataset and then use this dataset to run the second proc logistic statement and obtain the ROC-AUC value (0.7031) on the enhanced model of the algorithm.
The goal is to overlay both ROC curves. When I run the next proc logistic statement the ROC-AUC value from the base model changes (from 0.7003 to 0.698), while the enhanced model ROC-AUC value remains the same (0.7031).
I have included my code below for the three proc logistic statements. I was wondering if someone could point out the error in my code. I am not sure why the ROC-AUC value changes in the base model when the last set of code overlays the two ROC curves.
Thank you so much for your help. Please let me know if there is any other information I can provide.
*ROC curve for base model ;
proc logistic data=ROCmerge plots(only)=roc ;
where stroke_time <= 10 ;
model instroke(event='1') = BASESTROKEMODELRISKPCT / outroc=Baseroc ;
output out=BASEpredict pred=BASEPROB;
run;
*ROC curve for enhanced model ;
proc logistic data=BASEpredict plots(only)=roc ;
where stroke_time <= 10 ;
model instroke(event="1") = FULLSTROKEMODELRISKPCT / outroc=Fullroc ;
output out=FULLpredict pred=FULLPROB ;
run;
*Overlay ROC curves ;
proc logistic data=FULLpredict ;
where stroke_time <= 10 ;
model instroke(event= "1" ) = BASEPROB FULLPROB / nofit ;
roc 'Base' pred=BASEPROB ;
roc 'Full' pred=FULLPROB ;
roccontrast ;
run;
Are you asking "Why when I change the values / variables used in the model does one of the output statistics change?"
If the goal is to overlay the graph curves and not generate a third model with different data then use ODS OUTPUT to capture the data used to show the graphs.
proc logistic data=ROCmerge plots(only)=roc ;
where stroke_time <= 10 ;
model instroke(event='1') = BASESTROKEMODELRISKPCT / outroc=Baseroc ;
output out=BASEpredict pred=BASEPROB;
ods output fitplot=basefitplot;
run;
This will create a data set with the information used for most of the graph. The variables _xcont1 and _predicted have the x,y coordinates of the basic prediction (ROC) curve.
Create a similar data set for the other model.
Combine them, adding a variable that identifies the source model to use as a GROUP variable, and use SGPLOT to plot the _xcont1 and _predicted series.
data toplot;
set basefitplot (in=inbase) fullfitplot (in=infull) ;
if inbase then Plot='Base';
else if infull then Plot='Full';
run;
proc sgplot data=toplot;
series x=_xcont1 y=_predicted / group=plot;
run;
The variables _lclm and _uclm could be used with a BAND plot for the Lower and Upper variables as well.
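As an alternative that does not rely on ODS OUTPUT, the OUTROC= data sets already requested in the original code (Baseroc and Fullroc) could be stacked and plotted directly. This is only a sketch: it assumes both data sets exist from the two earlier PROC LOGISTIC steps, and the names rocboth and Model are made up for illustration. _1MSPEC_ and _SENSIT_ are the 1 - specificity and sensitivity variables that OUTROC= writes. Each curve here comes from its own model fit, so this only overlays the plots; it does not run ROCCONTRAST.

```sas
/* Stack the two OUTROC= data sets and tag each row with its source model. */
/* rocboth and Model are hypothetical names chosen for this sketch.        */
data rocboth;
  length Model $4;
  set Baseroc(in=inbase) Fullroc(in=infull);
  if inbase then Model = 'Base';
  else if infull then Model = 'Full';
run;

/* Overlay the two ROC curves, one series per model. */
proc sgplot data=rocboth;
  series x=_1mspec_ y=_sensit_ / group=Model;
  /* Diagonal reference line for a classifier with no discrimination. */
  lineparm x=0 y=0 slope=1 / lineattrs=(color=gray pattern=dash);
  xaxis label='1 - Specificity' values=(0 to 1 by 0.2);
  yaxis label='Sensitivity' values=(0 to 1 by 0.2);
run;
```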
I tried the ods output fitplot=basefitplot and received this error message "Output 'fitplot' was not created. Make sure that the output object name, label, or path
is spelled correctly. Also, verify that the appropriate procedure options are used to produce the requested output object. For example, verify that the NOPRINT option is not used."
Thank you for looking this over.
It is hard to give advice if you don't post a real dataset.
I am also unable to replicate your problem. Here is the code I used:
proc logistic data=sashelp.heart(obs=2000);
model status= height /outroc=roc1;
output out=want1 p=pred1;
run;
proc logistic data=want1;
model status= weight /outroc=roc2;
output out=want2 p=pred2;
run;
proc logistic data=want2 ;
model status=pred1 pred2/nofit;
roc 'roc1' pred=pred1;
roc 'roc2' pred=pred2;
run;
To overlay ROC curves, see:
https://support.sas.com/kb/52/973.html
https://support.sas.com/kb/45/339.html
I am attaching an excel sheet with the data. I use: 1) model instroke(event='1') = BASESTROKEMODELRISKPCT to output the first ROC-AUC curve; and 2) model instroke(event="1") = FULLSTROKEMODELRISKPCT to output the second ROC-AUC curve.
I can't get the first ROC-AUC curve to overlay with the second curve, without the first ROC-AUC curve value changing.
I tried the ods output fitplot=basefitplot and received this error message "Output 'fitplot' was not created. Make sure that the output object name, label, or path
is spelled correctly. Also, verify that the appropriate procedure options are used to produce the requested output object. For example, verify that the NOPRINT option is not used."
Thank you for looking this over.
"(from 0.7003 to 0.698)"
I think a difference of 0.001 or 0.002 is very small, so you can ignore it.
Interestingly, if you plot these two ROC curves separately, one at a time rather than overlaid, you get the same result as before.
Maybe the X values used to calculate the ROC are different when the two curves are overlaid.
proc logistic data=sashelp.heart(obs=2000);
model status= height /outroc=roc1;
output out=want1 p=pred1;
run;
proc logistic data=want1;
model status= weight /outroc=roc2;
output out=want2 p=pred2;
run;
proc logistic data=want2 ; /***<--- For the first ROC*******/
model status=pred1/nofit;
roc 'roc1' pred=pred1;
run;
proc logistic data=want2 ; /***<--- For the second ROC*******/
model status=pred2/nofit;
roc 'roc2' pred=pred2;
run;
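One thing that might be worth checking (this is a guess, not something confirmed in this thread): PROC LOGISTIC drops any observation that has a missing value in the response or in any variable on the MODEL statement. When both predictions are listed in one MODEL statement, the overlay step would use only the rows where both are nonmissing, which could shift one AUC and leave the other unchanged. A minimal sketch using the want2, pred1, and pred2 names from the example above; the ROC label is made up:

```sas
/* Count nonmissing and missing values of each prediction.            */
/* Different NMISS counts would mean the overlay uses fewer rows than */
/* one of the single-model runs.                                      */
proc means data=want2 n nmiss;
  var pred1 pred2;
run;

/* Recompute the first ROC on complete cases only. If the guess holds, */
/* this AUC should match the value reported in the overlay step.       */
proc logistic data=want2;
  where nmiss(pred1, pred2) = 0;
  model status = pred1 / nofit;
  roc 'roc1 (complete cases)' pred=pred1;
run;
```

With the original poster's data, the same check would use FULLpredict, BASEPROB, and FULLPROB in place of want2, pred1, and pred2, keeping the stroke_time <= 10 filter.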
Thank you for the suggestion. I will take another look at the data and your code.
I appreciate your help.