Melk
Lapis Lazuli | Level 10

I am running a 5-fold cross-validation on my logistic model. The predicted probability of the outcome for each test group was generated from a model fit to the training groups. My question is: how can I draw a "mean" ROC curve to summarize my results?

1 ACCEPTED SOLUTION

StatDave
SAS Super FREQ

If you have a data set of actual binary responses and predicted probabilities (from whatever source), you can produce an ROC plot and analysis using the PRED= option in the ROC statement of PROC LOGISTIC as shown in this note.  Notice that the predicted probabilities in each of the examples come from other modeling methods than PROC LOGISTIC.
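
A minimal sketch of the PRED= approach described above, not taken from the linked note. The data set name cv_preds and the variables y (the 0/1 outcome) and p_hat (the predicted probability from the cross-validation) are hypothetical names chosen for illustration. With ODS Graphics enabled, PROC LOGISTIC should draw the ROC curve and report the area under it in the "ROC Association Statistics" table.

```sas
/* Assumed names: cv_preds = stacked test-fold predictions,
   y = observed binary outcome, p_hat = predicted probability */
ods graphics on;

proc logistic data=cv_preds;
   /* NOFIT: no model is refit; the ROC statement only evaluates p_hat */
   model y(event='1') = / nofit;
   roc 'Pooled 5-fold CV predictions' pred=p_hat;
run;
```

Because every observation appears in exactly one test fold, this gives a single pooled ROC curve over all held-out predictions rather than a pointwise average of five separate curves.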

Reeza
Super User

Are you asking how to do the calculation or how to create the graph?

Melk
Lapis Lazuli | Level 10

How to create the graph, but it would also be useful to calculate the AUC. I essentially have predicted probabilities for each test set, obtained from the model fit to the corresponding training data. I just don't know how to use them to create the ROC curve and, ultimately, get the AUC.

Ksharp
Super User

What do you mean by an 'average' ROC curve?

You can use the OUTROC= option to save the ROC data set for each of the K folds,

but you might not be able to average them directly, because the x values (1 - specificity) can differ from fold to fold. You could try.

If you want to plot all 5 ROC curves in the same graph, here is an example that overlays two curves.

/* Fit the model and save the ROC coordinates in _roc */
proc logistic data=test9;
  model good_bad(event='good') = &varlist
        / outroc=_roc lackfit scale=none aggregate rsquare firth;
  output out=_output h=h c=c cbar=cbar;
run;

/* Stack the two ROC data sets. roc is assumed to be an OUTROC= data set
   saved earlier from the training data (that step is not shown here). */
data plot_roc;
  length dsn $8;
  set roc(in=ina) _roc;
  if ina then dsn='Training';
  else dsn='Test';
run;

title ' ';
proc sgplot data=plot_roc aspect=1;
  series x=_1MSPEC_ y=_SENSIT_ / group=dsn smoothconnect name='x';
  lineparm x=0 y=0 slope=1 / lineattrs=(color=verylightgray); /* chance line */
  xaxis grid;
  yaxis grid;
  keylegend 'x' / title=' ' location=inside position=nw across=1;
run;

Melk
Lapis Lazuli | Level 10

Well, I thought that since the model fit to the training data was used to score the test data, which is a different portion of the main dataset each time, those predicted probabilities could be used to get an "average" ROC (basically by combining the test datasets, each with predicted probabilities of the outcome). So I have a dataset with my outcome, predictors, and predicted probabilities from my k-fold CV, and I don't know how to construct the ROC curve.
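
The combining step described above could be sketched as follows. All names are assumptions: test1 through test5 stand for the five scored hold-out sets, y for the outcome, and p_hat for the predicted probability. The second step is one possible reading of an "average": it computes the AUC within each fold and then takes the mean, which is not the same thing as the AUC of the pooled predictions.

```sas
/* Assumed names: test1-test5 = scored hold-out sets, y = outcome,
   p_hat = predicted probability from the model fit without that fold */
data cv_preds;
   set test1(in=f1) test2(in=f2) test3(in=f3) test4(in=f4) test5(in=f5);
   fold = 1*f1 + 2*f2 + 3*f3 + 4*f4 + 5*f5;   /* label the source fold */
run;

/* AUC within each fold; rows arrive already ordered by fold */
ods output ROCAssociation=auc_by_fold;
proc logistic data=cv_preds;
   by fold;
   model y(event='1') = / nofit;
   roc 'CV fold' pred=p_hat;
run;

/* Mean of the five fold-level AUCs (column Area in the ODS table) */
proc means data=auc_by_fold mean std;
   var Area;
run;
```

Keeping the fold variable in cv_preds also makes it possible to overlay the five fold-level curves later, if OUTROC-style coordinates are saved per fold.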

Melk
Lapis Lazuli | Level 10

Thanks so much!
