Hi everybody
Can anyone help me bootstrap a cutoff value with the Youden index method?
I can bootstrap the AUC with a 95% CI, but I cannot do the same for the cutoff value (or for sensitivity and specificity).
Thanks in advance
a
I don't know whether @Rick_SAS is interested in this.
Show what you've done so far.
Hi Rick_SAS
I've resampled my dataset with PROC SURVEYSELECT,
then
%ODSOff
proc logistic data=BootOut;
by Replicate;
model Var1(event='1')=predictor;
roc 'predictor' predictor;
roccontrast;
ods output Rocassociation=BootrocLVEF;
run;
%ODSOn
;
/* keep only the bootstrapped 'predictor' rows and exclude the 'Model' rows from the ODS output */
data BootrocLVEF;
set BootrocLVEF;
where ROCModel='predictor';
run;
proc univariate data=BootrocLVEF noprint;
var Area;
output out=WidePctls pctlpre=P_ pctlpts=2.5 97.5 mean=Mean Std=Std;
run;
proc print data=WidePctls noobs label;
format Mean Std P_2_5 P_97_5 6.4;
label Mean="BootMean" Std="BootStdErr" P_2_5="95% Lower CL" P_97_5="95% Upper CL";
run;
So far, I have bootstrapped the AUC with a 95% CI.
I would also like to find a cutoff value for this predictor.
In the original analysis, I ran
proc logistic data=lucia.lucia plots=none;
model Var1(event='1')=predictor /OUTROC=rocdataLVEF;
roc 'LVEF' VS_LVEF;
roccontrast;
run;
/* I read the intercept and the predictor coefficient from the output and plug them into the following code */
data rocdata2LVEF(keep=cutoff prob Sensitivity Specificity Youden);
set rocdataLVEF;
logit=log(_prob_/(1-_prob_));
cutoff=(logit+14.0995)/0.2727;
prob=_prob_;
Sensitivity=_SENSIT_;
Specificity=1-_1MSPEC_;
Youden=_SENSIT_+ (1-_1MSPEC_)-1;
run;
proc sort data=rocdata2LVEF;
by descending Youden;
run;
proc print data=rocdata2LVEF;
title 'cut off - LVEF';
run;
However, this is done on the original dataset, not on the bootstrap samples.
Tks in advance
I see. Based on your program, it looks like you have already seen the bootstrap analysis presented in the article "Discrimination, accuracy, and stability in binary classifiers." Would it be possible to ask your question using the data in that article so that we all have access to the data?
Hi @Rick_SAS
Sorry for the delay in replying.
Briefly, following the article you suggested, I've used only the first dataset (roc) for simplicity.
So this is what I've done so far:
/*estimating a cut-off value for alb*/
proc logistic data=roc plots=roc;
model popind(event='0')=alb / outroc=roc_alb;
roc 'alb' alb;
output out=out_alb p=pred;
run;
/*Youden index*/
data roc_alb2(keep=cutoff prob Sensitivity Specificity Youden);
set roc_alb;
logit=log(_prob_/(1-_prob_));
cutoff=(logit-2.4646)/-1.0520;
prob=_prob_;
Sensitivity=_SENSIT_;
Specificity=1-_1MSPEC_;
Youden=_SENSIT_+ (1-_1MSPEC_)-1;
run;
proc sort data=roc_alb2;
by descending Youden;
run;
proc print data=roc_alb2;
title 'cut off - alb';
run;
/*95%CI of cut-off*/
proc probit data=roc inversecl(prob=0.33370);
model popind(event='0')=alb / d=logistic;
title 'cut-off with 95% CI - alb';
run;
/*95%CI of Sensitivity and Specificity*/
data out_alb;
set out_alb;
if pred>0.33 then pop_alb=1;
else pop_alb=0;
run;
title 'Sensitivity - pop_alb';
proc freq data=out_alb;
where popind=0;
tables pop_alb / binomial(level="1");
exact binomial;
run;
title 'Specificity - pop_alb';
proc freq data=out_alb;
where popind=1;
tables pop_alb / binomial(level="0");
exact binomial;
run;
/*likelihood ratio albumin*/
proc genmod data=out_alb descending;
class popind pop_alb;
model pop_alb=popind / dist=binomial link=identity noint;
store genfit;
run;
data fd;
length label f $32767;
infile datalines delimiter=',';
input label f;
datalines;
LR+, b_p1/b_p2
LR-, (1-b_p1)/(1-b_p2)
;
title 'Likelihood Ratio - pop_alb';
%NLEST(instore=genfit, fdata=fd, df=10);
I've also attached the results I've found
So I finally come to my question: I'm wondering how to repeat these analyses after bootstrapping.
Tks again
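One possible way to carry the Youden step over to the bootstrap samples is sketched below. It is only a sketch under stated assumptions, not a tested solution: the names roc, popind and alb come from the code above, while BootOut, BootROC, BootEst, BootYouden, BootBest, BootCI, b0, b1, the seed and reps=1000 are made up for illustration. The idea is to save the coefficients of each replicate with OUTEST= instead of typing them in by hand, convert each ROC point back to the alb scale with cutoff = (logit - intercept) / slope, keep the point with the largest Youden index in each replicate, and then take percentiles across replicates.

```sas
/* Sketch only. roc, popind, alb come from the post above; every other
   dataset/variable name, the seed and reps=1000 are assumptions. */

/* 1. Resample the data with replacement */
proc surveyselect data=roc out=BootOut noprint seed=12345
                  method=urs samprate=1 outhits reps=1000;
run;

/* 2. Refit the model in each replicate; save ROC points and coefficients */
ods exclude all;
proc logistic data=BootOut outest=BootEst plots=none;
   by Replicate;
   model popind(event='0') = alb / outroc=BootROC;
run;
ods exclude none;

/* 3. Youden index and cutoff on the alb scale, using each replicate's
      own intercept (b0) and slope (b1) */
data BootYouden;
   merge BootROC
         BootEst(keep=Replicate Intercept alb
                 rename=(Intercept=b0 alb=b1));
   by Replicate;
   /* skip points where the logit or the division is undefined */
   if 0 < _PROB_ < 1 and not missing(b1) and b1 ne 0;
   Sensitivity = _SENSIT_;
   Specificity = 1 - _1MSPEC_;
   Youden      = Sensitivity + Specificity - 1;
   cutoff      = (log(_PROB_ / (1 - _PROB_)) - b0) / b1;
   keep Replicate cutoff Sensitivity Specificity Youden;
run;

/* 4. Keep the ROC point with the largest Youden index per replicate
      (if several points tie, the first one after sorting is kept) */
proc sort data=BootYouden;
   by Replicate descending Youden;
run;

data BootBest;
   set BootYouden;
   by Replicate;
   if first.Replicate;
run;

/* 5. Percentile-based 95% intervals across replicates */
proc univariate data=BootBest noprint;
   var cutoff Sensitivity Specificity;
   output out=BootCI pctlpre=Cut_ Sens_ Spec_
          pctlpts=2.5 97.5 pctlname=Lower Upper;
run;

proc print data=BootCI noobs;
run;
```

Replicates in which the model does not converge (for example because of separation) are not filtered out here; the _STATUS_ variable in the OUTEST= data set could be checked before step 3 if that turns out to matter.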