Hello,
I am looking at the accuracy of a new biomarker, say X (continuous), in diagnosing diabetes (yes or no) using logistic regression. I am using age (continuous), sex (binary), treatment (binary), and BMI (continuous) as covariates in the model, and I am using the highest Youden index to select a cut-off for this biomarker.
For example:
Obs | cutoff | prob | Sensitivity | Specificity | Youden |
1 | 14.2085 | 0.097430 | 0.87550 | 0.88062 | 0.75612 |
Is there any way to estimate confidence intervals for the cutoff and for the highest Youden index? Any help is appreciated.
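A common, simpler alternative to an exact method is a nonparametric percentile bootstrap: resample subjects, re-find the Youden-optimal cutoff in each resample, and take percentiles. The sketch below is in Python only as a language-neutral illustration; the function names and toy data are hypothetical. It assumes one score per subject (the biomarker itself, or predicted probabilities from a model fitted once) and does not refit the logistic model inside each resample, so it understates the uncertainty of a covariate-adjusted cutoff. Bootstrap intervals for a maximized statistic such as the Youden index can also be biased, so treat the result as approximate.

```python
import random


def youden_optimum(scores, labels):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A subject is called positive when score >= cutoff; candidate cutoffs
    are the observed score values. labels are 0 (no disease) or 1 (disease).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_cut, best_j = None, -1.0
    for c in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= c)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < c)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:  # strict '>' keeps the lowest cutoff among ties
            best_cut, best_j = c, j
    return best_cut, best_j


def percentile_interval(values, alpha=0.05):
    """Simple percentile interval from a list of bootstrap estimates."""
    v = sorted(values)
    lo = v[int((alpha / 2) * (len(v) - 1))]
    hi = v[int((1 - alpha / 2) * (len(v) - 1))]
    return lo, hi


def bootstrap_youden_ci(scores, labels, n_boot=1000, alpha=0.05, seed=2024):
    """Percentile bootstrap intervals for the optimal cutoff and for max J."""
    rng = random.Random(seed)
    n = len(scores)
    cuts, js = [], []
    while len(cuts) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lack one of the classes
            c, j = youden_optimum([scores[i] for i in idx], ys)
            cuts.append(c)
            js.append(j)
    return percentile_interval(cuts, alpha), percentile_interval(js, alpha)


if __name__ == "__main__":
    # Hypothetical toy data: 0 = no diabetes, 1 = diabetes.
    scores = [2.1, 3.4, 4.0, 5.2, 5.9, 6.3, 7.1, 7.8, 8.4, 9.0]
    labels = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
    print("point estimate (cutoff, J):", youden_optimum(scores, labels))
    print("95% CIs (cutoff, J):", bootstrap_youden_ci(scores, labels, n_boot=500))
```

For an adjusted model, the same loop would refit the logistic regression on each resample before computing the ROC points, which is where most of the extra work goes.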
At a minimum, it would help to share the code you are using to get the index values.
Thank you for your reply. Here is the SAS code I have used:
************ROC curve, fully adjusted model for age sex and bmi;
*obtain intercept and slope using the below command;
ods graphics on;
proc logistic data = CAT ;
TITLE 'ROC curve of X';
model Diabetes_120_(event='1') = X age sex BMI / lackfit rsquare outroc=rocdata2;
output out=pred predicted=pred;
roc "X";
run;
ods graphics off;
*Calculate a rational cut-off point in ROC curve analyses;
*using logit=intercept+slope*X, where X is the cutoff, so cutoff=(logit-intercept)/slope;
*Here intercept is -13.5972 and slope is 0.8003;
data CAT3(keep=cutoff prob Sensitivity Specificity
Youden);
set rocdata2;
logit=log(_prob_/(1-_prob_));*calculate logit;
cutoff=(logit+13.5972)/0.8003; *calculate cutoff;
prob= _prob_; *predicted probability at this ROC point;
Sensitivity = _SENSIT_; *calculate sensitivity;
Specificity = 1-_1MSPEC_; *calculate specificity;
Youden= _SENSIT_+ (1-_1MSPEC_)-1; *calculate Youden index;
run;
*sort data CAT3 by descending Youden index;
Proc sort data=CAT3 ;
by descending Youden ;
run;
Proc print data=CAT3 (firstobs= 1 obs= 10);
TITLE 'First ten values of Youden index';
Run;
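For readers checking the DATA CAT3 and PROC SORT logic, here is the same computation sketched in Python (hypothetical helper name; rows use the OUTROC= column names _PROB_, _SENSIT_ and _1MSPEC_). The second row reproduces the values in the question; the other two rows are made up.

```python
def youden_table(roc_rows):
    """Add specificity and Youden J to OUTROC-style rows, sorted by J descending."""
    table = []
    for r in roc_rows:
        sens = r["_SENSIT_"]
        spec = 1 - r["_1MSPEC_"]  # _1MSPEC_ holds 1 - specificity
        table.append({"prob": r["_PROB_"], "sensitivity": sens,
                      "specificity": spec, "youden": sens + spec - 1})
    return sorted(table, key=lambda row: row["youden"], reverse=True)


if __name__ == "__main__":
    rows = [
        {"_PROB_": 0.20, "_SENSIT_": 0.70, "_1MSPEC_": 0.05},            # hypothetical
        {"_PROB_": 0.097430, "_SENSIT_": 0.87550, "_1MSPEC_": 0.11938},  # from the question
        {"_PROB_": 0.05, "_SENSIT_": 0.95, "_1MSPEC_": 0.40},            # hypothetical
    ]
    for row in youden_table(rows)[:10]:  # like PROC PRINT with obs=10
        print(row)
```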
A review of the literature and a computational method are available in the section "Computing algorithm" of the following paper:
Lai CY, Tian L, Schisterman EF. Exact confidence interval estimation for the Youden index and its corresponding optimal cut-point. Comput Stat Data Anal. 2010;56(5):1103-1114.
Thank you. Is there any way this can be done in SAS? I was not able to go through the example because the link to it does not open.
I don't think this corresponds to the paper that Rick referred to, but one way to get a confidence interval on the optimal cutpoint on the predictor is to fit the model in PROC PROBIT and use the INVERSECL option. In your example, the following gives a confidence interval around the optimal X cutoff when you replace "youden-prob-level" with the predicted probability associated with your Youden-optimal cutpoint.
proc probit data = CAT inversecl(prob= youden-prob-level);
model Diabetes_120_(event='1') = X age sex BMI / d=logistic;
run;
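The estimate that INVERSECL reports is the value of X at which the fitted model reaches the requested probability. Writing the model from this thread with coefficients $\beta$ and solving for X gives the expression below. The result depends on the values used for age, sex and BMI; how PROC PROBIT fixes those values is not stated in this thread, so check the PROC PROBIT documentation before relying on it. Dropping the covariate terms gives the formula used in the DATA CAT3 step.

```latex
\operatorname{logit}(p) = \log\frac{p}{1-p}
  = \beta_0 + \beta_X X + \beta_{\text{age}}\,\text{age} + \beta_{\text{sex}}\,\text{sex} + \beta_{\text{BMI}}\,\text{BMI}

X^{*}(p) = \frac{\operatorname{logit}(p) - \beta_0 - \beta_{\text{age}}\,\text{age} - \beta_{\text{sex}}\,\text{sex} - \beta_{\text{BMI}}\,\text{BMI}}{\beta_X}

\text{age} = \text{sex} = \text{BMI} = 0:\quad
X^{*}(p) = \frac{\operatorname{logit}(p) - \beta_0}{\beta_X} = \frac{\operatorname{logit}(p) + 13.5972}{0.8003}
```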
Hello,
Thank you. I do get a confidence interval for the cut-off using the SAS code above. However, there is a problem.
************ROC curve, fully adjusted model for age sex and bmi;
*obtain intercept and slope using the below command;
ods graphics on;
proc logistic data = CAT ;
model Diabetes_120_(event='1') = X age sex BMI / lackfit rsquare outroc=rocdata2;
output out=pred2 predicted=pred2;
roc "X";
run;
ods graphics off;
*Calculate a rational cut-off point in ROC curve analyses;
*using logit=intercept+slope*X, where X is the cutoff, so cutoff=(logit-intercept)/slope;
*Here intercept is -13.5972 and slope is 0.8003;
data CAT3(keep=cutoff prob Sensitivity Specificity
Youden);
set rocdata2;
logit=log(_prob_/(1-_prob_));*calculate logit;
cutoff=(logit+13.5972)/0.8003; *calculate cutoff;
prob= _prob_; *predicted probability at this ROC point;
Sensitivity = _SENSIT_; *calculate sensitivity;
Specificity = 1-_1MSPEC_; *calculate specificity;
Youden= _SENSIT_+ (1-_1MSPEC_)-1; *calculate Youden index;
run;
*sort data CAT3 by descending Youden index;
Proc sort data=CAT3 ;
by descending Youden ;
run;
Proc print data=CAT3 (firstobs= 1 obs= 10);
TITLE 'First ten values of Youden index';
Run;
I get this output from the code above:
Obs | cutoff | prob | Sensitivity | Specificity | Youden |
1 | 14.2085 | 0.097430 | 0.87550 | 0.88062 | 0.75612 |
When I use the command below:
*confidence intervals for youden index and cutoff;
proc probit data = CAT inversecl(prob= 0.097430);
model Diabetes_120_(event='1') = VAR8 age sex VAR5 / d=logistic;
run;
X | Lower 95% Fiducial Limit | Upper 95% Fiducial Limit |
11.5330 | 11.2543 | 11.7997 |
The cut-off I am getting here (11.5330) is different from what I got above (14.2085). Any idea why?
That is probably because you have not fit the exact same model in PROBIT as you did in LOGISTIC. The variable names are not the same. Be sure to verify that the fitted parameters from PROBIT match those from LOGISTIC. Note also that the INVERSECL option gives confidence intervals for the first predictor listed in the MODEL statement. That should be your X variable.
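One way to do that check, sketched in Python with hypothetical parameter dictionaries copied by hand from the two Parameter Estimates tables: list every parameter whose name appears in only one model or whose estimate differs beyond a tolerance. Only the intercept (-13.5972) and the X slope (0.8003) come from the thread; the other estimates are invented.

```python
def compare_fits(first, second, tol=1e-4):
    """List parameters that differ in name or estimate between two fits.

    Each argument maps parameter names to estimates."""
    problems = []
    for name in sorted(set(first) | set(second)):
        if name not in first or name not in second:
            problems.append(f"{name}: present in only one model")
        elif abs(first[name] - second[name]) > tol:
            problems.append(f"{name}: {first[name]} vs {second[name]}")
    return problems


if __name__ == "__main__":
    # Intercept and X slope from the thread; all other values hypothetical.
    logistic = {"Intercept": -13.5972, "X": 0.8003, "age": 0.05, "sex": 0.30, "BMI": 0.10}
    probit = {"Intercept": -13.5972, "VAR8": 0.8003, "age": 0.05, "sex": 0.30, "VAR5": 0.10}
    for line in compare_fits(logistic, probit):
        print(line)
```

An empty result means the two fits agree on names and estimates, which is the precondition for comparing their cutoffs.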
Hello,
Thank you for your reply. Actually, both are the same variables. Is the cut-off different because the logistic regression model includes age, sex, and BMI in addition to X?
*confidence intervals for youden index and cutoff;
proc probit data = CAT inversecl(prob= 0.097430);
model Diabetes_120_(event='1') = X age sex VAR5 / d=logistic;
run;
X | Lower 95% Fiducial Limit | Upper 95% Fiducial Limit |
11.5330 | 11.2543 | 11.7997 |
However, when calculating the cut-off from the OUTROC= data, I am using only the intercept and slope and have not accounted for age, sex, and BMI. If that is the cause, how can I account for these variables when calculating the cut-off?
*Calculate a rational cut-off point in ROC curve analyses;
*using logit=intercept+slope*X, where X is the cutoff, so cutoff=(logit-intercept)/slope;
*Here intercept is -13.5972 and slope is 0.8003;
data CAT3(keep=cutoff prob Sensitivity Specificity
Youden);
set rocdata2;
logit=log(_prob_/(1-_prob_));*calculate logit;
cutoff=(logit+13.5972)/0.8003; *calculate cutoff;
prob= _prob_; *predicted probability at this ROC point;
Sensitivity = _SENSIT_; *calculate sensitivity;
Specificity = 1-_1MSPEC_; *calculate specificity;
Youden= _SENSIT_+ (1-_1MSPEC_)-1; *calculate Youden index;
run;
There is no need to compute the cutpoint on the predictor (X) scale as you do in your DATA CAT3 step. You just need the cutpoint on the probability scale (which is apparently 0.0974). Using that value, PROC PROBIT provides the cutpoint estimate on the X scale using the full model, along with a confidence interval. So, the estimate and confidence interval you got from PROBIT should be what you want.
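To see how the DATA CAT3 result relates to the full model, here is the inversion written as a small Python sketch (hypothetical function names). With only the intercept and X slope, it reproduces the DATA step's 14.2085, which is the same as evaluating age, sex and BMI at 0; any other covariate values shift the X cutoff. The covariate coefficient and value in the second usage line are invented for illustration.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1 - p))


def x_cutoff(p, intercept, b_x, other_coefs=None, other_values=None):
    """X at which intercept + b_x*X + sum(b_k*z_k) equals logit(p).

    other_coefs and other_values map covariate names to coefficients and
    to the values at which those covariates are held fixed."""
    other_coefs = other_coefs or {}
    other_values = other_values or {}
    offset = sum(b * other_values[name] for name, b in other_coefs.items())
    return (logit(p) - intercept - offset) / b_x


if __name__ == "__main__":
    # Intercept, slope and probability as given in the thread.
    print(round(x_cutoff(0.097430, -13.5972, 0.8003), 4))  # 14.2085
    # Hypothetical: age coefficient 0.05, evaluated at age 40.
    print(x_cutoff(0.097430, -13.5972, 0.8003, {"age": 0.05}, {"age": 40}))
```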
Thank you. This sounds good. However, if I use the %rocplot macro to obtain a cut-off for the variable X at the highest Youden index, it comes out to 12.1 instead of 11.5 (obtained from PROC PROBIT). The sensitivity, specificity, and highest Youden index are the same with the %rocplot macro and PROC PROBIT; only the cut-off for X is different. Any idea why?
The SAS code I used for the %rocplot macro is below:
ods graphics on;
proc logistic data = CAT ;
class sex (ref='0');
model Diabetes_120_(event='1') = X age sex BMI / lackfit rsquare outroc=rocdata2;
output out=pred2 p=pred2;
roc ;
run;
ods graphics off;
*I am only using variable X so as to get a clear picture on the ROC plot. However, the INROC= data and predicted probabilities are from the adjusted model;
%inc "Path of the %rocplot macro";
%rocplot("9.4", inpred = pred2,inroc = rocdata2, p = pred2,
id = X _OPTY_ _sens_ _spec_ ,optcrit= youden, optsymbolstyle = size=13 color=red weight=bold)
Assuming that the macro reported a unique optimum for the Youden criterion, then verify that the probability value associated with the unique maximum is the value you are specifying in INVERSECL(PROB=). In the Optimal Cutpoints table from the macro, that value is in the Cutpoint column. See Examples 1 and 2 in the ROCPLOT macro documentation.
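A quick way to check the unique-optimum assumption on your own ROC rows, sketched in Python (hypothetical function name and row keys): return every row whose Youden index ties the maximum within a tolerance. If more than one row comes back, different tools could pick different rows, and the probability passed to INVERSECL(PROB=) would then depend on which one was chosen.

```python
def youden_optima(rows, tol=1e-9):
    """Return every row whose Youden J = sensitivity + specificity - 1
    is within tol of the largest J in the table."""
    js = [r["sensitivity"] + r["specificity"] - 1 for r in rows]
    best = max(js)
    return [r for r, j in zip(rows, js) if best - j <= tol]


if __name__ == "__main__":
    # Hypothetical ROC rows; the first two tie on J = 0.70.
    rows = [
        {"prob": 0.10, "sensitivity": 0.80, "specificity": 0.90},
        {"prob": 0.20, "sensitivity": 0.90, "specificity": 0.80},
        {"prob": 0.30, "sensitivity": 0.50, "specificity": 0.95},
    ]
    optima = youden_optima(rows)
    if len(optima) > 1:
        print("Optimum is not unique; candidate probabilities:",
              [r["prob"] for r in optima])
```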
The probability value associated with the unique optimum for the Youden index is the same one I am using in INVERSECL(PROB=). Even so, the cut-off for the variable X comes out different.
It's not clear what the issue is without seeing your data.
Thank you for your help. There was some issue with the dataset. Your help is highly appreciated.