I am new to using SAS and I need help.
I need to calculate sensitivity and specificity, add them as variables to my dataset, plot three ROC curves on one graph, and label specific values on each curve.
So far I have managed to produce ROC curves with the code below, but the sensitivity and specificity values are not in my dataset, and I could not label specific values on the curves.
(The attached dataset is a generated one that is very similar to my real dataset.)
My dataset has the following variables (all numeric):
ID: patient ID number
DV1: dependent variable 1 (values range from 60 to 400)
DV2: dependent variable 2 (values range from 60 to 800)
Resp1: response variable 1 (values range from 1 to 12)
Resp2: response variable 2 (values range from 1 to 7)
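First, I created binary (0/1) indicator variables from DV1 and DV2 using a cutoff of 115:
DATA data2;
SET data1;
/* 0 = below 115, 1 = 115 or above. A missing DV value is also coded 0,
   because SAS treats a missing numeric value as smaller than any number. */
IF DV1 LT 115 THEN c1=0; ELSE c1=1 ;
IF DV2 LT 115 THEN c2=0; ELSE c2=1 ;
RUN;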
Then, I produced three different ROC curves:
ODS GRAPHICS ON;
PROC LOGISTIC DATA= data2;
CLASS c1 ;
MODEL c1= Resp1 / OUTROC= rocdata1;
WHERE c1= 1;
RUN;
PROC GPLOT DATA= rocdata1;
PLOT _sensit_*_1mspec_;
RUN;
QUIT;
PROC LOGISTIC DATA= data2;
CLASS c1;
MODEL c1= Resp2 / OUTROC= rocdata2;
WHERE c1= 1;
RUN;
PROC GPLOT DATA= rocdata2;
PLOT _sensit_*_1mspec_;
RUN;
QUIT;
PROC LOGISTIC DATA= data2;
CLASS c2 ;
MODEL c2= Resp2 / OUTROC= rocdata3;
WHERE c2= 1;
RUN;
PROC GPLOT DATA= rocdata3;
PLOT _sensit_*_1mspec_;
RUN;
QUIT;
Next, I merged the three datasets and plotted each ROC curve:
DATA rocdata2a;
SET rocdata2;
RENAME _sensit_ = _sensit2_ ;
RENAME _1mspec_ = _1mspec2_ ;
RUN;
DATA rocdata3a;
SET rocdata3;
RENAME _sensit_ = _sensit3_ ;
RENAME _1mspec_ = _1mspec3_ ;
RUN;
PROC SORT DATA= rocdata1 ; BY _POS_; RUN;
PROC SORT DATA= rocdata2a ; BY _POS_; RUN;
PROC SORT DATA= rocdata3a ; BY _POS_; RUN;
DATA rocdt ;
MERGE rocdata1 rocdata2a rocdata3a;
BY _POS_ ;
RUN;
PROC GPLOT DATA= rocdt;
PLOT _sensit_*_1mspec_ _sensit2_*_1mspec2_ _sensit3_*_1mspec3_;
RUN;
QUIT;
Finally, I tried combining the three ROC curves into one graph:
PROC SGPLOT DATA= rocdt aspect=1;
XAXIS values=(0 to 1 by 0.2);
YAXIS values=(0 to 1 by 0.2);
LINEPARM x=0 y=0 slope=1 / transparency= 0.5 lineattrs=(color=black
pattern=shortdash);
SERIES x=_1mspec_ y=_sensit_ ;
SERIES x=_1mspec2_ y= _sensit2_ ;
SERIES x=_1mspec3_ y= _sensit3_ ;
RUN;
QUIT;
Please help me understand what is missing. I have searched many websites and blogs for a case similar to mine, but I couldn't find a solution.
I am a beginner at SAS and don't yet understand complicated code. I want to be able to run my code with confidence, and I would really appreciate the help!
-ems3
PROC LOGISTIC should do all of the calculations for you: it produces the ROC curves, computes sensitivity and 1 - specificity (_SENSIT_ and _1MSPEC_), and can save these values to a SAS data set if you want to plot them yourself or have other uses in mind. Use the OUTROC= option in the MODEL statement. https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.4&docsetId=statug&docsetTarget=statu...
The AUROC is saved in the ROCASSOCIATION table:
ods output rocassociation=rocassociation;
proc logistic ...;
...;
run;
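A minimal sketch of that workflow, assuming the binary response c1 and the predictor Resp1 from the revised code later in this thread (rocdata1 and the new variable specificity are illustrative names). The OUTROC= data set holds _PROB_ (the probability cutoff), _SENSIT_ and _1MSPEC_, so specificity itself takes one extra DATA step:

```sas
ods graphics on;

/* Fit the model and save one row per probability cutoff in rocdata1 */
proc logistic data=data2 plots(only)=roc;
   model c1(event='1') = Resp1 / outroc=rocdata1;
run;

/* OUTROC= stores 1 - specificity; derive specificity itself */
data rocdata1;
   set rocdata1;
   specificity = 1 - _1mspec_;
run;

proc print data=rocdata1 noobs;
   var _prob_ _sensit_ _1mspec_ specificity;
run;
```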
Hi PaigeMiller,
Thanks for helping me try to solve this.
As you mentioned, PROC LOGISTIC does all the calculations of sensitivity and 1-specificity and saves the values in a separate data set, but the problem is that I cannot tell which patient each value belongs to. The patient ID (identifier variable) is not included in the data set produced by OUTROC=, and I need this variable, along with the original Resp1 and Resp2 values used to calculate sensitivity and 1-specificity, linked for further analyses. I still cannot find a way to solve this. Can you help?
I don't have an OUTROC= data set in front of me at the moment, but I believe these values are not related to individuals; they are related to the model, and they let you plot the ROC curve and perhaps perform other analyses. So there should not be any identifier variable relating to individuals in the study.
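If per-patient values are still needed, one workaround when the model has a single predictor is to treat each observed Resp1 value as a cutpoint, compute sensitivity and specificity directly, and join them back to the patients. This is a sketch assuming data2 holds ID, Resp1 and a 0/1 c1; the table names cut_stats and data2_roc are hypothetical, and the direction of the classification rule is an assumption noted in the comments. When the fitted slope for Resp1 is positive, these points should roughly correspond to the points on the PROC LOGISTIC ROC curve.

```sas
/* Hypothetical sketch: per-cutpoint sensitivity and specificity for
   Resp1, computed directly instead of taken from OUTROC=.
   ASSUMPTION: a patient is classified as an event (c1 = 1) when
   Resp1 >= cutpoint; flip the comparisons if low Resp1 predicts c1 = 1. */
proc sql;
   /* every distinct observed Resp1 value serves as a cutpoint */
   create table cut_stats as
   select cuts.Resp1 as cutpoint,
          sum(d.Resp1 >= cuts.Resp1 and d.c1 = 1) / sum(d.c1 = 1)
             as sensitivity,
          sum(d.Resp1 <  cuts.Resp1 and d.c1 = 0) / sum(d.c1 = 0)
             as specificity
   from (select distinct Resp1 from data2
         where Resp1 is not missing) as cuts,
        data2 as d
   where d.Resp1 is not missing and d.c1 is not missing
   group by cuts.Resp1
   order by cutpoint;

   /* attach the statistics to each patient through that patient's own
      Resp1 value, keeping ID and all original variables */
   create table data2_roc as
   select d.*, s.sensitivity, s.specificity
   from data2 as d
        left join cut_stats as s
        on d.Resp1 = s.cutpoint
   order by d.ID;
quit;
```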
If what you need is a predicted probability for each individual (rather than sensitivity and specificity), that comes from the OUTPUT statement in PROC LOGISTIC.
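A sketch of the OUTPUT statement approach, again assuming data2 with ID, Resp1 and c1; pred1 and phat are hypothetical names. Because the OUT= data set carries along the variables of the input data set, ID and Resp1 stay linked to each predicted probability:

```sas
proc logistic data=data2;
   model c1(event='1') = Resp1;
   /* OUT= keeps the input variables (ID, Resp1, ...) and adds the
      predicted event probability for each observation as phat */
   output out=pred1 predicted=phat;
run;

proc print data=pred1 (obs=10);
   var ID Resp1 c1 phat;
run;
```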
There are many issues here. "Dependent" and "Response" mean the same thing, so it is not clear which variable should be the dependent (response) variable and which should be the independent (predictor) variable in a model. Further, all of the variables are continuous. To dichotomize a variable throws away a huge amount of data and you should consider whether this is wise and if another approach that uses all of the data is better. Next, the WHERE statements restrict all of the data seen by PROC LOGISTIC so that all usable observations have the same response and this should cause an error. The response in a binary logistic model should be binary - two distinct values should occur in the response variable being analyzed.
If you provide binary response data to PROC LOGISTIC, the sensitivity and specificity values for each possible event probability cutoff will appear in the OUTROC= data set. PROC LOGISTIC itself will produce the ROC plot. A comparative plot of ROC curves for a given response variable can be made as shown in the example titled "Comparing Receiver Operating Characteristic Curves" in the PROC LOGISTIC documentation. PROC GPLOT is not needed. If you want to label points on the ROC curve with certain values, use the PLOTS(ONLY)=ROC(ID=value) option in the PROC LOGISTIC statement, where value is one of the keywords described for the PLOTS= option in the PROC LOGISTIC statement Syntax section of the documentation. See the example mentioned above.
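A sketch of that comparison, modeled on the pattern of the documentation example and assuming data2 with a binary c1 and the predictors Resp1 and Resp2. NOFIT skips fitting the full MODEL, each ROC statement adds one curve to the overlaid plot, and ROCCONTRAST tests the curves against a reference. All curves here share the response c1:

```sas
ods graphics on;

/* ID=PROB labels the ROC points with their probability cutoffs */
proc logistic data=data2 plots(only)=roc(id=prob);
   model c1(event='1') = Resp1 Resp2 / nofit;
   roc 'Resp1' Resp1;
   roc 'Resp2' Resp2;
   /* compare the Resp2 curve with the Resp1 curve */
   roccontrast reference('Resp1') / estimate e;
run;
```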
Hi StatDave_sas,
Thank you for your comments. I will address each point in turn:
1. " "Dependent" and "Response" mean the same thing, so it is not clear which variable should be the dependent (response) variable and which should be the independent (predictor) variable in a model."
- Thanks for noticing my error. The variables DV1 and DV2 are actually predictor variable 1 (PV1) and predictor variable 2 (PV2).
2. "all of the variables are continuous. To dichotomize a variable throws away a huge amount of data and you should consider whether this is wise and if another approach that uses all of the data is better."
- I understand the issue with dichotomizing a continuous variable, but my research question requires this step to answer a specific question.
3. "the WHERE statements restrict all of the data seen by PROC LOGISTIC so that all usable observations have the same response and this should cause an error. The response in a binary logistic model should be binary - two distinct values should occur in the response variable being analyzed."
- I have revised my code, and I think this version is correct:
DATA data2;
SET data1;
IF PV1 LT 115 THEN c1=1; ELSE c1=0 ;
IF PV2 LT 115 THEN c2=1; ELSE c2=0 ;
RUN;
PROC LOGISTIC DATA= data2;
MODEL c1 (EVENT="1") = Resp1 / OUTROC= rocdata1;
RUN;
4. "If you provide binary response data to PROC LOGISTIC, the sensitivity and specificity values for each possible event probability cutoff will appear in the OUTROC= data set."
- Yes, I can find the sensitivity and 1-specificity values in the OUTROC= data set. However, I do not know which patient ID, or which value of Resp1, corresponds to each sensitivity and 1-specificity value, and I need these for further analyses. How can I include these variables in the OUTROC= data set?
5. "A comparative plots of ROC curves for a given response variable can be done as shown in the example titled "Comparing Receiver Operating Characteristic Curves" in the PROC LOGISTIC documentation."
- I have read through this documentation, but the problem is that I want a single ROC graph with three curves from three combinations of my variables: PV1 and Resp1, PV1 and Resp2, and PV2 and Resp2. Using MODEL c1 = Resp1 Resp2 / OUTROC=rocdata1; ROC 'Response1' Resp1; ROC 'Response2' Resp2; will not work, because it still lacks the MODEL c2 = Resp2 model. Is there another way to create this graph? I tried PROC SGPLOT, as you can see in my original post.
6. "PROC LOGISTIC itself will produce the ROC plot.", " PROC GPLOT is not needed."
- I removed PROC GPLOT from my code. Thanks
7. "If you want to label points on the ROC curve with certain values, use the PLOTS(ONLY)=ROC(ID=value) option in the PROC LOGISTIC statement, where value is one of the values discussed in the PLOT= option description in the PROC LOGISTIC statement Syntax section of the PROC LOGISTIC documentation."
- This is the problem: SAS does not seem to let me label the points with a variable from my data set. For instance, with PLOTS(ONLY)=ROC(ID=value), if value is PROB the points are labeled with the probability level, and if I choose OBS they show the observation number. From neither can I read the Resp1 value (cutpoint) on the graph, or trace a point back to the patient ID in my original data set. How can I get these values onto the ROC graph, or into the ROC data set along with sensitivity and 1-specificity?
I really appreciate the time you have taken to respond to my questions.
As I mentioned, PROC LOGISTIC can produce an ROC plot with labeled points. You are not limited to just using observation numbers or computed statistics in the label. If you specify the PLOTS(ONLY)=ROC(ID=ID) option and include an ID statement in your PROC LOGISTIC step, then you can use any variable (or variables) you like from your data set to label the points. But keep in mind that each point on the ROC curve is determined by all of the predictor values in the model and not just the value of any one predictor if your model has multiple predictors. One consequence of this is that more than one point on the ROC curve could have the same value of any one particular predictor.
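A minimal sketch of the ID-statement approach, assuming data2 from this thread. Here Resp1 supplies the labels, so each labeled point shows the cutpoint value; replacing Resp1 with ID in the ID statement would show patient IDs instead, subject to the caveat above about several observations sharing a point:

```sas
ods graphics on;

/* ID=ID tells the ROC plot to take its labels from the ID statement */
proc logistic data=data2 plots(only)=roc(id=id);
   id Resp1;   /* label each ROC point with its Resp1 value */
   model c1(event='1') = Resp1 / outroc=rocdata1;
run;
```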
To produce a single plot with ROC curves from several models, use the OUTROC= option in each model fit to produce ROC data sets with unique names. Then concatenate them together using a DATA step and add a variable identifying the model each one was from. You can then use SGPLOT to produce an overlaid ROC plot like what PROC LOGISTIC makes with code like the following:
proc sgplot data=allmodels aspect=1;
xaxis values=(0 to 1 by 0.25) grid offsetmin=.05 offsetmax=.05;
yaxis values=(0 to 1 by 0.25) grid offsetmin=.05 offsetmax=.05;
lineparm x=0 y=0 slope=1 / transparency=.7;
series x=_1mspec_ y=_sensit_ / group=model_ID;
run;
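The code above assumes a data set allmodels with a model_ID variable. A sketch of how it might be built for the three combinations in this thread, assuming data2 with c1, c2, Resp1 and Resp2 (the data set names follow the original post; the model_ID labels are illustrative). Stacking with SET, rather than merging BY _POS_, keeps each curve intact, because _POS_ counts correctly classified events and is not a row key:

```sas
/* Fit each combination without WHERE statements so that each
   response keeps both of its levels (0 and 1) */
proc logistic data=data2;
   model c1(event='1') = Resp1 / outroc=rocdata1;
run;

proc logistic data=data2;
   model c1(event='1') = Resp2 / outroc=rocdata2;
run;

proc logistic data=data2;
   model c2(event='1') = Resp2 / outroc=rocdata3;
run;

/* Stack (not merge) the ROC data sets; IN= flags record the source */
data allmodels;
   length model_ID $ 20;
   set rocdata1 (in=in1)
       rocdata2 (in=in2)
       rocdata3 (in=in3);
   if in1 then model_ID = 'PV1 and Resp1';
   else if in2 then model_ID = 'PV1 and Resp2';
   else model_ID = 'PV2 and Resp2';
run;
```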