sharonlee
Quartz | Level 8

Hi,

How do I calculate confidence intervals for sensitivity/specificity for correlated data? 

My original plan was as follows. My response variable is binary and I have multiple data points per person, so I want to use GEE (proc genmod) to calculate the predicted probabilities, then use proc logistic with the ctable option to get the classification table, and finally use proc freq with the senspec option to get the CIs for sensitivity/specificity. I have read this note (https://support.sas.com/kb/24/170.html), which helped me understand how to get CIs using proc freq.
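For reference, the last step of that plan (the analysis that treats every observation as independent) might look like the sketch below. The data set MYDATA and the variables TEST (1/0 predicted classification) and TRUTH (1/0 reference standard) are hypothetical names, not taken from the post, and the SENSPEC option is assumed to be available in the installed release (a later reply mentions SAS/STAT 15.1).

```sas
/* Hypothetical input: MYDATA, one row per observation, with
   TEST  = 1/0 predicted classification
   TRUTH = 1/0 reference standard                              */

/* Sort so that the positive level appears first; ORDER=DATA
   then lists level 1 before level 0 in rows and columns.      */
proc sort data=mydata out=sorted;
   by descending test descending truth;
run;

/* Rows = test result, columns = reference standard.
   These intervals treat all rows as independent.              */
proc freq data=sorted order=data;
   tables test*truth / senspec;
run;
```

The point estimates from this table are the ones of interest; the intervals are only a baseline for comparison with a method that accounts for clustering.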

However, I'm worried that proc freq will not take the correlated nature of my data into account, which will directly affect my confidence intervals. My point estimates will be the same whether I assume independent or correlated data; it's the confidence intervals that will differ.

I also saw this old 2016 posting https://communities.sas.com/t5/Statistical-Procedures/Sensitivity-and-Specificity-Confidence-Interva... where someone had the same question as me, but there was no resolution. I'm wondering whether there have been any developments since then.

I'm using SAS v9.4.

Is there a macro that I can use to calculate confidence intervals for sensitivity/specificity for correlated data?

Thank you in advance.

 

ACCEPTED SOLUTION
sharonlee
Quartz | Level 8

Hi,

I found this very informative article on using GEE  to calculate sensitivity/specificity/NPV/PPV and their confidence intervals.  The authors also provide example SAS code which I used and it works beautifully.  

The full citation is:

 

Invest Ophthalmol Vis Sci 2020 Sep 1;61(11):29. doi: 10.1167/iovs.61.11.29.
Calculating Sensitivity, Specificity, and Predictive Values for Correlated Eye Data
Gui-Shuang Ying, Maureen G Maguire, Robert J Glynn, Bernard Rosner

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7500131/
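
As an illustration of the general GEE idea (this is not the authors' code from the article), sensitivity can be treated as the mean of the binary test result among truly positive observations, estimated with an intercept-only logistic GEE clustered on subject. MYDATA, ID, TRUTH and TEST are hypothetical names, and the independence working correlation is an assumption chosen so that the point estimate stays equal to the crude proportion.

```sas
/* Hypothetical input: MYDATA with ID (subject), TRUTH (1/0
   reference standard) and TEST (1/0 classification).          */

/* Sensitivity = P(test = 1 | truth = 1) */
proc genmod data=mydata;
   where truth = 1;
   class id;
   model test(event='1') = / dist=binomial link=logit;
   repeated subject=id / type=ind;      /* robust (sandwich) SE */
   estimate 'Sensitivity' intercept 1;  /* see Mean Estimate    */
run;

/* Specificity = P(test = 0 | truth = 0) */
proc genmod data=mydata;
   where truth = 0;
   class id;
   model test(event='0') = / dist=binomial link=logit;
   repeated subject=id / type=ind;
   estimate 'Specificity' intercept 1;
run;
```

The ESTIMATE output reports the estimate on the logit scale together with the back-transformed mean and its confidence limits, which are based on the empirical standard errors. Predictive values could be sketched the same way by conditioning on TEST and modeling TRUTH.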

 

Thank you everyone for your input.

 


13 REPLIES
sharonlee
Quartz | Level 8

Hi @JosvanderVelden,

Thanks for responding.  Unfortunately I am using proc genmod for GEE, and this paper doesn't address how to calculate CIs using this procedure.

Thanks.

Watts
SAS Employee

You might look at PROC SURVEYFREQ, which does categorical analysis for correlated (clustered) data. The SENSPEC option in the TABLES statement computes sensitivity and specificity.
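
A minimal sketch of this suggestion, assuming a hypothetical data set MYDATA with a subject identifier ID, a 1/0 reference standard TRUTH and a 1/0 classification TEST (none of these names come from the thread). It uses row percentages rather than SENSPEC so that it does not depend on the release: the row percent of TEST=1 within TRUTH=1 is the sensitivity, and the row percent of TEST=0 within TRUTH=0 is the specificity.

```sas
/* Hypothetical input: MYDATA with ID, TRUTH (1/0), TEST (1/0) */
proc surveyfreq data=mydata;
   cluster id;                  /* each subject is one cluster  */
   tables truth*test / row cl;  /* row percents with design-    */
                                /* based confidence limits      */
run;
```

Where the SENSPEC option mentioned above is available, it could be added to the TABLES statement options in the same step.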

sharonlee
Quartz | Level 8

Hi @Watts,

Thanks for replying.

Unfortunately PROC SURVEYFREQ doesn't take into account the correlated nature of the data.

 

SteveDenham
Jade | Level 19

You don't mention how your data are correlated.  I suspect, because of your use of GEE, that you are concerned about correlation in time.  Well, PROC SURVEYFREQ does handle correlated data, via the CLUSTER statement.  The correlation between clusters appears to be unstructured (I can't guarantee this, though).  So, if you are using SAS/STAT 15.1 or higher, try using PROC SURVEYFREQ, clustering on your time variable, and using the SENSPEC option in the TABLES statement.

 

SteveDenham

sharonlee
Quartz | Level 8

Hi @SteveDenham,

I checked out PROC SURVEYFREQ and there is a CLUSTER statement.  When I looked at the SAS documentation for PROC SURVEYFREQ, I found that it doesn't let you specify the correlation structure.  This was a good tip, though!  Thank you for sharing it with me.

My data are correlated because I have multiple data points per person.

The hunt continues.

If anyone else has ideas, please let me know.

Thank you kindly.

SteveDenham
Jade | Level 19

Hi @sharonlee 

 

You mention that you have multiple observations per subject.  That is like a two-level survey structure, where the observations are clustered within subject.  I believe in that case, the survey procedures treat the observations as nested (roughly equivalent to a RANDOM effect in GLIMMIX or MIXED).

 

See if the SGF2020 paper here or the video associated with it here are helpful.  From that, you could get predicted classifications, and set up a 2x2 table vs the observed values.  Then you could use PROC FREQ to get the intervals in question.  At least, it seems like that might be an approach worth exploring.

 

SteveDenham

sharonlee
Quartz | Level 8

Hi @SteveDenham,

Those are great links!   

What I'm really interested in is getting confidence intervals.  I could get the observed values using PROC SURVEYFREQ and use them as inputs to PROC FREQ, but will PROC FREQ adequately reflect the correlated nature of the data when calculating the confidence intervals?  Do I need to use another equation?

Thank you.

SteveDenham
Jade | Level 19

My suggestion is to use an adaptation of PROC GLIMMIX to account for the correlated errors within subject.  Then you could use an OUTPUT statement in GLIMMIX to get the predicted values on the original data scale.  Something like this  

 

output out=out1 pred(blup ilink)=predicted;

The dataset out1 would then have the observed and predicted values for each observation. 

Next, choose a cutpoint (say 0.5) and convert all of the predicted values to 0's and 1's based on the cutpoint.

This could be used in PROC FREQ to get the confidence interval for specificity and sensitivity at the given cutpoint.  You might want to repeat this at a variety of cutpoints to see what effect this may have, so what you would be doing is creating an ROC (see PROC LOGISTIC for examples).  Of course, this all assumes that your data are binomial.  If they are multinomial, it gets harder but the same can be done.  You just have cutpoints for each category.
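
The steps above could be sketched roughly as follows. The data set MYDATA, the subject identifier ID, the binary outcome Y and the single predictor X are hypothetical, and a random intercept per subject is only one possible way to model the within-subject correlation.

```sas
/* Hypothetical input: MYDATA with ID (subject), Y (1/0 observed
   outcome) and X (a predictor).                                */

/* 1. Fit the model and save subject-specific predicted
      probabilities on the data scale.                          */
proc glimmix data=mydata;
   class id;
   model y(event='1') = x / dist=binary link=logit solution;
   random intercept / subject=id;
   output out=out1 pred(blup ilink)=predicted;
run;

/* 2. Dichotomize the predictions at a chosen cutpoint.         */
data out2;
   set out1;
   if not missing(predicted) then pred_class = (predicted >= 0.5);
run;

/* 3. Cross-tabulate observed vs. predicted. With Y as the row
      variable, the row percent of PRED_CLASS=1 in the Y=1 row
      is the sensitivity, and the row percent of PRED_CLASS=0
      in the Y=0 row is the specificity.                        */
proc freq data=out2;
   tables y*pred_class / nocol nopercent;
run;
```

Repeating steps 2 and 3 over a grid of cutpoints gives the pairs of sensitivity and specificity that trace out the ROC curve.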

 

SteveDenham

 

 

sharonlee
Quartz | Level 8

Hi @SteveDenham,

Thanks for continuing this conversation!

Yes, I was going to do exactly what you suggested but using proc genmod and GEE.  However, my concern is that the confidence intervals won't reflect the correlated nature of my data - which are binomial.

Using PROC FREQ assumes that the data are independent, but my data are correlated.

It's unclear just how correlated my data are (i.e., whether PROC FREQ, which assumes independence, would give results similar to those from an approach for correlated data), but I would like to account for the correlation in my analysis.

Thanks again.

 

SteveDenham
Jade | Level 19

Well, once you get predicted values based on the correlated nature, the PROC FREQ crosstab for agreement between observed and predicted is not so difficult.  I think your concern is that the standard error generated by PROC FREQ for these confidence bounds may be artificially shrunk due to the correlation.  The asymptotic standard error is used with Z=1.96 to get the asymptotic limits, so we would want to stay away from those.  Instead, I think the exact option will give you what you are looking for.  The predicted values already accommodate the correlated nature of the data, so we get confidence bounds based on the correlated nature of the predicted values.  Since the observed values are the "gold standard" in this case, they can be considered to be measured without error or correlation, and so it all works (at least in my mind).  See this note for more (including some NLMIXED code that could be modified with a RANDOM statement).
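
A minimal sketch of the exact-interval suggestion, assuming a hypothetical data set CLASSIFIED that holds the observed outcome Y (1/0) and the dichotomized prediction PRED_CLASS (1/0) for each observation:

```sas
/* Hypothetical input: CLASSIFIED with Y and PRED_CLASS (1/0).  */

/* Sensitivity: proportion predicted positive among observed
   positives, with exact (Clopper-Pearson) limits.              */
proc freq data=classified;
   where y = 1;
   tables pred_class / binomial(level='1');
   exact binomial;
run;

/* Specificity: proportion predicted negative among observed
   negatives.                                                   */
proc freq data=classified;
   where y = 0;
   tables pred_class / binomial(level='0');
   exact binomial;
run;
```

The exact limits are computed from the cell counts alone, so whether they are adequate for repeated observations on the same person is the question taken up in the next reply.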

 

SteveDenham

sharonlee
Quartz | Level 8

Hi @SteveDenham,

Thank you for giving this more thought!

I did read the note you referenced and agree that I can add a RANDOM statement to PROC NLMIXED.  My concern is, as you articulate, that the PROC FREQ CI may be biased (i.e., artificially narrow).  Your point about the correlation already being taken into account in the predicted values makes sense.  Perhaps using the EXACT option in PROC FREQ is the way to go.  Is there an article I could reference showing that the EXACT CI methodology is appropriate for correlated data?  I understand that this may be beyond the scope of this SAS forum, but I think that analysts, whenever possible, should try to have a theoretical foundation on which to base their analysis.

Thanks again for all your help.
