04-30-2013 01:38 PM
I have three 2x2 contingency tables where each table displays the agreement counts between 2 readers on a dichotomous test. I have been asked to average the agreement rates, which I can easily do. But then I was asked to put a confidence interval around the values. The average agreement would simply be (0.76+0.67+0.80)/3 = 0.7433. Does anyone know how I could average the confidence interval? Can I simply average the lower bounds and upper bounds to get a new confidence interval? Can I average the cell counts from the 3 tables and make a new table from which I can generate the Clopper-Pearson CI? (I have attempted this, but my agreement rate didn't come out quite the same, so I abandoned that whole plan).
Agreement Rate = 76%
95% CI = (0.6113, 0.8666)
Agreement Rate = 67%
95% CI = (0.5246, 0.8005)
Agreement Rate = 80%
95% CI = (0.6566, 0.8976)
05-01-2013 09:47 AM
How about using kappa as the measure of agreement? PROC FREQ will then give CIs on both the individual tables and the overall value.
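/* One row per cell of each 2x2 table; weight holds the cell count */
data one;
input test reader1 $ reader2 $ weight;
datalines;
1 P P 23
1 P N 11
1 N P 1
1 N N 14
2 P P 21
2 P N 13
2 N P 3
2 N N 12
3 P P 19
3 P N 5
3 N P 5
3 N N 20
;
run;

/* Assumed completion: the WEIGHT and TABLES statements were not shown in the post.
   With test as the stratum, AGREE gives kappa per table and an overall kappa. */
proc freq data=one;
weight weight;
tables test*reader1*reader2 / agree;
run;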
I know that kappa is NOT the same as the agreement parameter you calculated, but it is widely used as a measure of rater (test) agreement, and has better statistical properties.
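For comparison with kappa, the observed agreement rates and their exact limits can be obtained from the same cell counts. The following is only a sketch: the dataset name agree_counts and the variables agree and count are made up for illustration, and the counts are the diagonal (PP + NN) and off-diagonal (PN + NP) sums of the three tables above, 49 subjects each.

```sas
/* Agreements = PP + NN, disagreements = PN + NP, per test (illustrative layout) */
data agree_counts;
  input test agree count;
  datalines;
1 1 37
1 0 12
2 1 33
2 0 16
3 1 39
3 0 10
;
run;

proc freq data=agree_counts;
  by test;                 /* data are already sorted by test */
  weight count;
  /* BINOMIAL reports asymptotic and exact (Clopper-Pearson) limits;
     LEVEL='1' requests the proportion with agree = 1 */
  tables agree / binomial(level='1');
run;
```

The exact limits from this step should be comparable to the three intervals quoted in the original question, since 37/49, 33/49 and 39/49 round to 76%, 67% and 80%.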
05-01-2013 09:55 AM
I have done kappa statistics (as well as PABAK), but my CEO is more into observed agreement rates since they are more easily understandable to our clinicians than kappa statistics. We do report both, but we are still interested in averaging the values. I didn't specify (which I should have done), but these are pairwise agreement rates. There were only 3 readers, who each read all of the 49 subjects. Test 1 is Reader 1 vs. Reader 2; Test 2 is Reader 1 vs. Reader 3; and Test 3 is Reader 2 vs. Reader 3. Can you suggest a better methodology than averaging them? We have done a 2-reader agreement and a 3-reader agreement (the number of times in which 2 readers and 3 readers make the same call, respectively). We have also done a method where we randomly selected 2 of the 3 readers for each subject, and then ran our agreement statistics based on that outcome.
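One possible way to put an interval on the averaged rate, offered as a sketch and not as an established procedure: because the same 49 subjects sit behind all three tables, the three rates are correlated, so averaging the lower and upper bounds is hard to justify. With three readers and a dichotomous call, each subject is either unanimous (3 of 3 pairs agree) or split 2-1 (1 of 3 pairs agrees), which means the average pairwise agreement equals 1/3 + (2/3) x (proportion of unanimous subjects). An exact interval for the unanimous proportion can then be mapped through that formula. The sketch assumes independent subjects, treats the three readers as fixed, and uses made-up variable names; the agreement counts 37, 33 and 39 come from the tables posted above.

```sas
data avg_agreement;
  n         = 49;                    /* subjects read by all three readers */
  pairs_ok  = 37 + 33 + 39;          /* agreeing pairs summed over the three tests */
  avg       = pairs_ok / (3 * n);    /* average pairwise agreement from raw counts */
  unanimous = (pairs_ok - n) / 2;    /* solves 3*u + (n - u) = pairs_ok */
  p_unan    = unanimous / n;

  /* Clopper-Pearson limits for the unanimous proportion via beta quantiles */
  alpha = 0.05;
  lo_un = quantile('BETA', alpha / 2,     unanimous,     n - unanimous + 1);
  hi_un = quantile('BETA', 1 - alpha / 2, unanimous + 1, n - unanimous);

  /* avg = 1/3 + (2/3)*p_unan is increasing in p_unan, so the limits map directly */
  avg_lo = 1/3 + (2/3) * lo_un;
  avg_hi = 1/3 + (2/3) * hi_un;
run;

proc print data=avg_agreement noobs;
  var avg unanimous p_unan lo_un hi_un avg_lo avg_hi;
  format avg p_unan lo_un hi_un avg_lo avg_hi 6.4;
run;
```

Note that the average computed from the raw counts is 109/147, about 0.7415, slightly below the 0.7433 obtained from the rounded percentages. The unanimous count is the same quantity as the 3-reader agreement already mentioned, so this interval is a rescaling of the interval on that rate.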
05-01-2013 10:30 AM
I think there is a technical term for this, but it boils down to !&%$**!##. That just sucks about "more easily understandable".
Three way agreement is tough. Maybe a generalized linear model where reader is a repeated factor on each of the 49 subjects, and then calculating an ICC, but I am definitely starting to feel outside my comfort zone on this.
05-01-2013 10:35 AM
I have been using Fleiss' kappa along with ICC (they appear to be nearly identical) when dealing with more than 2 readers. I think what it comes down to is this: we are trying to decide what the probability is that any 2 randomly selected readers (from a pool of trained readers) will agree on a patient's status--positive or negative. This can be used to explain to the clinicians, but it will also help us in planning a study that will validate our diagnostic for the FDA.
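As a side note on how these quantities relate: the observed-agreement term inside Fleiss' kappa is the proportion of agreeing reader pairs averaged over subjects, which with three readers is the same number as the average of the three pairwise agreement rates. The sketch below computes it from the aggregate counts in this thread. The variable names are made up, the positive-call totals (34, 24, 24 for Readers 1, 2, 3) are the row and column sums of the tables posted earlier, and the result describes these three readers only, not a wider pool.

```sas
data fleiss;
  n_subj = 49;
  n_read = 3;

  /* Overall proportion of positive and negative calls across all readers */
  pos   = 34 + 24 + 24;
  p_pos = pos / (n_subj * n_read);
  p_neg = 1 - p_pos;

  /* Observed agreement: agreeing pairs over all reader pairs (3 per subject) */
  p_obs = (37 + 33 + 39) / (3 * n_subj);

  /* Chance agreement and Fleiss' kappa */
  p_exp = p_pos**2 + p_neg**2;
  kappa = (p_obs - p_exp) / (1 - p_exp);
run;

proc print data=fleiss noobs;
  var p_pos p_obs p_exp kappa;
  format p_pos p_obs p_exp kappa 6.4;
run;
```

Here p_obs is the figure that answers "how often do two of these readers agree", while kappa rescales it against p_exp.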
05-01-2013 10:40 AM
That is an interesting problem. Reader as a random effect...
How interpretable would the following be?
class reader subjid;
random intercept reader/subject=subjid;
and then calculating the ICC from the variance components. Not quite it--I think you'll need variance due to each reader, so maybe
random intercept/subject=subjid group=reader;
would be better.
Message was edited by: Steve Denham