djbateman
Lapis Lazuli | Level 10

I have three 2x2 contingency tables where each table displays the agreement counts between 2 readers on a dichotomous test.  I have been asked to average the agreement rates, which I can easily do.  But then I was asked to put a confidence interval around the values.  The average agreement would simply be (0.76 + 0.67 + 0.80)/3 = 0.7433.  Does anyone know how I could average the confidence intervals?  Can I simply average the lower and upper bounds to get a new confidence interval?  Or can I pool the cell counts from the 3 tables into a new table and generate the Clopper-Pearson CI from that?  (I have attempted this, but my agreement rate didn't come out quite the same, so I abandoned that whole plan.)

Test 1

                     Reader 2
Reader 1        Positive   Negative
  Positive         23         11
  Negative          1         14

Agreement Rate = 76%

95% CI = (0.6113, 0.8666)

Test 2

                     Reader 2
Reader 1        Positive   Negative
  Positive         21         13
  Negative          3         12

Agreement Rate = 67%

95% CI = (0.5246, 0.8005)

Test 3

                     Reader 2
Reader 1        Positive   Negative
  Positive         19          5
  Negative          5         20

Agreement Rate = 80%

95% CI = (0.6566, 0.8976)
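For reference, this is roughly what I tried for the pooled-table idea (agreement and disagreement counts summed across the three tests; the dataset and variable names are just placeholders):

/* pooled counts: 37 + 33 + 39 = 109 agreements out of 3 x 49 = 147 paired reads */
data pooled;
   input result $ count;
   datalines;
Agree 109
Disagree 38
;

proc freq data=pooled order=data;
   tables result / binomial;   /* binomial proportion for the first level ('Agree') */
   exact binomial;             /* exact (Clopper-Pearson) confidence limits */
   weight count;
run;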

SteveDenham
Jade | Level 19

How about using kappa as the measure of agreement?  PROC FREQ will then give CIs for both the individual tables and the overall value.

data one;
input test reader1 $ reader2 $ weight;
datalines;
1 P P 23
1 P N 11
1 N P 1
1 N N 14
2 P P 21
2 P N 13
2 N P 3
2 N N 12
3 P P 19
3 P N 5
3 N P 5
3 N N 20
;

proc freq data=one;
   tables test*reader1*reader2 / agree;   /* AGREE requests kappa with CIs for each table and overall */
   weight weight;
   test agree;
run;

I know that kappa is NOT the same as the agreement parameter you calculated, but it is widely used as a measure of rater (test) agreement, and has better statistical properties.

Steve Denham

djbateman
Lapis Lazuli | Level 10

I have done kappa statistics (as well as PABAK), but my CEO prefers observed agreement rates since they are more easily understandable to our clinicians than kappa statistics.  We do report both, but we are still interested in averaging the values.  I didn't specify (which I should have done), but these are pairwise agreement rates.  There were only 3 readers who read each of the 49 subjects: Test 1 is Reader 1 vs. Reader 2; Test 2 is Reader 1 vs. Reader 3; and Test 3 is Reader 2 vs. Reader 3.  Can you suggest a better methodology than averaging them?  We have done a 2-reader agreement and a 3-reader agreement (the number of times in which 2 readers and 3 readers make the same call, respectively).  We have also tried a method where we randomly selected 2 of the 3 readers for each subject and then ran our agreement statistics based on that outcome.
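In case it clarifies that last piece, here is a rough sketch of the random-pair approach (the reads1 dataset and r1-r3 variables are placeholders for one row per subject holding each reader's P/N call):

/* per-subject 3-reader agreement and agreement of one randomly chosen pair */
data agree_stats;
   set reads1;                        /* placeholder: one row per subject, r1-r3 = 'P' or 'N' */
   all3 = (r1 = r2 and r2 = r3);      /* 1 if all three readers make the same call */
   pick = ceil(3*rand('uniform'));    /* randomly drop one of the three readers */
   if pick = 1 then pair = (r2 = r3);
   else if pick = 2 then pair = (r1 = r3);
   else pair = (r1 = r2);
run;

proc means data=agree_stats mean;     /* means of the 0/1 flags are the observed agreement rates */
   var all3 pair;
run;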

SteveDenham
Jade | Level 19

I think there is a technical term for this, but it boils down to !&%$**!##.  That just sucks about "more easily understandable".

Three-way agreement is tough.  Maybe a generalized linear mixed model where reader is a repeated factor on each of the 49 subjects, and then calculating an ICC, but I am definitely starting to feel outside my comfort zone on this.

Steve Denham

djbateman
Lapis Lazuli | Level 10

I have been using Fleiss' kappa along with ICC (they appear to be nearly identical) when dealing with more than 2 readers.  I think what it comes down to is this: we are trying to decide what the probability is that any 2 randomly selected readers (from a pool of trained readers) will agree on a patient's status--positive or negative.  This can be used to explain to the clinicians, but it will also help us in planning a study that will validate our diagnostic for the FDA.

SteveDenham
Jade | Level 19

That is an interesting problem.  Reader as a random effect...

How interpretable would the following be?

proc glimmix;
   class reader subjid;
   model response = / dist=binomial;            /* intercept-only model for the binary call */
   random intercept reader / subject=subjid;    /* random subject intercept plus reader effects within subject */
run;

and then calculating the ICC from the variance components.  Not quite it--I think you'll need variance due to each reader, so maybe

random intercept / subject=subjid group=reader;

would be better.
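For the ICC itself, one rough sketch using the first RANDOM parameterization above (assuming a latent-scale ICC for the logistic model, with pi**2/3 as the residual variance; the reads dataset and response/reader/subjid names are placeholders):

ods output CovParms=cp;                /* capture the variance component estimates */
proc glimmix data=reads;
   class reader subjid;
   model response = / dist=binomial;
   random intercept reader / subject=subjid;
run;

data icc;
   set cp end=last;
   retain v_subj v_reader 0;
   if upcase(CovParm) = 'INTERCEPT' then v_subj = Estimate;
   if upcase(CovParm) = 'READER' then v_reader = Estimate;
   if last then do;
      /* subject variance over total variance on the latent (logit) scale */
      icc = v_subj / (v_subj + v_reader + constant('pi')**2/3);
      output;
   end;
run;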

Steve Denham

Message was edited by: Steve Denham
