Hi all, I have two data sets in which multiple raters have each rated multiple x-rays into three categories. All raters have rated all x-rays and there is no missing data. I've calculated an overall kappa for each of the two data sets using the MAGREE macro. Is anyone aware of a way to statistically compare these two kappas? The macro produces a standard error for the overall kappa, which is very small (presumably because there are >200 raters in each group). Would it be proper to use this SE to create confidence intervals? For some reason the macro doesn't create confidence intervals, which makes me concerned that doing so would be incorrect. Are there other ways to create these confidence intervals? I've also seen one example of bootstrapping to create CIs in this situation, and I'm not sure which approach is most appropriate. Thanks!
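To make the question concrete, here is a minimal sketch in Python of the two calculations being asked about: a Wald-type confidence interval built from a kappa and its standard error, and a z-test for the difference between two kappas. It rests on assumptions that may not hold here: that the two data sets are independent, that the kappa estimates are approximately normal, and that the reported SE is suitable for interval estimation (an SE computed under the null hypothesis kappa = 0 would not be). The function names and the example numbers are hypothetical, not output from the macro.

```python
from math import sqrt
from statistics import NormalDist


def wald_ci(kappa, se, level=0.95):
    """Wald-type interval: kappa +/- z * SE (assumes approximate normality)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return kappa - z * se, kappa + z * se


def compare_kappas(k1, se1, k2, se2):
    """Two-sided z-test for k1 - k2, assuming the two estimates are independent."""
    diff = k1 - k2
    se_diff = sqrt(se1 ** 2 + se2 ** 2)  # variances add under independence
    z = diff / se_diff
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, se_diff, z, p


# Hypothetical values for illustration only; substitute the macro's output.
kappa_a, se_a = 0.60, 0.03
kappa_b, se_b = 0.50, 0.04

print("CI for data set A:", wald_ci(kappa_a, se_a))
print("CI for data set B:", wald_ci(kappa_b, se_b))
print("diff, SE, z, p:", compare_kappas(kappa_a, se_a, kappa_b, se_b))
```

If the same x-rays or the same raters appear in both data sets, the independence assumption fails and the variance of the difference would need a covariance term, which is one reason a bootstrap might be preferred.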