I have a 5-level ordinal measure taken on each patient at three time points. A second rater scored a small subset of these measurements, so we can compute inter-rater reliability (IRR), e.g. kappa, but only on that subset.
I am seeking input on how to incorporate the IRR data into the final analysis, given that only a small number of measurements have two sets of ratings. My initial thought is to weight different segments of the scale and different time points by their respective reliabilities (e.g., the low end of the scale is more reliable than the high end).
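For concreteness, here is a minimal sketch of how the per-time-point reliabilities could be estimated from the double-rated subset. It uses Cohen's weighted kappa with linear disagreement weights, which is one possible choice and not necessarily the right one. The function name `weighted_kappa`, the 1 to 5 coding, and the `toy` ratings are all illustrative assumptions, not our real data.

```python
from collections import defaultdict


def weighted_kappa(r1, r2, n_levels=5, power=1):
    """Cohen's weighted kappa for two raters on a 1..n_levels ordinal scale.

    power=1 gives linear disagreement weights, power=2 quadratic ones.
    Returns NaN when chance-expected disagreement is zero (kappa undefined).
    """
    if len(r1) != len(r2) or not r1:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(r1)
    k = n_levels

    # Cross-tabulate the two raters' scores (rows: rater 1, columns: rater 2).
    obs = [[0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        obs[a - 1][b - 1] += 1
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    observed = 0.0  # weighted observed disagreement
    expected = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            w = (abs(i - j) / (k - 1)) ** power
            observed += w * obs[i][j] / n
            expected += w * row[i] * col[j] / (n * n)

    if expected == 0:
        return float("nan")
    return 1.0 - observed / expected


# Made-up double-rated subset: (time_point, rater1_score, rater2_score).
toy = [
    (1, 1, 1), (1, 2, 2), (1, 4, 5), (1, 5, 3),
    (2, 1, 1), (2, 3, 3), (2, 4, 4), (2, 5, 4),
    (3, 2, 2), (3, 2, 3), (3, 5, 5), (3, 1, 1),
]

# Group the paired ratings by time point and compute one kappa per group.
by_time = defaultdict(lambda: ([], []))
for t, a, b in toy:
    by_time[t][0].append(a)
    by_time[t][1].append(b)

kappas = {t: weighted_kappa(a, b) for t, (a, b) in by_time.items()}
print(kappas)
```

With only a handful of double-rated cases per time point, estimates like these would be very noisy, which is part of why I am unsure how far they can be trusted as weights.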
Any and all thoughts and references are appreciated. Thanks.