The disease variable is implicit in the patient-level data I have, which consists of true disease cases (N=2,770). The variable 'negative' is available only in the 'test' data, and the test data can be linked to the other datasets by the 'date' variable that they all share. I now realize that the reason the province data has "confirmed" is that these separate, linkable datasets include the confirmed cases found in the patient-level data. Yes, I can see now that I don't have to do three-way or four-way strata.
As you pointed out: if you know all this information...
I'm comparing the data I have against the data I may not have but would need to calculate test specificity.
I calculated: 1. the total N of tests (N=395,194); 2. the total N of negative results (N=372,002); 3. the total N of positive results (N=9,661, labelled as 'confirmed'), and compared these to the total N of patients (true positives, N=2,770) in the patient-level data. I just noticed that the data dictionary describes the variable 'confirmed' in the aggregated datasets as the total N of positive results. That makes sense, since it is several times higher than the N=2,770 true positive cases, which are the endpoints of the patient-level data. There is an inconsistency between these numbers, though. For example, 9,661 and 372,002 sum to 381,663, not 395,194 as shown in the screenshot, which leaves 13,531 tests unaccounted for. This is probably because of missing data, which I think I have no way of tracing.
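To double-check the mismatch, here is a quick Python sketch that uses only the counts quoted above. The variable names are my own, not names from the datasets, and the gap it reports is just arithmetic: it does not tell us whether those tests are missing, pending, or something else.

```python
# Counts quoted in this post (variable names are illustrative, not dataset names).
total_tests = 395_194   # total N of tests
negative = 372_002      # total N of negative results
confirmed = 9_661       # total N of positive results ('confirmed')
patients = 2_770        # records in the patient-level data

# Tests that have either a negative or a confirmed result.
accounted = negative + confirmed

# Tests with neither result recorded; the cause is unknown from these counts.
unaccounted = total_tests - accounted
share_unaccounted = unaccounted / total_tests

# How many times larger 'confirmed' is than the patient-level N.
ratio = confirmed / patients

print(f"accounted for: {accounted:,}")
print(f"unaccounted:   {unaccounted:,} ({share_unaccounted:.1%})")
print(f"confirmed / patient-level N: {ratio:.2f}")
```

Running the same check per date, after linking the datasets on 'date', might show whether the gap is concentrated in particular periods.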
Thanks for brainstorming with me and for your questions. They helped.
Please let me know if you have any thoughts on what I shared here in the text and the screenshot below.