Hi, I'm using SAS to match cases/controls on conditions (in this case sex at birth and 10-year age range, which I already defined) for a matched case/control study on comorbidities and COVID. I have about 300 cases and 2000 controls in my dataset, and ideally I would like to match 1-3 controls to each case without reusing controls. I use proc sql to match, and end up with a table showing every possible match - every permutation, where there's a separate line for every match. About 20,000 lines. However, a single control may be matched to 100 different cases. So it might look like this: case_id control_id 1 10 1 125 1 1850 1 928 3 125 3 8 3 1276 3 1011 3 10 4 10 4 1011 I'd like to limit it to only use each control once, but to still allow multiple matches per case - and not to delete any cases unless they have ZERO matches. But, when I try to deduplicate, I end up with the vast majority of my cases getting deleted. Here's my code for matching: proc sql; create table controls_id as select one.uniqueid as case_id, two.uniqueid as control_id, one.age_range as case_agegroup, two.age_range as control_agegroup, one.sex_at_birth as case_sab, two.sex_at_birth as control_sab from case one, control two where (one.age_range=two.age_range and one.sex_at_birth = two.sex_at_birth); And here's my code for deduplicating by control_ID - when I run this, it deletes all instances of the duplicated controls, but (I think because it's deleting randomly, and every single control matches more than one case) takes most of my cases with it. proc sort data=controls_id nodupkey; by control_id; run; I would like to delete observations with a duplicate control value, but prioritize deletion so that I lose as few cases as possible. Only one case does not have a matching control and each remaining case matches with more than one control. Would appreciate any help - thank you!!
... View more