10-26-2017 08:20 AM
I have two datasets of individuals, one of 1155 cases and one of 22000 potential controls. I would like to match the cases to the controls on the variable "age" (within a 1-2 year span), but I want each control to occur only once. After generating all possible case-control combinations, some cases have as few as 13 candidate matches and some have as many as 2000. If possible, I would like each case to be matched to the same number of unique controls. Where there are many candidate controls, the selection should be randomized.
proc sql;
  create table match as
  select a.case_id, a.case_age,
         b.match_id, b.match_age,
         ranuni(383467663) as rand
  from cases as a, controls as b
  where (b.match_age-1)<a.case_age<(b.match_age+1)
  order by a.case_id;
quit;
I've used randomly generated numbers to sort by and delete duplicates among the controls. But I then end up with as few as one control for some cases (and still almost 2000 for others), and seem to lose some cases as well, which I definitely don't want to do.
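For reference, the deduplication step is roughly the following. This is a minimal sketch, assuming the joined table is named match as above; match_unique is a hypothetical output name used only for illustration. Keeping the first row per match_id makes each control unique, but nothing in this step balances the number of controls per case, which is consistent with the uneven counts described above.

```sas
/* Sort candidate pairs so that, within each control, rows are in random order */
proc sort data=match out=match_sorted;
  by match_id rand;
run;

/* Keep one randomly chosen case per control, so each control occurs only once. */
/* match_unique is a hypothetical dataset name.                                 */
data match_unique;
  set match_sorted;
  by match_id;
  if first.match_id;
run;

/* Count how many controls each case retained after deduplication */
proc freq data=match_unique;
  tables case_id / out=controls_per_case;
run;
```

Cases that do not appear in controls_per_case are the ones that lost every candidate control in this step.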
Thanks in advance!