Hi everyone,
I have two datasets of individuals, one with 1155 cases and one with 22000 potential controls. I would like to match the cases to the controls on the variable "age" (within a 1-2 year span), but I want each control to be used only once. After generating all possible case-control pairs, some cases have as few as 13 candidate controls and others have as many as 2000. If possible, I would like each case to be matched to the same number of unique controls. Where a case has many candidate controls, the selection among them should be random.
Here is the code that generates all possible pairs:
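proc sql;
  /* Pair every case with every control whose age is strictly less than 1 year away */
  create table match as
  select a.case_id,
         a.case_age,
         b.match_id,
         b.match_age,
         ranuni(383467663) as rand  /* random key used later to choose among candidate controls */
  from cases as a,
       controls as b
  where a.case_age > b.match_age - 1
    and a.case_age < b.match_age + 1
  order by a.case_id;
quit;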
I then sorted by the randomly generated numbers and deleted duplicates among the controls, so that each control appears only once. But this leaves some cases with as few as one control (while others still have almost 2000), and some cases seem to drop out entirely, which I definitely want to avoid.
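A minimal sketch of what such a de-duplication step could look like, assuming the match table created above. The dataset names match_sorted and match_unique are placeholders, and the exact code used may differ:

```sas
/* Assumed reconstruction of the de-duplication step, not the exact code used. */

/* Order the candidate pairs by control, then by the random key. */
proc sort data=match out=match_sorted;
  by match_id rand;
run;

/* Keep only the first row per control, i.e. the pair with the smallest random key. */
data match_unique;
  set match_sorted;
  by match_id;
  if first.match_id;
run;
```

A step like this assigns each control to one randomly chosen case without looking at how many controls each case already has. That would be consistent with the uneven counts described above: a case whose candidate controls are all assigned to other cases ends up with none, and nothing caps the number of controls per case.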
Any ideas?
Thanks in advance!