Hi experts, I have a database with 1.2 million cases and 4.9 million controls. I would like do a 1:1 or 1:2 matching based on age alone or age and sex for looking at the risk of cancer as an outcome. I tried using the following code but my program is crashing. How do I handle such a big database and create a 1:1 or 1:2 matching without the program running for long period of time and eventually crashing? Also I need to analyze and adjust for about 8-10 other variables comparing the cases and controls for the risk of cancer (outcome variable). My 7+ million data set has the following variables. Age, Sex, Race, DM (yes/ no 1/0), var1 (1/0), var2 (1/0), var3 (1/0), ...var10 (1/0) and cancer (1/0). From this dataset, cases are defined if DM = 1 and controls if DM = 0. When I analyze the risk of cancer by DM, I think the patient profiles are different, which may bias the result (risk of cancer), therefore I would like to do a matching with at least with age or age and sex both. proc sql;
create table want as
select a.id as id_case, b.id as id_control , b.age, b.sex,b.year
from case a
,
control b
where a.age = b.age and a.sex=b.sex and a.year=b.year
order by id_case, id_control;
create table freq as
select count(id_case) as count, id_case from want
group by id_case having count >= 3;
create table final as
select a.*
from want a,
freq b
where a.id_case=b.id_case;
quit; Is it valid to create a small randomized sample of cases and controls and then do a smaller sample of 1:2 matching from these randomized smaller samples? Any advice regarding this would be greatly appreciated.
... View more