Hi,
I've been working on case-control syntax and tried using the SAS PDF found here: http://www2.sas.com/proceedings/sugi29/173-29.pdf
Unfortunately my size is vastly larger (~170000 cases and more than 3 million controls) and this syntax is creating a file size much larger than necessary. I'm hoping to use a non-macro based syntax since my SAS has trouble running macros.
I'd like to use some sort of PROC SQL syntax, but I really don't know how to make it more efficient in the beginning and match controls to cases right away without creating a ton of duplicates per each case. Any suggestions or ideas? Anything would be immensely helpful at this point. Thank you!
Syntax:
PROC SQL;
CREATE table controls_id
as select
one.ID as
study_id,
two.ID as control_id,
one.age as study_age,
two.age as control_age,
one.race as study_race,
two.race as control_race,
one.rand_num as rand_num
from study one, control two
where (one.age=two.age and
one.race=two.race);
* Remove duplicate control subjects;
proc sort data=controls_id nodupkey;
by control_id rand_num;
run;
*exactly match on variables with fixed number of controls;
proc sort data=controls_id ;
by study_id rand_num;
run;
data controls_id2 not_enough;
set controls_id;
by study_id ;
retain num;
if first.study_id then num=1;
if num le 2 then do;
output controls_id2;
num=num+1;
end;
if last.study_id then do;
if num le 2 then output not_enough;
end;
run;
proc print data=controls_id2
(obs=40)
;
title2 'matched patients';
run;
*use following syntax to remove cases that do not have two controls;
data controls_id3;
merge controls_id2
not_enough(in=b_);
by study_id;
if b_ then delete;
run;