Hi, I've been working on case-control syntax and tried using the SAS PDF found here: http://www2.sas.com/proceedings/sugi29/173-29.pdf Unfortunately my size is vastly larger (~170000 cases and more than 3 million controls) and this syntax is creating a file size much larger than necessary. I'm hoping to use a non-macro based syntax since my SAS has trouble running macros. I'd like to use some sort of PROC SQL syntax, but I really don't know how to make it more efficient in the beginning and match controls to cases right away without creating a ton of duplicates per each case. Any suggestions or ideas? Anything would be immensely helpful at this point. Thank you! Syntax: PROC SQL; CREATE table controls_id as select one.ID as study_id, two.ID as control_id, one.age as study_age, two.age as control_age, one.race as study_race, two.race as control_race, one.rand_num as rand_num from study one, control two where (one.age=two.age and one.race=two.race); * Remove duplicate control subjects; proc sort data=controls_id nodupkey; by control_id rand_num; run; *exactly match on variables with fixed number of controls; proc sort data=controls_id ; by study_id rand_num; run; data controls_id2 not_enough; set controls_id; by study_id ; retain num; if first.study_id then num=1; if num le 2 then do; output controls_id2; num=num+1; end; if last.study_id then do; if num le 2 then output not_enough; end; run; proc print data=controls_id2 (obs=40) ; title2 'matched patients'; run; *use following syntax to remove cases that do not have two controls; data controls_id3; merge controls_id2 not_enough(in=b_); by study_id; if b_ then delete; run;
... View more