Hi,

I am trying to match controls to cases using the gmatch macro (by Erik Bergstralh). Processing is fairly fast (about 3 minutes) when I use 100 cases and 2,000,000 controls as the source data. However, when I applied the macro to my full unmatched case/control data [n=10,832,090 (cases=345,013, pool of potential controls=10,487,077)], it has been running for four days now. Is it trying to iterate until the last case is matched to 4 possible controls?

Here is the code I used to process the 100 cases / 2,000,000 potential controls:

%include "C:\SAS programs\gmatch.sas";

/** randomly selecting 100 cases without replacement **/
PROC SURVEYSELECT DATA=casecontrol (WHERE=(case = 1)) OUT=case METHOD=SRS N=100 SEED=1420;
  ID uniqueid gender yob ethnicity1;
RUN;
QUIT;

/** randomly selecting 2000000 controls without replacement **/
PROC SURVEYSELECT DATA=casecontrol (WHERE=(T2DMstatus = 0)) OUT=control METHOD=SRS N=2000000 SEED=345;
  ID uniqueid gender yob ethnicity1;
RUN;
QUIT;

proc sort data=case; by uniqueid; run;
proc sort data=control; by uniqueid; run;

data unmatched;
  set case control;
  by uniqueid;
run;

%gmatch(data=unmatched, group=t2dmstatus, id=uniqueid, mvars=gender yob ethnicity1, wts=10 1 4, dmaxk=0 1 0, dist=1, transf=1, contls=4, seedca=67, seedco=99, out=greedymatched, outnmca=nonmathcedcases, outnmco=nonmathchedcontrols);

data match;
  set greedymatched;
run;

/** creating a dataset that doesn't contain the matches just made above **/
/** the next random selection of cases and controls should be done on this dataset **/
PROC SQL;
  CREATE TABLE potential_match AS
  SELECT *
  FROM casecontrol_2014
  WHERE uniqueid NOT IN (SELECT __IDCA FROM match)
    AND uniqueid NOT IN (SELECT __IDCO FROM match);
QUIT;
/***************************************************************************/

I want to be able to repeat this whole process until the dataset potential_match has no more cases. This is likely to produce close to 3000 datasets, which will then be merged.
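
Roughly, the loop I have in mind would look like the sketch below. It is untested and rests on several assumptions: the source dataset is called casecontrol, t2dmstatus is coded 1 for cases and 0 for controls, and the gmatch output holds the matched IDs in __IDCA and __IDCO as in my query above. The macro names match_in_batches and count_remaining, the parameters batch_cases, batch_controls and max_iter, and the datasets all_matches and remaining are made up for illustration. The %gmatch call is copied unchanged from my code.

```sas
/* Helper (hypothetical): count the cases and controls still unmatched.  */
/* It updates ncases / ncontrols declared %local in the calling macro.   */
%macro count_remaining;
  proc sql noprint;
    select coalesce(sum(t2dmstatus = 1), 0) format=20.,
           coalesce(sum(t2dmstatus = 0), 0) format=20.
      into :ncases trimmed, :ncontrols trimmed
      from potential_match;
  quit;
%mend count_remaining;

/* Sketch: sample a batch, match it, remove the matches, repeat.         */
/* Assumes gmatch.sas has already been %included.                        */
%macro match_in_batches(batch_cases=100, batch_controls=2000000, max_iter=5000);
  %local i ncases ncontrols;
  %let i = 0;

  /* Start from the full pool and clear any earlier accumulated matches. */
  data potential_match; set casecontrol; run;
  proc datasets library=work nolist; delete all_matches; run; quit;
  %count_remaining

  /* max_iter guards against looping forever on cases that never match.  */
  %do %while (&ncases > 0 and &ncontrols > 0 and &i < &max_iter);
    %let i = %eval(&i + 1);

    /* SELECTALL takes every remaining unit when fewer than N are left.  */
    /* No ID statement, so all variables (incl. t2dmstatus) are kept.    */
    proc surveyselect data=potential_match(where=(t2dmstatus = 1))
        out=case method=srs n=&batch_cases selectall
        seed=%eval(1420 + &i) noprint;
    run;
    proc surveyselect data=potential_match(where=(t2dmstatus = 0))
        out=control method=srs n=&batch_controls selectall
        seed=%eval(345 + &i) noprint;
    run;

    data unmatched; set case control; run;

    /* Same call as in my original code. */
    %gmatch(data=unmatched, group=t2dmstatus, id=uniqueid, mvars=gender yob ethnicity1, wts=10 1 4, dmaxk=0 1 0, dist=1, transf=1, contls=4, seedca=67, seedco=99, out=greedymatched, outnmca=nonmathcedcases, outnmco=nonmathchedcontrols);

    /* Accumulate into one dataset instead of keeping ~3000 separate ones. */
    proc append base=all_matches data=greedymatched force; run;

    /* Drop the IDs matched in this batch from the remaining pool. */
    proc sql;
      create table remaining as
      select *
      from potential_match
      where uniqueid not in (select __IDCA from greedymatched)
        and uniqueid not in (select __IDCO from greedymatched);
    quit;
    proc datasets library=work nolist;
      delete potential_match;
      run;
      change remaining=potential_match;
    quit;

    %count_remaining
  %end;
%mend match_in_batches;

%match_in_batches(batch_cases=100, batch_controls=2000000)
```

Appending each batch to all_matches with PROC APPEND would avoid having to merge thousands of datasets at the end, and the max_iter cap is only there so the loop stops if some cases can never be matched under the dmaxk limits.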