Hi all,

I recently started using hash tables for case-control matching. My goal is the following:

1) On the first run, match each case to two controls very closely. Ideally I would find all my matched controls in this step, but two other outcomes are possible: a) I find only one control that meets the criteria; b) I find no controls that meet the criteria.
2) For all cases that fall under a) or b), run the matching again with a wider tolerance.

I found some code by George Zhu from the SAS Edmonton user group that does step 1. I tried to modify it so that it does step 1 and then, for any cases left over, does step 2, but my attempts have failed. Part of the problem is that the second run overwrites the results of the first, so closer matches get overwritten by looser matches. This is because of the cases.replace() in the second loop. Changing the replace() to an add() does not do what I want either, because it adds controls to cases that already have 2 matches.

The code is below. I would greatly appreciate any help.
Joe

****************************************************************************************
*****DATA SETUP;
***************************************************************************************;
%let ratio=2;
%let nCase=100;
%let nControl=100000;

* Generate the Cases data set;
data cases(drop=i);
  retain id age gender;
  length gender $1;
  do i=1 to &nCase.;
    id=i;
    age=floor(ranuni(0)*100);
    gender=ifc(ranuni(0)>0.5,"F","M");
    output;
  end;
run;

* Generate the Controls data set;
data controls(drop=i);
  retain id age gender;
  length gender $1;
  do i=1 to &nControl.;
    id=i;
    age=floor(ranuni(0)*73);
    gender=ifc(ranuni(5)>0.7,"F","M");
    output;
  end;
run;

data Controls_H;
  set Controls;
  control_rand=ranuni(0);
  rename age=control_age gender=control_gender id=control_id;
run;

proc sort data=controls_H; *scramble the control list for randomness;
  by control_rand;
run;

data Cases_H;
  set Cases;
  case_rand=ranuni(0); *for scramble the order of cases;
  rename age=case_age gender=case_gender id=case_id;
  count=0; *for recording number of controls matched;
run;

****************************************************************************************
********* (2) MATCHES WITH EVEN MORE FUZZY AGE
***************************************************************************************;
********* NOTE: if does not find age within 5 years I want it to use the 10 year interval;

* the main data step;
data _null_;
  if _n_=1 then do;
    set cases_h(obs=1); *make the variables in Cases data set available in the PDV;

    *put the cases in a hash table;
    declare hash cases(dataset:'cases_H',hashexp:15,ordered:'y');
    cases.definekey("case_rand","case_id");
    cases.definedata("case_rand","case_id","count","case_age","case_gender");
    cases.definedone();

    declare hiter hi_cases('cases'); * declare a hash table iterator object;

    declare hash matches(); *declare a hash table for matched cases and controls;
    matches.definekey("case_id","control_id");
    matches.definedata("case_id","control_id");
    matches.definedone();

    *declare a hash table for recording matched controls;
    control_id_hash=case_id;
    declare hash m_control();
    m_control.definekey("control_id_hash");
    m_control.definedone();
    m_control.clear();
  end;

  set controls_h end=eof;
  control_id_hash=control_id; *get current control_id for searching;

  if (m_control.find() ne 0) then do; *not matched to a case yet;
    rc=hi_cases.first(); *search cases table using hash iterator object;
    do while(rc=0);
      if (count<&ratio. and case_gender=control_gender and abs(case_age-control_age)<=5) then do;
        count+1;
        cases.replace();
        matches.add();
        m_control.add();
        leave;
      end;
      if (matches.find() ne 0) then do; ***if case is not found in matched output???**;
        if (count<&ratio. and case_gender=control_gender and abs(case_age-control_age)<=10) then do;
          count+1;
          cases.replace();
          matches.add();
          m_control.add();
          leave;
        end;
      end;
      rc=hi_cases.next();
    end;
  end;

  *check if all the cases have matches (ie, count=&ratio.);
  done=1;
  rc=hi_cases.first();
  do while(rc=0);
    if count<&ratio. then do;
      done=0;
      leave;
    end;
    rc=hi_cases.next();
  end;

  *if all the cases are matched or run out of controls, output the resulting data sets;
  if (done or eof) then do;
    matches.output(dataset:"matches");
    cases.output(dataset:"matched_cases");
    m_control.output(dataset:"matched_controls");
    stop;
  end;
run;
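To make the target behaviour concrete, here is a rough, untested sketch of the two-pass flow described in steps 1 and 2. It is not George Zhu's code. The outer loop over tol, the hash named used, and re-reading controls_h with point= are only my assumptions about how this might be structured. It reuses the cases_h and controls_h data sets and the &ratio. macro variable from the setup above.

```sas
data _null_;
  if 0 then set cases_h controls_h; /* compile-time only: put the variables in the PDV */

  /* cases, carrying the running count of matched controls */
  declare hash cases(dataset:'cases_h', ordered:'y');
  cases.definekey('case_rand','case_id');
  cases.definedata('case_rand','case_id','count','case_age','case_gender');
  cases.definedone();
  declare hiter hi_cases('cases');

  /* matched pairs, plus the tolerance at which each pair was made */
  declare hash matches(ordered:'y');
  matches.definekey('case_id','control_id');
  matches.definedata('case_id','control_id','tol');
  matches.definedone();

  /* controls that have already been assigned to a case */
  declare hash used();
  used.definekey('control_id');
  used.definedone();

  do tol = 5, 10;                     /* pass 1: tight, pass 2: wider (assumed values) */
    do p = 1 to n_controls;
      set controls_h point=p nobs=n_controls;
      if used.check() = 0 then continue;   /* control already taken */
      rc = hi_cases.first();
      do while (rc = 0);
        if count < &ratio. and case_gender = control_gender
           and abs(case_age - control_age) <= tol then do;
          count + 1;
          cases.replace();            /* updates only this case's count */
          matches.add();              /* adds a new pair; existing pairs are untouched */
          used.add();
          leave;
        end;
        rc = hi_cases.next();
      end;
    end;
  end;

  matches.output(dataset:'matches');
  cases.output(dataset:'matched_cases');
  stop;                               /* required because point= never hits end of file */
run;
```

The idea is that the tight pass runs over every control before the wider pass starts, so a case that already has &ratio. matches is skipped in pass 2 and the pairs already stored in matches are never replaced.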