https://support.sas.com/resources/papers/proceedings/proceedings/sugi29/173-29.pdf
I've been trying to manipulate coding included in the above link to match controls and cases based on variables 'age' +/-5yrs and 'sex'. I have a dataset each for cases (WORK.cases) and controls (WORK.controls). My problem is that several controls are duplicated across multiple cases. It also seems that in the instances where there are no appropriate controls for matching, instead of writing to the not_enough output, cases are being matched with other cases and kept in the matched group.
I also need the final matched dataset to be one row per individual and for it to include all of the variables from the original case and control datasets, but I'm unsure where the KEEP= statement needs to go.
Any help would be greatly appreciated. I'm working on SAS 9.04.01 and current coding is as follows:
data controls2;
rand_num=uniform(0);
set WORK.controls WORK.cases;
age_low=age-5;
age_high=age+5;
run;
proc sql;
create table controls_UID as
select
one.UID as cases_UID,
two.UID as control_UID,
one.age as cases_age,
two.age as control_age,
one.sex as cases_sex,
two.sex as control_sex,
two.rand_num as rand_num
from cases one, controls2 two
where ((one.age between two.age_low and two.age_high)
and one.sex=two.sex);
proc sort data=controls_UID;
by cases_UID;
run;
data controls_UID2;
set controls_UID;
by cases_UID;
retain num_controls;
if first.cases_UID then num_controls=1;
else num_controls=num_controls+1;
if last.cases_UID then output;
run;
data controls_UID3;
rand_num=uniform(0);
merge controls_UID
controls_UID2;
by cases_UID;
run;
proc sort data=controls_UID3;
by control_UID num_controls rand_num;
run;
data controls_UID4;
set controls_UID3;
by control_UID;
if first.control_UID;
run;
proc sort data=controls_UID;
by cases_UID rand_num;
run;
data controls_UID2 not_enough;
set controls_UID;
by cases_UID;
retain num;
if first.cases_UID then num=1;
if num le 3 then do;
output controls_UID2;
num=num+1;
end;
if last.cases_UID then do;
if num le 3 then output not_enough;
end;
run;
proc print data=controls_UID2;
title2 'matched cases';
run;
Hello @MattFD
I do not have a conceptual understanding of your business case. However if one looks at the first Proc SQL code that creates the dataset controls_UID, the step creates a cartesian join and keeping records where there is a match in sex and the age in cases between the age limits of the table controls2. This I feel is the cause of the issue you are facing. Review the logic of this step in line with your business requirements.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.