Hi all,
I need help with SAS code for generating a matched cohort from cases (n=23) and controls (n=2199). For every case, I need at least 3 controls. I would like to match the 23 cases to controls on Age, Sex and Year of admission. The cases and controls are in two separate data sets (Dataset A and Dataset B), and the variable names are exactly the same in both: the matching variables (Age, Sex, Year) and the ID variable (Encounter_ID).
I tried the syntax from the accepted solution posted at the community link below, but the merge data set step did not produce any output.
https://communities.sas.com/t5/SAS-Programming/matching-based-on-three-criteria/m-p/501785
I would really appreciate it if anyone could help me with this matching code.
Thank you so much!
Sat
Any data examples would be good. Things to check:
1. Do all the overlapping variables have the same data type and format in both datasets? For example, if a variable is numeric in one dataset, it must be numeric in the other.
2. If any overlapping variables are character, do they have the same length and the same case (upper or lower), with no padding blanks, etc.? Offhand, do you know whether the values of these variables actually overlap between the two datasets?
3. Are the datasets sorted by the overlapping variables? If a many-to-many join arises, use proc sql instead of a data step merge. If it is a one-to-many merge, you can safely proceed with the data step merge.
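A quick way to run checks 1 and 2 is sketched below. It assumes the datasets are named test and control (as in the code further down) and that the matching variables are age, sex and year. The names meta_test and meta_control are made up for illustration.

```sas
/* Write the variable attributes of each dataset to a table */
proc contents data=test    out=meta_test(keep=name type length)    noprint; run;
proc contents data=control out=meta_control(keep=name type length) noprint; run;

proc sql;
  /* type: 1 = numeric, 2 = character; the two datasets should agree */
  select upcase(t.name) as name,
         t.type   as type_test, c.type   as type_control,
         t.length as len_test,  c.length as len_control
  from meta_test t inner join meta_control c
    on upcase(t.name) = upcase(c.name)
  where upcase(t.name) in ('AGE', 'SEX', 'YEAR');
quit;

/* Look at how the values are coded in each dataset (e.g. 'M' vs 'm') */
proc freq data=test;    tables sex year; run;
proc freq data=control; tables sex year; run;
```

If the types differ, or the same category is coded differently in the two tables, the equality conditions in a merge or join will not find any matches.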
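Coming to the actual problem, the following code might work for you. Here test and control stand for your case and control datasets, and id stands for your ID variable (Encounter_ID).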
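/* Sort into new datasets so that test and control keep their original id variable */
proc sort data=test(rename=(id=id_test)) out=test_sorted; by age sex year; run;
proc sort data=control(rename=(id=id_control)) out=control_sorted; by age sex year; run;
data want;
merge test_sorted control_sorted;
by age sex year;
run;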
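While I was writing the code above, I realized it would be a many-to-many merge, so it might not work correctly: within a BY group, a data step merge pairs observations one-to-one in order rather than forming every case-control combination. Alternative code is below.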
proc sql;
/* All case-control pairs that agree on age, sex and year */
create table want as
select a.id as id_test, b.id as id_control, b.age, b.sex, b.year
from test a, control b
where a.age = b.age and a.sex = b.sex and a.year = b.year
order by id_test, id_control;
/* Cases that have at least 3 matching controls */
create table freq as
select id_test, count(*) as n_controls
from want
group by id_test
having count(*) >= 3;
/* Keep the pairs for those cases only */
create table final as
select a.*
from want a, freq b
where a.id_test = b.id_test;
quit;
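The table final keeps every matching control for each case, so some cases may have far more than 3. If you want exactly 3 per case, one possible follow-up is to pick 3 at random, as in the rough sketch below. It assumes the table final with columns id_test and id_control from the query above. The names final_rand and matched and the seed 12345 are arbitrary. Note that the same control can still be picked for more than one case.

```sas
/* Attach a random sort key to every case-control pair */
data final_rand;
  set final;
  call streaminit(12345);  /* arbitrary seed, for reproducibility */
  u = rand('uniform');
run;

proc sort data=final_rand;
  by id_test u;
run;

/* Keep the first 3 controls per case in random order */
data matched;
  set final_rand;
  by id_test;
  if first.id_test then n = 0;
  n + 1;                   /* running count of controls within the case */
  if n <= 3;
  drop u n;
run;
```

Matching without reusing controls across cases would need a different approach than this.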
Hope this helps