sms1891
Quartz | Level 8

Hi all,

I need help with SAS code to generate a matched cohort from cases (n=23) and controls (n=2199). For every case, I need at least 3 controls. I would like to match the 23 cases to controls on Age, Sex, and Year of admission. The cases and controls are in two separate data sets (Dataset A and Dataset B), but the variable names are exactly the same in both: the matching variables (Age, Sex, Year) and the ID variable (Encounter_ID).

 

I tried using the syntax from the accepted solution in the community thread linked below, but the merge data step did not produce any output.

https://communities.sas.com/t5/SAS-Programming/matching-based-on-three-criteria/m-p/501785

 

I would really appreciate it if anyone could help me with this matching SAS code.

 

Thank you so much!

Sat 

Accepted Solutions
smantha
Lapis Lazuli | Level 10

Some example data would help. Things to check:

1. Are all the overlapping variables in both datasets of the same format and data type? That is, if a variable is numeric in one dataset, it should be numeric in the other.

2. If the overlapping variables are character, do they have the same length and the same case (upper or lower), with no leading or padding blanks? Also, do you know offhand whether the two datasets actually share any values on these variables?

3. Are the datasets sorted by the overlapping variables? If the join is many-to-many, use PROC SQL instead of a DATA step merge. If it is a one-to-many merge, you can safely proceed with the DATA step merge.
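
The three checks above could be run roughly as follows. This is only a sketch: it assumes the datasets are named test and control (as in the code further down) and that sex is a character variable. The names meta_test, meta_control, test_clean, and shared_strata are made up for illustration.

```sas
/* 1. Compare type (1 = numeric, 2 = character), length, and format */
proc contents data=test    out=meta_test(keep=name type length format)    noprint; run;
proc contents data=control out=meta_control(keep=name type length format) noprint; run;

proc print data=meta_test;    where upcase(name) in ('AGE', 'SEX', 'YEAR'); run;
proc print data=meta_control; where upcase(name) in ('AGE', 'SEX', 'YEAR'); run;

/* 2. Assumption: sex is character. Upper-case it and remove leading blanks. */
/*    Repeat the same step for control.                                      */
data test_clean;
  set test;
  sex = upcase(strip(sex));
run;

/* 3. Count the age/sex/year combinations that occur in both datasets */
proc sql;
  select count(*) as shared_strata
  from (select distinct age, sex, year from test) as a
       inner join
       (select distinct age, sex, year from control) as b
  on a.age = b.age and a.sex = b.sex and a.year = b.year;
quit;
```

If shared_strata comes back as 0, no case can be matched on all three variables, which would explain an empty result from a merge or join.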

 

Coming to the actual problem, the following code might work for you.

 

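/* OUT= writes the sorted, renamed copies to new datasets, */
/* so test and control keep their original id variable     */
proc sort data=test(rename=(id=id_test)) out=test_sorted; by age sex year; run;

proc sort data=control(rename=(id=id_control)) out=control_sorted; by age sex year; run;

data want;
  merge test_sorted control_sorted;
  by age sex year;
run;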

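While I was writing the code above, I realized it would be a many-to-many merge, so it might not work correctly: a DATA step merge pairs rows within each BY group by position rather than forming every case-control combination. Alternative code is below.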

proc sql;
/* All case-control pairs that agree on age, sex, and year */
create table want as
select a.id as id_test, b.id as id_control, b.age, b.sex, b.year
from test a, control b
where a.age = b.age and a.sex = b.sex and a.year = b.year
order by id_test, id_control;

/* Cases that have at least 3 matching controls */
create table freq as
select count(id_test) as count, id_test
from want
group by id_test
having count >= 3;

/* Keep only the pairs that belong to those cases */
create table final as
select a.*
from want a, freq b
where a.id_test = b.id_test;
quit;
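
The table final keeps every matching control for each retained case. If exactly 3 controls per case are wanted, one possible follow-up is to pick 3 at random. This is only a sketch under assumptions: it starts from the final table created above, and the names shuffled, matched3, _rand, and _n are made up for illustration. Note that the same control can still be assigned to more than one case.

```sas
/* Attach a random sort key to every case-control pair */
data shuffled;
  set final;
  _rand = ranuni(12345);  /* fixed seed so the draw is reproducible */
run;

proc sort data=shuffled;
  by id_test _rand;
run;

/* Keep the first 3 controls per case in the shuffled order */
data matched3(drop=_rand _n);
  set shuffled;
  by id_test;
  if first.id_test then _n = 0;
  _n + 1;                 /* sum statement: _n is retained across rows */
  if _n <= 3;
run;
```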

Hope this helps

 

 

sms1891
Quartz | Level 8
Thank you! This worked!!!
Reeza
Super User
You may want to consider using PROC PSMATCH, which is designed specifically for case-control matching based on propensity scores. The documentation has an example of greedy nearest-neighbour matching, which would correspond to a 'merge'-style algorithm.

https://documentation.sas.com/?docsetId=statug&docsetTarget=statug_psmatch_examples04.htm&docsetVers...
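
A rough sketch of what that could look like for this problem is below. It is not taken from the documentation example: the stacked dataset all, the indicator is_case, and the output names matched and _MatchID are assumptions for illustration, and the dataset names test and control follow the earlier reply. Age is matched only approximately here, through the propensity score, while sex and year are matched exactly. Please check the statements and options against the PROC PSMATCH documentation for your SAS/STAT release before relying on it.

```sas
/* Stack cases and controls; is_case is a hypothetical 0/1 indicator */
data all;
  set test(in=in_case) control;
  is_case = in_case;
run;

proc psmatch data=all region=allobs;
  class is_case sex year;
  /* Propensity score model for being a case */
  psmodel is_case(Treated='1') = age sex;
  /* Greedy nearest-neighbour matching, 3 controls per case, */
  /* with exact agreement required on sex and year           */
  match method=greedy(k=3) exact=(sex year);
  output out(obs=match)=matched matchid=_MatchID;
run;
```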
