Solved: SAS code for 1:3 matching using Age, Sex and Year Variables

sms1891 · Posted 08-14-2020 09:37 PM

Hi all,

I need help with SAS code for generating a matched cohort from cases (n=23) and controls (n=2199). For every case, I need at least 3 controls. I would like to match these 23 cases to controls on Age, Sex and Year of admission. The cases and controls are in two separate data sets (Dataset A and Dataset B), but the variable names (Age, Sex, Year) and the ID variable (Encounter_ID) are exactly the same in both.

I tried the syntax from the accepted solution in the community thread linked below, but the merge data set step gave me no output.

https://communities.sas.com/t5/SAS-Programming/matching-based-on-three-criteria/m-p/501785

I would really appreciate it if anyone could help me with this matching code.

Thank you so much!

Sat

smantha · Posted 08-14-2020 10:00 PM

Some example data would help. In the meantime, here are some things to check:

1. Do the overlapping variables have the same type and format in both data sets? That is, if a variable is numeric in one data set, it must also be numeric in the other.

2. If any overlapping variables are character, do they have the same length and the same case (upper vs. lower), with no padding blanks? Do you know offhand whether the values of these variables actually overlap between the two data sets?

3. Are both data sets sorted by the BY variables? If the join is many-to-many, use PROC SQL instead of a DATA step MERGE. If it is one-to-many, you can safely use the DATA step MERGE.
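The three checks above can be run directly in SAS. This is a minimal sketch, assuming the case and control data sets are in WORK and named TEST and CONTROL with an ID variable called id (the names used later in this reply); the tiny data sets are hypothetical toy data, not the poster's. The DICTIONARY.COLUMNS query compares type, length and format of the shared variables, UPCASE/STRIP normalizes a character key, and the last query lists keys that repeat within a data set. If the same key repeats in both data sets, a DATA step MERGE on it is many-to-many.

```sas
/* Hypothetical toy data; replace with your own case (TEST) and control (CONTROL) sets */
data test;
  input id age sex $ year;
  datalines;
1 45 M 2015
2 60 F 2016
;
run;

data control;
  input id age sex $ year;
  datalines;
101 45 m 2015
102 60 F 2016
103 60 F 2016
;
run;

/* Checks 1 and 2: compare type, length and format of the shared variables */
proc sql;
  select memname, name, type, length, format
    from dictionary.columns
    where libname = 'WORK'
      and memname in ('TEST', 'CONTROL')
      and upcase(name) in ('ID', 'AGE', 'SEX', 'YEAR')
    order by name, memname;
quit;

/* Check 2: normalize case and leading/trailing blanks of a character key */
data control;
  set control;
  sex = upcase(strip(sex));
run;

/* Check 3: keys with more than one row; repeats on BOTH sides mean many-to-many */
proc sql;
  select 'test' as source length=7, age, sex, year, count(*) as n
    from test
    group by age, sex, year
    having count(*) > 1
  union all
  select 'control' as source length=7, age, sex, year, count(*) as n
    from control
    group by age, sex, year
    having count(*) > 1;
quit;
```

With this toy data, only the 60/F/2016 key repeats, and only in CONTROL, so a merge on it would be one-to-many.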

As for the actual problem, the following code might work for you:

/* Replace id with your ID variable (e.g. Encounter_ID) */
proc sort data=test(rename=(id=id_test)); by age sex year; run;

proc sort data=control(rename=(id=id_control)); by age sex year; run;

data want;
merge test control;
by age sex year;
run;

While writing the code above, I realized it would be a many-to-many merge, so it might not work correctly. Here is an alternative:

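proc sql;
/* All case-control pairs that agree exactly on age, sex and year.
   Replace id with your ID variable (e.g. Encounter_ID). */
create table want as
select a.id as id_test, b.id as id_control, b.age, b.sex, b.year
from test a,
     control b
where a.age = b.age and a.sex = b.sex and a.year = b.year
order by id_test, id_control;

/* Cases with at least 3 candidate controls */
create table freq as
select id_test, count(*) as count
from want
group by id_test
having count(*) >= 3;

/* All candidate pairs for those cases (can be more than 3 per case,
   and the same control can appear under several cases) */
create table final as
select a.*
from want a,
     freq b
where a.id_test = b.id_test;
quit;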
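FINAL keeps every matching control for each qualifying case, and one control can be paired with more than one case, so it is not strictly 1:3. If you need exactly three distinct controls per case, one option is a greedy random draw without replacement. This is a minimal sketch, not part of the accepted answer: the toy data, the data set names WANT_R, MATCHED and FINAL_1TO3, and the seed 2020 are all hypothetical. Cases are served in ID order, so earlier cases get first pick when several cases share a stratum; a different order can change which cases end up fully matched.

```sas
/* Hypothetical toy data: cases 1 and 3 share a stratum, case 2 has only 2 controls */
data test;
  input id age sex $ year;
  datalines;
1 45 M 2015
2 60 F 2016
3 45 M 2015
;
run;

data control;
  input id age sex $ year;
  datalines;
101 45 M 2015
102 45 M 2015
103 45 M 2015
104 45 M 2015
105 45 M 2015
106 45 M 2015
107 60 F 2016
108 60 F 2016
;
run;

/* All candidate pairs on exact age/sex/year (same join as WANT above) */
proc sql;
  create table want as
  select a.id as id_test, b.id as id_control, b.age, b.sex, b.year
    from test a, control b
    where a.age = b.age and a.sex = b.sex and a.year = b.year;
quit;

/* Random order of candidates within each case; the seed is arbitrary */
data want_r;
  set want;
  if _n_ = 1 then call streaminit(2020);
  u = rand('uniform');
run;

proc sort data=want_r;
  by id_test u;
run;

/* Greedy pass: up to 3 controls per case, never reusing a control */
data matched(drop=u rc n_taken);
  if _n_ = 1 then do;
    declare hash used();            /* controls already assigned */
    used.defineKey('id_control');
    used.defineDone();
  end;
  set want_r;
  by id_test;
  if first.id_test then n_taken = 0;
  if n_taken < 3 and used.check() ne 0 then do;
    rc = used.add();
    n_taken + 1;
    output;
  end;
run;

/* Keep only cases that received a full set of 3 controls */
proc sql;
  create table final_1to3 as
  select *
    from matched
    where id_test in (select id_test from matched
                      group by id_test having count(*) = 3)
    order by id_test, id_control;
quit;
```

With this toy data, cases 1 and 3 each get three different controls from 101 to 106, while case 2 is dropped because only two controls share its stratum.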

Hope this helps

sms1891 · Posted 08-18-2020 12:10 PM

Thank you! This worked!!!

Reeza · Posted 08-14-2020 10:11 PM

You may want to consider PROC PSMATCH, which is designed specifically for propensity score matching of cases and controls. The documentation has an example of greedy nearest neighbour matching, which would be similar to a 'merge' algorithm.

https://documentation.sas.com/?docsetId=statug&docsetTarget=statug_psmatch_examples04.htm&docsetVers...
