data a;
input patientid dg_date mmddyy10. age sex;
datalines;
1 5/5/2009 18 1
2 1/2/2007 60 1
3 2/2/2004 70 0
;
run;
data b;
input patientid date2 mmddyy10. age sex;
datalines;
4 5/5/2009 18 1
4 10/6/2009 18 1
5 1/2/2007 60 1
5 1/1/2007 60 1
6 2/2/2004 79 0
6 3/3/2004 79 0
6 1/1/2004 79 0
;
run;
I have two data a and b. Data a has the diagnosis date, I want to match data a to data b by selecting ids which match on age, sex, and have any date "date2" that matches exactly the dg_date.
output (1 case and 0 control)
id label
1 1
2 1
4 0
5 0
Yes. I could. Just add one more PROC SORT at bottom of code.
data a;
input patientid dg_date : mmddyy10. age sex;
format dg_date mmddyy10.;
datalines;
1 5/5/2009 18 1
2 1/2/2007 60 1
3 2/2/2004 70 0
;
run;
data b;
input patientid date2 : mmddyy10. age sex;
format date2 mmddyy10.;
datalines;
4 5/5/2009 18 1
4 10/6/2009 18 1
5 1/2/2007 60 1
5 1/1/2007 60 1
6 2/2/2004 79 0
6 3/3/2004 79 0
6 1/1/2004 79 0
7 5/5/2009 18 1
7 10/6/2009 18 1
8 1/2/2007 60 1
8 1/1/2007 60 1
9 2/2/2004 79 0
9 3/3/2004 79 0
9 1/1/2004 79 0
10 5/5/2009 18 1
10 10/6/2009 18 1
11 1/2/2007 60 1
11 1/1/2007 60 1
12 2/2/2004 79 0
12 3/3/2004 79 0
12 1/1/2004 79 0
13 5/5/2009 18 1
13 10/6/2009 18 1
14 1/2/2007 60 1
14 1/1/2007 60 1
15 2/2/2004 79 0
15 3/3/2004 79 0
15 1/1/2004 79 0
;
run;
proc sort data=a out=temp_a;
by dg_date age sex;
run;
proc sort data=b out=temp_b;
by date2 age sex;
run;
data temp;
merge temp_a(in=ina)
temp_b(rename=(date2=dg_date patientid=_patientid) in=inb);
by dg_date age sex;
if ina and inb;
run;
data want;
set temp;
id=patientid;label=1;output;
id=_patientid;label=0;output;
keep id label;
run;
proc sort data=want out=final_want nodupkey;
by id label ;
run;
Well, if I understand you correctly:
proc sql; create table want as select a.id, case when b.patientid ne . then 1 else 0 end as label from a a left join b b on a.age=b.age and a.sex=b.sex and a.dg_date=date2;
quit;
Thank you RW9. Although the specification you included seems correct, the output is not correct so it includes only ids from data a.
Ok, you would need one for each way then:
proc sql; create table want as select a.id, case when b.patientid ne . then 1 else 0 end as label from a a left join b b on a.age=b.age and a.sex=b.sex and a.dg_date=date2
union all
select a.id,
case when b.patientid ne . then 1 else 0 end as label
from b a
left join a b
on a.age=b.age
and a.sex=b.sex
and a.date2=b.dg_date
;
quit;
thank you for the prompt response but the output is still not correct. id's number 3 and 6 should not be included in the dataset and the output listed id 5 once as 1 (case) and one as 0 (control), it should be only a control (0).
data a;
input patientid dg_date : mmddyy10. age sex;
format dg_date mmddyy10.;
datalines;
1 5/5/2009 18 1
2 1/2/2007 60 1
3 2/2/2004 70 0
;
run;
data b;
input patientid date2 : mmddyy10. age sex;
format date2 mmddyy10.;
datalines;
4 5/5/2009 18 1
4 10/6/2009 18 1
5 1/2/2007 60 1
5 1/1/2007 60 1
6 2/2/2004 79 0
6 3/3/2004 79 0
6 1/1/2004 79 0
;
run;
proc sort data=a out=temp_a;
by dg_date age sex;
run;
proc sort data=b out=temp_b;
by date2 age sex;
run;
data temp;
merge temp_a(in=ina)
temp_b(rename=(date2=dg_date patientid=_patientid) in=inb);
by dg_date age sex;
if ina and inb;
run;
data want;
set temp;
id=patientid;label=1;output;
id=_patientid;label=0;output;
keep id label;
run;
Perfect Ksharp! I have a large pool of controls, so would it be possible to use the same code to match 1 case to more than one control (e.g. 16)?
Technically , it could.
Post your data and output to explain your question more details.
data a;
input patientid dg_date : mmddyy10. age sex;
format dg_date mmddyy10.;
datalines;
1 5/5/2009 18 1
2 1/2/2007 60 1
3 2/2/2004 70 0
;
run;
data b;
input patientid date2 : mmddyy10. age sex;
format date2 mmddyy10.;
datalines;
4 5/5/2009 18 1
4 10/6/2009 18 1
5 1/2/2007 60 1
5 1/1/2007 60 1
6 2/2/2004 79 0
6 3/3/2004 79 0
6 1/1/2004 79 0
7 5/5/2009 18 1
7 10/6/2009 18 1
8 1/2/2007 60 1
8 1/1/2007 60 1
9 2/2/2004 79 0
9 3/3/2004 79 0
9 1/1/2004 79 0
10 5/5/2009 18 1
10 10/6/2009 18 1
11 1/2/2007 60 1
11 1/1/2007 60 1
12 2/2/2004 79 0
12 3/3/2004 79 0
12 1/1/2004 79 0
13 5/5/2009 18 1
13 10/6/2009 18 1
14 1/2/2007 60 1
14 1/1/2007 60 1
15 2/2/2004 79 0
15 3/3/2004 79 0
15 1/1/2004 79 0
;
run;
output
id label
1 1
2 1
4 0
5 0
7 0
8 0
10 0
11 0
13 0
14 0
Just to avoid lengthy data, in this data, each case from a would match to 4 controls from b
Yes. I could. Just add one more PROC SORT at bottom of code.
data a;
input patientid dg_date : mmddyy10. age sex;
format dg_date mmddyy10.;
datalines;
1 5/5/2009 18 1
2 1/2/2007 60 1
3 2/2/2004 70 0
;
run;
data b;
input patientid date2 : mmddyy10. age sex;
format date2 mmddyy10.;
datalines;
4 5/5/2009 18 1
4 10/6/2009 18 1
5 1/2/2007 60 1
5 1/1/2007 60 1
6 2/2/2004 79 0
6 3/3/2004 79 0
6 1/1/2004 79 0
7 5/5/2009 18 1
7 10/6/2009 18 1
8 1/2/2007 60 1
8 1/1/2007 60 1
9 2/2/2004 79 0
9 3/3/2004 79 0
9 1/1/2004 79 0
10 5/5/2009 18 1
10 10/6/2009 18 1
11 1/2/2007 60 1
11 1/1/2007 60 1
12 2/2/2004 79 0
12 3/3/2004 79 0
12 1/1/2004 79 0
13 5/5/2009 18 1
13 10/6/2009 18 1
14 1/2/2007 60 1
14 1/1/2007 60 1
15 2/2/2004 79 0
15 3/3/2004 79 0
15 1/1/2004 79 0
;
run;
proc sort data=a out=temp_a;
by dg_date age sex;
run;
proc sort data=b out=temp_b;
by date2 age sex;
run;
data temp;
merge temp_a(in=ina)
temp_b(rename=(date2=dg_date patientid=_patientid) in=inb);
by dg_date age sex;
if ina and inb;
run;
data want;
set temp;
id=patientid;label=1;output;
id=_patientid;label=0;output;
keep id label;
run;
proc sort data=want out=final_want nodupkey;
by id label ;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.