Hi everyone!
I am joining two tables that have a variable id in common and each has a variable date (Date and Date2) that are causally related. In particular, Date can determine zero, one or more events in Date2. Here you can see an image of the two tables:
Table1
Table2:
I would like to join these two tables to show, for every event in Date if it is followed by 0,1 or more events of Date2. Clearly every event in Date2 could be attributed to the oldest date, but I would like that every Date2 is attributed to a Date immediately before until there is another successive Date to attribute the Dates2 exactly as, if someone have an allergic reaction, it is more likely that this event is related to what he ate yesterday then what he ate six months before.
I also link some fake data as a working data set.
data Database1;
input id $ date :ddmmyy10. ;
cards;
111 01/01/2015
111 20/03/2015
111 15/06/2016
111 12/03/2017
111 30/06/2020
222 18/03/2020
222 25/04/2020
;
run;
data Database2;
input id $ date2 :ddmmyy10. ;
cards;
111 18/06/2016
111 15/08/2016
111 17/03/2017
111 20/01/2018
111 04/04/2018
111 20/04/2020
222 26/05/2020
;
run;
Thank you very much! 😁
Try this
data have1;
input id $ date :ddmmyy10.;
format date ddmmyy10.;
cards;
111 01/01/2015
111 20/03/2015
111 15/06/2016
111 12/03/2017
111 30/06/2020
222 18/03/2020
222 25/04/2020
;
data have2;
input id $ date2 :ddmmyy10.;
format date2 ddmmyy10.;
cards;
111 18/06/2016
111 15/08/2016
111 17/03/2017
111 20/01/2018
111 04/04/2018
111 20/04/2020
222 26/05/2020
;
data want(drop = d i);
if _N_ = 1 then do;
dcl hash h(dataset : 'have2', multidata : 'Y', ordered : 'Y');
h.definekey('id');
h.definedata('date2');
h.definedone();
end;
set have1;
by id;
set have1(keep = date rename=(date = d) firstobs = 2)
have1(drop = _all_ obs = 1);
if last.id then d = '31dec9999'd;
do i = 1 by 1 while (h.do_over() = 0);
if date < date2 < d then output;
else if i = 1 and (d = '31dec9999'd | date2 > d) then do;
date2 = .;
output;
end;
end;
format date2 ddmmyy10.;
run;
Result:
id date date2 111 01/01/2015 . 111 20/03/2015 . 111 15/06/2016 18/06/2016 111 15/06/2016 15/08/2016 111 12/03/2017 17/03/2017 111 12/03/2017 20/01/2018 111 12/03/2017 04/04/2018 111 12/03/2017 20/04/2020 111 30/06/2020 . 222 18/03/2020 . 222 25/04/2020 26/05/2020
Use next tested code:
data Database1;
input id $ date :ddmmyy10. ;
cards;
111 01/01/2015
111 20/03/2015
111 15/06/2016
111 12/03/2017
111 30/06/2020
222 18/03/2020
222 25/04/2020
;
run;
data Database2;
input id $ date2 :ddmmyy10. ;
cards;
111 18/06/2016
111 15/08/2016
111 17/03/2017
111 20/01/2018
111 04/04/2018
111 20/04/2020
222 26/05/2020
;
run;
data temp / view=temp ;
set Database1 Database2;
sortdt = coalesce(date, date2);
format date date2 sortdt ddmmyy10.;
run;
proc sort data=temp out=temp1; by id sortdt ; run;
data want;
set temp1;
by id sortdt ;
retain dt1;
if first.id then dt1=date;
if date then do;
dt1=date;
end;
else date=dt1;
drop sortdt dt1;
run;
data want;
set want;
by id date;
if not (first.date and last.date)
and date2=. then delete;
run;
Try this
data have1;
input id $ date :ddmmyy10.;
format date ddmmyy10.;
cards;
111 01/01/2015
111 20/03/2015
111 15/06/2016
111 12/03/2017
111 30/06/2020
222 18/03/2020
222 25/04/2020
;
data have2;
input id $ date2 :ddmmyy10.;
format date2 ddmmyy10.;
cards;
111 18/06/2016
111 15/08/2016
111 17/03/2017
111 20/01/2018
111 04/04/2018
111 20/04/2020
222 26/05/2020
;
data want(drop = d i);
if _N_ = 1 then do;
dcl hash h(dataset : 'have2', multidata : 'Y', ordered : 'Y');
h.definekey('id');
h.definedata('date2');
h.definedone();
end;
set have1;
by id;
set have1(keep = date rename=(date = d) firstobs = 2)
have1(drop = _all_ obs = 1);
if last.id then d = '31dec9999'd;
do i = 1 by 1 while (h.do_over() = 0);
if date < date2 < d then output;
else if i = 1 and (d = '31dec9999'd | date2 > d) then do;
date2 = .;
output;
end;
end;
format date2 ddmmyy10.;
run;
Result:
id date date2 111 01/01/2015 . 111 20/03/2015 . 111 15/06/2016 18/06/2016 111 15/06/2016 15/08/2016 111 12/03/2017 17/03/2017 111 12/03/2017 20/01/2018 111 12/03/2017 04/04/2018 111 12/03/2017 20/04/2020 111 30/06/2020 . 222 18/03/2020 . 222 25/04/2020 26/05/2020
Hi @PeterClemmensen and @Shmuel ,
Thank you both for helping me with your wonderful codes. I tried and studied both (this is the reason I didn't answer immediately) as they give different results on my data. Although none of them gives the exact result I was looking for, the code of @PeterClemmensen is that who is nearer. This is the reason why I choose your answer. I think I will write you a private message to ask you some explanation about your code 😀.
Anyway, Thank you very much to both of you!
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.