- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi. I have two data sets with dates I am trying to combine based upon a range of days surrounding the dates (i.e., +/- 7 days).
The first dataset has an id, reason for evaluation, injured body part and the date of evaluation.
data have1;
input id $ reason $ bodypart $ evaldate :date9.;
format evaldate date9.;
datalines;
S001 injury ankle 01JAN2020
S001 injury knee 17APR2020
S002 injury hand 04JUL2020
S004 injury shoulder 31JUL2020
S005 injury back 01SEP2020
;
RUN;
The second data set has a matching id, a reason and bodypart variable (which may or may not match with the first data set), and then a start and end date representing work days lost.
data have2;
input id $ reason $ bodypart $ startdate :date9. enddate :date9.;
format startdate date9. enddate :date9.;
datalines;
S001 injury leg 01JAN2020 31JAN2020
S001 injury knee 14APR2020 21APR2020
S002 injury wrist 07JUL2020 09JUL2020
S003 injury foot 23SEP2021 31OCT2021
S004 injury shoulder 07JUL2020 09JUL2020
S005 injury neck 30SEP2020 01OCT2020
;
RUN;
How can I match on a date range (+/- 7 days, for example)? This is necessary since the evaldate and startdate do not always line up.
I would want all observations to appear in the combined dataset, but if they don't fall within the 7 days I would want them to be flagged as not having a match and not be merged. I would also want to know the count of lost days (dayslost=intck('day',startdate,enddate+1)) which I'm assuming I would need to run in a separate datastep. I ultimately want to end up with something like the following:
data want;
input id $ reason $ bodypart $ evaldate :date9. reason2 $ bodypart2$ startdate :date9. enddate :date9. dayslost match;
format evaldate date9. startdate date9. enddate date9.;
datalines;
S001 injury ankle 01JAN2020 injury leg 01JAN2020 31JAN2020 31 1
S001 injury knee 17APR2020 injury knee 14APR2020 21APR2020 8 1
S002 injury hand 04JUL2020 injury wrist 07JUL2020 09JUL2020 3 1
S003 . . . injury foot 23SEP2021 31OCT2021 39 0
S004 injury shoulder 31JUL2020 . . . . . 0
S004 . . . injury shoulder 07JUL2020 09JUL2020 3 0
S005 injury back 01SEP2020 . . . . . 0
S005 . . . injury neck 30SEP2020 01OCT2020 2 0
;
run;
Thanks in advance!
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello @sasgorilla,
You can implement a full join like this in PROC SQL:
proc sql;
create table want as
select coalesce(a.id, b.id) as id, a.reason, a.bodypart, a.evaldate,
b.reason as reason2, b.bodypart as bodypart2,
startdate, enddate, enddate-startdate+1 as dayslost,
a.id=b.id as match
from have1 a full join have2 b
on a.id=b.id & abs(evaldate-startdate)<=7
order by id, b.id, a.id, evaldate, startdate; /* or maybe better: by id, coalesce(evaldate, startdate) */
quit;
Note that multiple copies of observations of either input dataset will be created by this step if they match more than one observation of the other dataset. There is no such case in your sample data.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello @sasgorilla,
You can implement a full join like this in PROC SQL:
proc sql;
create table want as
select coalesce(a.id, b.id) as id, a.reason, a.bodypart, a.evaldate,
b.reason as reason2, b.bodypart as bodypart2,
startdate, enddate, enddate-startdate+1 as dayslost,
a.id=b.id as match
from have1 a full join have2 b
on a.id=b.id & abs(evaldate-startdate)<=7
order by id, b.id, a.id, evaldate, startdate; /* or maybe better: by id, coalesce(evaldate, startdate) */
quit;
Note that multiple copies of observations of either input dataset will be created by this step if they match more than one observation of the other dataset. There is no such case in your sample data.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks @FreelanceReinh ! This did the trick. I appreciate your help!