for example, here is my data about people who use a drug X.
ID | start_date | end_date |
001 | 2020-01-10 | 2020-02-08 |
001 | 2020-02-09 | 2020-03-09 |
001 | 2020-03-01 | 2020-03-30 |
001 | 2021-05-07 | 2021-06-06 |
002 | 2020-07-01 | 2020-07-30 |
002 | 2020-07-26 | 2020-08-24 |
002 | 2020-08-15 | 2020-09-13 |
002 | 2021-05-07 | 2021-06-06 |
Assume the supply for this drug is 30 days .
It is clear that there are days overlapped, in this case, I would like to add the overlapped days to the last end date for each time interval.
e.g. For 001, there are 9 days overlapped (2020-03-01 to 2020-03-09);
e.g. For 002, there are 5 days and 10 days overlapped;
I wish I could have a simple time interval including start to end, and plus overlapped days.
What I want to see is a dataset like this:
ID | start_date | end_date |
001 | 2020-01-10 | 2020-04-07 (2020-03-30 +9 days) |
001 | 2021-05-07 | 2021-06-06 |
002 | 2020-07-01 | 2020-09-27 (2020-09-13 +5+10 days) |
002 | 2021-05-07 | 2021-06-06 |
Hope I could get some help
data have;
infile cards expandtabs;
input ID $ (start_date end_date) (:yymmdd10.);
format start_date end_date yymmdd10.;
cards;
001 2020-01-10 2020-02-08
001 2020-02-09 2020-03-09
001 2020-03-01 2020-03-30
001 2021-05-07 2021-06-06
002 2020-07-01 2020-07-30
002 2020-07-26 2020-08-24
002 2020-08-15 2020-09-13
002 2021-05-07 2021-06-06
;
data temp;
set have;
do date=start_date to end_date;
output;
end;
format date yymmdd10.;
keep id date;
run;
proc freq data=temp noprint;
table id*date/out=temp2 list;
run;
data temp3;
set temp2;
flag=ifn(count=1,1,0);
keep id date flag;
run;
data temp4;
set temp3;
by id flag notsorted;
if first.id or dif(date) ne 1 then group1+1;
if first.flag or dif(date) ne 1 then group2+1;
run;
proc sql;
create table temp5 as
select distinct group1,group2,id,flag,
min(date) as start_date format=yymmdd10.,max(date) as end_date format=yymmdd10.,
case when flag=0 then max(date)-min(date)+1 else . end as count
from temp4
group by group1,group2;
create table want as
select distinct group1,id,
min(start_date) as start_date format=yymmdd10.,sum(max(end_date),sum(count)-1) as end_date format=yymmdd10.
from temp5
group by group1;
quit;
Calculating the overlap of date/time intervals
By Leonid Batkhan on SAS Users January 13, 2022
https://blogs.sas.com/content/sgf/2022/01/13/calculating-the-overlap-of-date-time-intervals/
Koen
data have;
infile cards expandtabs;
input ID $ (start_date end_date) (:yymmdd10.);
format start_date end_date yymmdd10.;
cards;
001 2020-01-10 2020-02-08
001 2020-02-09 2020-03-09
001 2020-03-01 2020-03-30
001 2021-05-07 2021-06-06
002 2020-07-01 2020-07-30
002 2020-07-26 2020-08-24
002 2020-08-15 2020-09-13
002 2021-05-07 2021-06-06
;
data temp;
set have;
do date=start_date to end_date;
output;
end;
format date yymmdd10.;
keep id date;
run;
proc freq data=temp noprint;
table id*date/out=temp2 list;
run;
data temp3;
set temp2;
flag=ifn(count=1,1,0);
keep id date flag;
run;
data temp4;
set temp3;
by id flag notsorted;
if first.id or dif(date) ne 1 then group1+1;
if first.flag or dif(date) ne 1 then group2+1;
run;
proc sql;
create table temp5 as
select distinct group1,group2,id,flag,
min(date) as start_date format=yymmdd10.,max(date) as end_date format=yymmdd10.,
case when flag=0 then max(date)-min(date)+1 else . end as count
from temp4
group by group1,group2;
create table want as
select distinct group1,id,
min(start_date) as start_date format=yymmdd10.,sum(max(end_date),sum(count)-1) as end_date format=yymmdd10.
from temp5
group by group1;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.