for example, here is my data about people who use a drug X.
ID | start_date | end_date |
001 | 2020-01-10 | 2020-02-08 |
001 | 2020-02-09 | 2020-03-09 |
001 | 2020-03-01 | 2020-03-30 |
001 | 2021-05-07 | 2021-06-06 |
002 | 2020-07-01 | 2020-07-30 |
002 | 2020-07-26 | 2020-08-24 |
002 | 2020-08-15 | 2020-09-13 |
002 | 2021-05-07 | 2021-06-06 |
Assume the supply for this drug is 30 days .
It is clear that there are days overlapped, in this case, I would like to add the overlapped days to the last end date for each time interval.
e.g. For 001, there are 9 days overlapped (2020-03-01 to 2020-03-09);
e.g. For 002, there are 5 days and 10 days overlapped;
I wish I could have a simple time interval including start to end, and plus overlapped days.
What I want to see is a dataset like this:
ID | start_date | end_date |
001 | 2020-01-10 | 2020-04-07 (2020-03-30 +9 days) |
001 | 2021-05-07 | 2021-06-06 |
002 | 2020-07-01 | 2020-09-27 (2020-09-13 +5+10 days) |
002 | 2021-05-07 | 2021-06-06 |
Hope I could get some help
data have;
infile cards expandtabs;
input ID $ (start_date end_date) (:yymmdd10.);
format start_date end_date yymmdd10.;
cards;
001 2020-01-10 2020-02-08
001 2020-02-09 2020-03-09
001 2020-03-01 2020-03-30
001 2021-05-07 2021-06-06
002 2020-07-01 2020-07-30
002 2020-07-26 2020-08-24
002 2020-08-15 2020-09-13
002 2021-05-07 2021-06-06
;
data temp;
set have;
do date=start_date to end_date;
output;
end;
format date yymmdd10.;
keep id date;
run;
proc freq data=temp noprint;
table id*date/out=temp2 list;
run;
data temp3;
set temp2;
flag=ifn(count=1,1,0);
keep id date flag;
run;
data temp4;
set temp3;
by id flag notsorted;
if first.id or dif(date) ne 1 then group1+1;
if first.flag or dif(date) ne 1 then group2+1;
run;
proc sql;
create table temp5 as
select distinct group1,group2,id,flag,
min(date) as start_date format=yymmdd10.,max(date) as end_date format=yymmdd10.,
case when flag=0 then max(date)-min(date)+1 else . end as count
from temp4
group by group1,group2;
create table want as
select distinct group1,id,
min(start_date) as start_date format=yymmdd10.,sum(max(end_date),sum(count)-1) as end_date format=yymmdd10.
from temp5
group by group1;
quit;
Calculating the overlap of date/time intervals
By Leonid Batkhan on SAS Users January 13, 2022
https://blogs.sas.com/content/sgf/2022/01/13/calculating-the-overlap-of-date-time-intervals/
Koen
data have;
infile cards expandtabs;
input ID $ (start_date end_date) (:yymmdd10.);
format start_date end_date yymmdd10.;
cards;
001 2020-01-10 2020-02-08
001 2020-02-09 2020-03-09
001 2020-03-01 2020-03-30
001 2021-05-07 2021-06-06
002 2020-07-01 2020-07-30
002 2020-07-26 2020-08-24
002 2020-08-15 2020-09-13
002 2021-05-07 2021-06-06
;
data temp;
set have;
do date=start_date to end_date;
output;
end;
format date yymmdd10.;
keep id date;
run;
proc freq data=temp noprint;
table id*date/out=temp2 list;
run;
data temp3;
set temp2;
flag=ifn(count=1,1,0);
keep id date flag;
run;
data temp4;
set temp3;
by id flag notsorted;
if first.id or dif(date) ne 1 then group1+1;
if first.flag or dif(date) ne 1 then group2+1;
run;
proc sql;
create table temp5 as
select distinct group1,group2,id,flag,
min(date) as start_date format=yymmdd10.,max(date) as end_date format=yymmdd10.,
case when flag=0 then max(date)-min(date)+1 else . end as count
from temp4
group by group1,group2;
create table want as
select distinct group1,id,
min(start_date) as start_date format=yymmdd10.,sum(max(end_date),sum(count)-1) as end_date format=yymmdd10.
from temp5
group by group1;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.