Hi all,
I have multiple rows of admission dates (admission_dt) and discharge dates (discharge_dt) per person (identified by DummyID) and wanted to identify if the timeframe (between admission_dt and discharge_dt) across multiple rows have any overlapped days. Any suggestion is greatly appreciated!
Below is the sample data:
DummyID | admission_dt | discharge_dt |
1 | 1/1/2018 | 2/28/2018 |
1 | 1/1/2018 | |
1 | 1/27/2018 | 1/30/2018 |
1 | 3/15/2018 | |
1 | 4/1/2018 | |
1 | 4/7/2018 | |
1 | 5/1/2018 | |
1 | 6/1/2018 | |
1 | 7/1/2018 | |
1 | 8/1/2018 | |
1 | 9/1/2018 | |
1 | 10/1/2018 | |
1 | 11/1/2018 | |
1 | 12/1/2018 | |
2 | 1/18/2018 | 5/22/2018 |
2 | 3/11/2018 | 4/30/2018 |
2 | 4/11/2018 | 6/22/2018 |
If missing discharge_dt means the individual has not been discharged, then you need to provide an "end_of_study" date: Also, for the sample data you provided, there will be no days without overlap for dummyid=1.
This program creates an array indexed by the range of dates for your study. It then populates each date with the count of obs having that date. Then all one has to do is step through the completed date history of counts:
data have;
infile datalines truncover dsd;
input DummyID (admission_dt discharge_dt) (:mmddyy.);
format admission_dt discharge_dt date9.;
datalines;
1,1/1/2018,2/28/2018
1,1/1/2018,
1,1/27/2018,1/30/2018
1,3/15/2018,
1,4/1/2018,
1,4/7/2018,
1,5/1/2018,
1,6/1/2018,
1,7/1/2018,
1,8/1/2018,
1,9/1/2018,
1,10/1/2018,
1,11/1/2018,
1,12/1/2018,
2,1/18/2018,5/22/2018
2,3/11/2018,4/30/2018
2,4/11/2018,6/22/2018
;
%let beg_study=01jan2018;
%let end_study=31dec2018;
data want (keep=dummyid overlap_level beg_phase end_phase);
array _overlap_history {%sysevalf("&beg_study"d):%eval(1+%sysevalf("&end_study"d))} ;
do until (last.dummyid);
set have;
by dummyid;
if discharge_dt=. then discharge_dt="&end_study"d;
_mind=min(admission_dt,_mind);
_maxd=max(discharge_dt,_maxd);
do d=admission_dt to discharge_dt;
_overlap_history{d}=sum(_overlap_history{d},1);
end;
end;
format beg_phase end_phase _mind _maxd date9.;
end_phase=_mind-1;
do until (end_phase=_maxd);
beg_phase=end_phase+1;
overlap_level=_overlap_history{beg_phase};
do end_phase=beg_phase to "&end_study"d until(_overlap_history{end_phase+1}^=overlap_level);
end;
output;
end;
run;
BTW, if you are so inclined, you can replace the two statements
end_phase=_mind-1;
do until (end_phase=_maxd);
with
do end_phase=_mind-1 by 0 until (end_phase=_maxd);
Note the array is defined to go one day beyond END_STUDY. This is to support the until condition in
do end_phase=beg_phase to "&end_study"d-1 until(_overlap_history{end_phase+1}^=overlap_level);
all the way through end_phase="*end_study"d.
I wasn't sure what missing discharge days mean so I've excluded these rows. If the meaning of missings is not yet discharged then all rows would overlap in your sample data.
One of below two options below should "show you the way".
data have;
infile datalines truncover dsd;
input DummyID (admission_dt discharge_dt) (:mmddyy.);
format admission_dt discharge_dt date9.;
datalines;
1,1/1/2018,2/28/2018
1,1/1/2018,
1,1/27/2018,1/30/2018
1,3/15/2018,
1,4/1/2018,
1,4/7/2018,
1,5/1/2018,
1,6/1/2018,
1,7/1/2018,
1,8/1/2018,
1,9/1/2018,
1,10/1/2018,
1,11/1/2018,
1,12/1/2018,
2,1/18/2018,5/22/2018
2,3/11/2018,4/30/2018
2,4/11/2018,6/22/2018
;
proc sql;
select
t1.DummyID,
t1.admission_dt,
t1.discharge_dt,
t2.admission_dt as t2_admission_dt format=date9.,
t2.discharge_dt as t2_discharge_dt format=date9.
from
have t1
left join
have t2
on
not missing (t1.discharge_dt)
and
(
t2.admission_dt < t1.admission_dt < t2.discharge_dt
or
t2.admission_dt < t1.discharge_dt < t2.discharge_dt
)
order by t1.DummyID, t1.admission_dt
;
quit;
proc sql;
select
t1.DummyID,
t1.admission_dt,
t1.discharge_dt,
count(*) as n_overlaps
from
have t1
left join
have t2
on
not missing (t1.discharge_dt)
and
(
t2.admission_dt < t1.admission_dt < t2.discharge_dt
or
t2.admission_dt < t1.discharge_dt < t2.discharge_dt
)
group by t1.DummyID, t1.admission_dt, t1.discharge_dt
;
quit;
If missing discharge_dt means the individual has not been discharged, then you need to provide an "end_of_study" date: Also, for the sample data you provided, there will be no days without overlap for dummyid=1.
This program creates an array indexed by the range of dates for your study. It then populates each date with the count of obs having that date. Then all one has to do is step through the completed date history of counts:
data have;
infile datalines truncover dsd;
input DummyID (admission_dt discharge_dt) (:mmddyy.);
format admission_dt discharge_dt date9.;
datalines;
1,1/1/2018,2/28/2018
1,1/1/2018,
1,1/27/2018,1/30/2018
1,3/15/2018,
1,4/1/2018,
1,4/7/2018,
1,5/1/2018,
1,6/1/2018,
1,7/1/2018,
1,8/1/2018,
1,9/1/2018,
1,10/1/2018,
1,11/1/2018,
1,12/1/2018,
2,1/18/2018,5/22/2018
2,3/11/2018,4/30/2018
2,4/11/2018,6/22/2018
;
%let beg_study=01jan2018;
%let end_study=31dec2018;
data want (keep=dummyid overlap_level beg_phase end_phase);
array _overlap_history {%sysevalf("&beg_study"d):%eval(1+%sysevalf("&end_study"d))} ;
do until (last.dummyid);
set have;
by dummyid;
if discharge_dt=. then discharge_dt="&end_study"d;
_mind=min(admission_dt,_mind);
_maxd=max(discharge_dt,_maxd);
do d=admission_dt to discharge_dt;
_overlap_history{d}=sum(_overlap_history{d},1);
end;
end;
format beg_phase end_phase _mind _maxd date9.;
end_phase=_mind-1;
do until (end_phase=_maxd);
beg_phase=end_phase+1;
overlap_level=_overlap_history{beg_phase};
do end_phase=beg_phase to "&end_study"d until(_overlap_history{end_phase+1}^=overlap_level);
end;
output;
end;
run;
BTW, if you are so inclined, you can replace the two statements
end_phase=_mind-1;
do until (end_phase=_maxd);
with
do end_phase=_mind-1 by 0 until (end_phase=_maxd);
Note the array is defined to go one day beyond END_STUDY. This is to support the until condition in
do end_phase=beg_phase to "&end_study"d-1 until(_overlap_history{end_phase+1}^=overlap_level);
all the way through end_phase="*end_study"d.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.