Hello all,
I am trying to figure out how to identify overlapping date intervals within a dataset.
Below is what I have:
id Med Start_date End_date
1 A 1/10/15 4/10/15
1 B 2/10/15 3/10/15
2 A 7/8/18 10/8/18
2 C 9/8/18 11/8/18
Below is what I want:
id Med Start_date End_date
1 A 1/10/15 2/9/15
1 A+B 2/10/15 3/10/15
1 A 3/11/15 4/10/15
2 A 7/8/18 9/7/18
2 A+C 9/8/18 10/8/18
2 C 10/9/18 11/8/18
I am essentially trying to identify the periods for which id in which the meds have overlapping dates. I suspect the code is somewhat complicated for this... any advice?
Thanks!
This type of problem easily succumbs to the so-called "paint-brush" approach. Normally, the brush paints a key-indexed array table with one slot available per each possible date. It can be done this way here, too. However, in this case the need to store concatenated values (like A+B and such) makes a hash table a more convenient medium. To boot, it's dynamic, so there's no need to guess the size of its storage beforehand or allocate it as "big enough".
data have ; input (id med) (:$1.) (start_date end_date) (:mmddyy10.) ; format start_date end_date yymmdd10. ; cards ; 1 A 1/10/15 4/10/15 1 B 2/10/15 3/10/15 2 A 7/08/18 10/08/18 2 C 9/08/18 11/08/18 run ; data _null_ ; if _n_ = 1 then do ; dcl hash h (ordered:"a") ; h.definekey ("id", "_d") ; h.definedata ("id", "_d", "med") ; h.definedone () ; end ; set have (rename = med = _m) end = z ; length med $ 32767 ; * adjust as you see fit ; do _d = start_date to end_date ; if h.find() ne 0 then med = _m ; else med = catx ("+", med, _m) ; h.replace() ; end ; if z then h.output (dataset: "hash") ; run ; data want (drop = _:) ; do until (last.med) ; set hash ; by id med notsorted ; if first.med then start_date = _d ; end ; end_date = _d ; format start_date end_date yymmdd10. ; run ;
Note:
Kind regards
Paul D.
Hi @SarahW13
You can try this code:
data have;
input id Med $ Start_date End_date;
informat Start_date End_date MMDDYY8.;
format Start_date End_date MMDDYY8.;
datalines;
1 A 1/10/15 4/10/15
1 B 2/10/15 3/10/15
2 A 7/8/18 10/8/18
2 C 9/8/18 11/8/18
;
run;
/* Expand dataset: create one row per day for each drug */
data have_exp;
set have;
format day MMDDYY8.;
do day=Start_date to End_date;
output;
end;
run;
proc sort data=have_exp;
by id day;
run;
/* Transpose drugs to create combination */
proc transpose data=have_exp out=have_tr (drop=_name_);
var med;
by id day;
run;
data have_combi;
set have_tr;
combi = catx(" + ",of col:);
keep id day combi;
run;
/* Creation of the variable counter, which is incremented each time a new combination is given */
data have_combi2;
set have_combi;
by id combi notsorted;
if first.id then counter=0;
if first.combi then counter + 1;
run;
/* Creation of the final table */
proc sql;
create table want as
select id,
combi as Med,
min(day) as Start_date format=MMDDYY8.,
max(day) as End_date format=MMDDYY8.
from have_combi2
group by id, combi, counter
order by id, Start_date ;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.