Hello all,
I am trying to figure out how to identify overlapping date intervals within a dataset.
Below is what I have:
id Med Start_date End_date
1 A 1/10/15 4/10/15
1 B 2/10/15 3/10/15
2 A 7/8/18 10/8/18
2 C 9/8/18 11/8/18
Below is what I want:
id Med Start_date End_date
1 A 1/10/15 2/9/15
1 A+B 2/10/15 3/10/15
1 A 3/11/15 4/10/15
2 A 7/8/18 9/7/18
2 A+C 9/8/18 10/8/18
2 C 10/9/18 11/8/18
I am essentially trying to identify the periods for which id in which the meds have overlapping dates. I suspect the code is somewhat complicated for this... any advice?
Thanks!
This type of problem easily succumbs to the so-called "paint-brush" approach. Normally, the brush paints a key-indexed array table with one slot available per each possible date. It can be done this way here, too. However, in this case the need to store concatenated values (like A+B and such) makes a hash table a more convenient medium. To boot, it's dynamic, so there's no need to guess the size of its storage beforehand or allocate it as "big enough".
data have ; input (id med) (:$1.) (start_date end_date) (:mmddyy10.) ; format start_date end_date yymmdd10. ; cards ; 1 A 1/10/15 4/10/15 1 B 2/10/15 3/10/15 2 A 7/08/18 10/08/18 2 C 9/08/18 11/08/18 run ; data _null_ ; if _n_ = 1 then do ; dcl hash h (ordered:"a") ; h.definekey ("id", "_d") ; h.definedata ("id", "_d", "med") ; h.definedone () ; end ; set have (rename = med = _m) end = z ; length med $ 32767 ; * adjust as you see fit ; do _d = start_date to end_date ; if h.find() ne 0 then med = _m ; else med = catx ("+", med, _m) ; h.replace() ; end ; if z then h.output (dataset: "hash") ; run ; data want (drop = _:) ; do until (last.med) ; set hash ; by id med notsorted ; if first.med then start_date = _d ; end ; end_date = _d ; format start_date end_date yymmdd10. ; run ;
Note:
Kind regards
Paul D.
Hi @SarahW13
You can try this code:
data have;
input id Med $ Start_date End_date;
informat Start_date End_date MMDDYY8.;
format Start_date End_date MMDDYY8.;
datalines;
1 A 1/10/15 4/10/15
1 B 2/10/15 3/10/15
2 A 7/8/18 10/8/18
2 C 9/8/18 11/8/18
;
run;
/* Expand dataset: create one row per day for each drug */
data have_exp;
set have;
format day MMDDYY8.;
do day=Start_date to End_date;
output;
end;
run;
proc sort data=have_exp;
by id day;
run;
/* Transpose drugs to create combination */
proc transpose data=have_exp out=have_tr (drop=_name_);
var med;
by id day;
run;
data have_combi;
set have_tr;
combi = catx(" + ",of col:);
keep id day combi;
run;
/* Creation of the variable counter, which is incremented each time a new combination is given */
data have_combi2;
set have_combi;
by id combi notsorted;
if first.id then counter=0;
if first.combi then counter + 1;
run;
/* Creation of the final table */
proc sql;
create table want as
select id,
combi as Med,
min(day) as Start_date format=MMDDYY8.,
max(day) as End_date format=MMDDYY8.
from have_combi2
group by id, combi, counter
order by id, Start_date ;
quit;
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.