topic Re: Identify overlapping date intervals in SAS Programming

Identify overlapping date intervals

SarahW13 — Sun, 29 Dec 2019 20:18:29 GMT

Hello all,

I am trying to figure out how to identify overlapping date intervals within a dataset.

Below is what I have:

id Med Start_date End_date

1 A 1/10/15 4/10/15

1 B 2/10/15 3/10/15

2 A 7/8/18 10/8/18

2 C 9/8/18 11/8/18

Below is what I want:

id Med Start_date End_date

1 A 1/10/15 2/9/15

1 A+B 2/10/15 3/10/15

1 A 3/11/15 4/10/15

2 A 7/8/18 9/7/18

2 A+C 9/8/18 10/8/18

2 C 10/9/18 11/8/18

I am essentially trying to identify the periods for which id in which the meds have overlapping dates. I suspect the code is somewhat complicated for this... any advice?

Thanks!

Re: Identify overlapping date intervals

hashman — Mon, 30 Dec 2019 01:05:12 GMT

@SarahW13:

This type of problem easily succumbs to the so-called "paint-brush" approach. Normally, the brush paints a key-indexed array table with one slot available per each possible date. It can be done this way here, too. However, in this case the need to store concatenated values (like A+B and such) makes a hash table a more convenient medium. To boot, it's dynamic, so there's no need to guess the size of its storage beforehand or allocate it as "big enough".

data have ;                                                                                                                             
  input (id med) (:$1.) (start_date end_date) (:mmddyy10.) ;                                                                            
  format start_date end_date yymmdd10. ;                                                                                                
  cards ;                                                                                                                               
1  A  1/10/15   4/10/15                                                                                                                 
1  B  2/10/15   3/10/15                                                                                                                 
2  A  7/08/18  10/08/18                                                                                                                 
2  C  9/08/18  11/08/18                                                                                                                 
run ;                                                                                                                                   
                                                                                                                                        
data _null_ ;                                                                                                                           
  if _n_ = 1 then do ;                                                                                                                  
    dcl hash h (ordered:"a") ;                                                                                                          
    h.definekey  ("id", "_d") ;                                                                                                         
    h.definedata ("id", "_d", "med") ;                                                                                                  
    h.definedone () ;                                                                                                                   
  end ;                                                                                                                                 
  set have (rename = med = _m) end = z ;                                                                                                
  length med $ 32767 ; * adjust as you see fit ;                                                                                        
  do _d = start_date to end_date ;                                                                                                      
    if h.find() ne 0 then med = _m ;                                                                                                    
    else med = catx ("+", med,  _m) ;                                                                                                   
    h.replace() ;                                                                                                                       
  end ;                                                                                                                                 
  if z then h.output (dataset: "hash") ;                                                                                                
run ;                                                                                                                                   
                                                                                                                                        
data want (drop = _:) ;                                                                                                                 
  do until (last.med) ;                                                                                                                 
    set hash ;                                                                                                                          
    by id med notsorted ;                                                                                                               
    if first.med then start_date = _d ;                                                                                                 
  end ;                                                                                                                                 
  end_date = _d ;                                                                                                                       
  format start_date end_date yymmdd10. ;                                                                                                
run ;

Note:

The size for the MED field is chosen as the longest possible at $32767. Most likely, you don't need it that long, so adjust according to what you have in your data.
The post-processing after the hash table is filled in the first step can be incorporated into the step. However, it's simpler to spit the hash data out and use the convenience of BY processing to get the final result.
The hash approach above makes no assumption that the input is sorted in any particular way; it renders the output sorted as needed internally due to the nature of the algorithm.

Kind regards

Paul D.

Re: Identify overlapping date intervals

ed_sas_member — Mon, 30 Dec 2019 09:54:18 GMT

Hi @SarahW13

You can try this code:

data have;
	input id Med $ Start_date End_date;
	informat Start_date End_date MMDDYY8.;
	format Start_date End_date MMDDYY8.;
	datalines;
1 A 1/10/15 4/10/15
1 B 2/10/15 3/10/15
2 A 7/8/18 10/8/18
2 C 9/8/18 11/8/18
;
run;

/* Expand dataset: create one row per day for each drug */

data have_exp;
	set have;
	format day MMDDYY8.;
	do day=Start_date to End_date;
		output;
	end;
run;

proc sort data=have_exp;
	by id day;
run;

/* Transpose drugs to create combination */

proc transpose data=have_exp out=have_tr (drop=_name_);
	var med;
	by id day;
run;

data have_combi;
	set have_tr;
	combi = catx(" + ",of col:);
	keep id day combi;
run;

/* Creation of the variable counter, which is incremented each time a new combination is given */

data have_combi2;
	set have_combi;
	
	by id combi notsorted;
	if first.id then counter=0;

	if first.combi then counter + 1;
run;

/* Creation of the final table */

proc sql;
	create table want as
	select id,
		combi as Med,
		min(day) as Start_date format=MMDDYY8.,
		max(day) as End_date format=MMDDYY8.
	from have_combi2
	group by id, combi, counter
	order by id, Start_date ;
quit;