BookmarkSubscribeRSS Feed
SarahW13
Obsidian | Level 7

Hello all,

 

I am trying to figure out how to identify overlapping date intervals within a dataset.

 

Below is what I have:

 

id        Med          Start_date          End_date

1         A               1/10/15              4/10/15

1         B               2/10/15              3/10/15

2         A               7/8/18                10/8/18

2         C               9/8/18                11/8/18

 

Below is what I want:

 

id        Med          Start_date          End_date

1         A               1/10/15              2/9/15

1         A+B          2/10/15              3/10/15

1         A               3/11/15              4/10/15

2         A               7/8/18                9/7/18

2         A+C          9/8/18                10/8/18

2         C              10/9/18               11/8/18

 

I am essentially trying to identify the periods for which id in which the meds have overlapping dates. I suspect the code is somewhat complicated for this... any advice?

 

Thanks!

2 REPLIES 2
hashman
Ammonite | Level 13

@SarahW13:

This type of problem easily succumbs to the so-called "paint-brush" approach. Normally, the brush paints a key-indexed array table with one slot available per each possible date. It can be done this way here, too. However, in this case the need to store concatenated values (like A+B and such) makes a hash table a more convenient medium. To boot, it's dynamic, so there's no need to guess the size of its storage beforehand or allocate it as "big enough".

data have ;                                                                                                                             
  input (id med) (:$1.) (start_date end_date) (:mmddyy10.) ;                                                                            
  format start_date end_date yymmdd10. ;                                                                                                
  cards ;                                                                                                                               
1  A  1/10/15   4/10/15                                                                                                                 
1  B  2/10/15   3/10/15                                                                                                                 
2  A  7/08/18  10/08/18                                                                                                                 
2  C  9/08/18  11/08/18                                                                                                                 
run ;                                                                                                                                   
                                                                                                                                        
data _null_ ;                                                                                                                           
  if _n_ = 1 then do ;                                                                                                                  
    dcl hash h (ordered:"a") ;                                                                                                          
    h.definekey  ("id", "_d") ;                                                                                                         
    h.definedata ("id", "_d", "med") ;                                                                                                  
    h.definedone () ;                                                                                                                   
  end ;                                                                                                                                 
  set have (rename = med = _m) end = z ;                                                                                                
  length med $ 32767 ; * adjust as you see fit ;                                                                                        
  do _d = start_date to end_date ;                                                                                                      
    if h.find() ne 0 then med = _m ;                                                                                                    
    else med = catx ("+", med,  _m) ;                                                                                                   
    h.replace() ;                                                                                                                       
  end ;                                                                                                                                 
  if z then h.output (dataset: "hash") ;                                                                                                
run ;                                                                                                                                   
                                                                                                                                        
data want (drop = _:) ;                                                                                                                 
  do until (last.med) ;                                                                                                                 
    set hash ;                                                                                                                          
    by id med notsorted ;                                                                                                               
    if first.med then start_date = _d ;                                                                                                 
  end ;                                                                                                                                 
  end_date = _d ;                                                                                                                       
  format start_date end_date yymmdd10. ;                                                                                                
run ;                                          

Note:

  • The size for the MED field is chosen as the longest possible at $32767. Most likely, you don't need it that long, so adjust according to what you have in your data. 
  • The post-processing after the hash table is filled in the first step can be incorporated into the step. However, it's simpler to spit the hash data out and use the convenience of BY processing to get the final result.
  • The hash approach above makes no assumption that the input is sorted in any particular way; it renders the output sorted as needed internally due to the nature of the algorithm.

Kind regards

Paul D. 

ed_sas_member
Meteorite | Level 14

Hi @SarahW13 

 

You can try this code:

 

data have;
	input id Med $ Start_date End_date;
	informat Start_date End_date MMDDYY8.;
	format Start_date End_date MMDDYY8.;
	datalines;
1 A 1/10/15 4/10/15
1 B 2/10/15 3/10/15
2 A 7/8/18 10/8/18
2 C 9/8/18 11/8/18
;
run;

/* Expand dataset: create one row per day for each drug */

data have_exp;
	set have;
	format day MMDDYY8.;
	do day=Start_date to End_date;
		output;
	end;
run;

proc sort data=have_exp;
	by id day;
run;

/* Transpose drugs to create combination */

proc transpose data=have_exp out=have_tr (drop=_name_);
	var med;
	by id day;
run;

data have_combi;
	set have_tr;
	combi = catx(" + ",of col:);
	keep id day combi;
run;

/* Creation of the variable counter, which is incremented each time a new combination is given */

data have_combi2;
	set have_combi;
	
	by id combi notsorted;
	if first.id then counter=0;

	if first.combi then counter + 1;
run;

/* Creation of the final table */

proc sql;
	create table want as
	select id,
		combi as Med,
		min(day) as Start_date format=MMDDYY8.,
		max(day) as End_date format=MMDDYY8.
	from have_combi2
	group by id, combi, counter
	order by id, Start_date ;
quit;

Capture d’écran 2019-12-30 à 10.53.47.png

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 1033 views
  • 1 like
  • 3 in conversation