BookmarkSubscribeRSS Feed
SarahW13
Obsidian | Level 7

Hello all,

 

I am trying to figure out how to identify overlapping date intervals within a dataset.

 

Below is what I have:

 

id        Med          Start_date          End_date

1         A               1/10/15              4/10/15

1         B               2/10/15              3/10/15

2         A               7/8/18                10/8/18

2         C               9/8/18                11/8/18

 

Below is what I want:

 

id        Med          Start_date          End_date

1         A               1/10/15              2/9/15

1         A+B          2/10/15              3/10/15

1         A               3/11/15              4/10/15

2         A               7/8/18                9/7/18

2         A+C          9/8/18                10/8/18

2         C              10/9/18               11/8/18

 

I am essentially trying to identify the periods for which id in which the meds have overlapping dates. I suspect the code is somewhat complicated for this... any advice?

 

Thanks!

2 REPLIES 2
hashman
Ammonite | Level 13

@SarahW13:

This type of problem easily succumbs to the so-called "paint-brush" approach. Normally, the brush paints a key-indexed array table with one slot available per each possible date. It can be done this way here, too. However, in this case the need to store concatenated values (like A+B and such) makes a hash table a more convenient medium. To boot, it's dynamic, so there's no need to guess the size of its storage beforehand or allocate it as "big enough".

data have ;                                                                                                                             
  input (id med) (:$1.) (start_date end_date) (:mmddyy10.) ;                                                                            
  format start_date end_date yymmdd10. ;                                                                                                
  cards ;                                                                                                                               
1  A  1/10/15   4/10/15                                                                                                                 
1  B  2/10/15   3/10/15                                                                                                                 
2  A  7/08/18  10/08/18                                                                                                                 
2  C  9/08/18  11/08/18                                                                                                                 
run ;                                                                                                                                   
                                                                                                                                        
data _null_ ;                                                                                                                           
  if _n_ = 1 then do ;                                                                                                                  
    dcl hash h (ordered:"a") ;                                                                                                          
    h.definekey  ("id", "_d") ;                                                                                                         
    h.definedata ("id", "_d", "med") ;                                                                                                  
    h.definedone () ;                                                                                                                   
  end ;                                                                                                                                 
  set have (rename = med = _m) end = z ;                                                                                                
  length med $ 32767 ; * adjust as you see fit ;                                                                                        
  do _d = start_date to end_date ;                                                                                                      
    if h.find() ne 0 then med = _m ;                                                                                                    
    else med = catx ("+", med,  _m) ;                                                                                                   
    h.replace() ;                                                                                                                       
  end ;                                                                                                                                 
  if z then h.output (dataset: "hash") ;                                                                                                
run ;                                                                                                                                   
                                                                                                                                        
data want (drop = _:) ;                                                                                                                 
  do until (last.med) ;                                                                                                                 
    set hash ;                                                                                                                          
    by id med notsorted ;                                                                                                               
    if first.med then start_date = _d ;                                                                                                 
  end ;                                                                                                                                 
  end_date = _d ;                                                                                                                       
  format start_date end_date yymmdd10. ;                                                                                                
run ;                                          

Note:

  • The size for the MED field is chosen as the longest possible at $32767. Most likely, you don't need it that long, so adjust according to what you have in your data. 
  • The post-processing after the hash table is filled in the first step can be incorporated into the step. However, it's simpler to spit the hash data out and use the convenience of BY processing to get the final result.
  • The hash approach above makes no assumption that the input is sorted in any particular way; it renders the output sorted as needed internally due to the nature of the algorithm.

Kind regards

Paul D. 

ed_sas_member
Meteorite | Level 14

Hi @SarahW13 

 

You can try this code:

 

data have;
	input id Med $ Start_date End_date;
	informat Start_date End_date MMDDYY8.;
	format Start_date End_date MMDDYY8.;
	datalines;
1 A 1/10/15 4/10/15
1 B 2/10/15 3/10/15
2 A 7/8/18 10/8/18
2 C 9/8/18 11/8/18
;
run;

/* Expand dataset: create one row per day for each drug */

data have_exp;
	set have;
	format day MMDDYY8.;
	do day=Start_date to End_date;
		output;
	end;
run;

proc sort data=have_exp;
	by id day;
run;

/* Transpose drugs to create combination */

proc transpose data=have_exp out=have_tr (drop=_name_);
	var med;
	by id day;
run;

data have_combi;
	set have_tr;
	combi = catx(" + ",of col:);
	keep id day combi;
run;

/* Creation of the variable counter, which is incremented each time a new combination is given */

data have_combi2;
	set have_combi;
	
	by id combi notsorted;
	if first.id then counter=0;

	if first.combi then counter + 1;
run;

/* Creation of the final table */

proc sql;
	create table want as
	select id,
		combi as Med,
		min(day) as Start_date format=MMDDYY8.,
		max(day) as End_date format=MMDDYY8.
	from have_combi2
	group by id, combi, counter
	order by id, Start_date ;
quit;

Capture d’écran 2019-12-30 à 10.53.47.png

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 852 views
  • 1 like
  • 3 in conversation