BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
DathanMD
Obsidian | Level 7

Dealing with overlaps dates.

 

Hi mates;

I am junior to SAS programing and would like help with some code. So, I have a dataset where there are multiple lines per ID (<patient>), with dates indicating exposure during a period. Some of the dates overlap, some do not. Some dates are completely within the time of earlier or later datelines, while some only overlap certain months. Also, for some patients there may be gaps before becoming exposed again.

 

What I would like to do is combine the overlapping periods so that I keep the earliest start and latest end dates for intervals that are continuously overlapping. I would also like to keep the combined drug combinations. However, if there is a gap (even if just one day) between the intervals, I would like to keep them separate. Therefore, people can still have multiple lines if they have multiple gaps. I just want to make sure I don't attribute non-exposure periods as being exposed.

 

 

Each drug (<drug>) is entered separately with its start date (<start_date) and end/stop date (end_date)-see table have. I would like to show periods when the patients were taking distinct drug combinations, and this may overrun the individual drug start and end dates (table want).

. have

patient

start_date

end_date

drug

1

19-Mar-98

26-Jul-98

amox

1

19-Mar-98

18-Nov-98

cipro

1

19-Mar-98

1-Jun-05

zinc

1

19-Mar-98

15-Feb-09

flagyl

1

27-Jul-98

24-Aug-05

cef

2

19-Nov-98

9-Jul-03

amox

2

10-Jul-03

31-Jul-06

cipro

2

2-Jun-05

15-Feb-09

zinc

2

1-Aug-06

16-Jul-18

flagyl

3

16-Feb-09

16-Jul-18

cef

3

16-Feb-09

25-Aug-19

sec

 

Want

patient

start_date

end_date

drug

1

19-Mar-98

26-Jul-98

amox, cipro,zinc,flagyl

1

27-Jul-98

18-Nov-98

cef,cipro,zinc,flagyl

1

19-Nov-98

1-Jun-05

cef,zinc,flagyl

1

2-Jun-05

24-Aug-05

cef,flagyl

1

25-Aug-05

15-Feb-09

flagyl

2

19-Nov-98

9-Jul-03

amox

2

10-Jul-03

1-Jun-05

cipro

2

2-Jun-05

31-Jul-06

cipro,zinc

2

1-Aug-06

15-Feb-09

flagyl,zinc

2

17-Feb-09

16-Jul-18

flagyl

3

17-Feb-09

16-Jul-18

cef,sec

3

17-Jul-18

25-Aug-19

sec

 

Thanks a lot

 

Dathan Byonanebye

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Another approach to this task is to consider a series of dated events (with variable _DATE), where one type of event is the start of use for a drug (having _DATE=START_DATE), and the other type of event is the start of disuse for a drug (having _DATE=END_DATE+1).

 

From an intermediate dataset sorted by PATIENT/_DATE, every time a new _DATE is encountered, it will eventually become a START_DATE.  But before setting the START_DATE, output the previously accumulated list of DRUGs, with a previously established START_DATE, and from the current record END_DATE=_DATE-1

 

After the output statement, set START_DATE=_DATE (which will become the "previously established START_DATE for the upcoming data).  And update the drug list as by adding or deleting as indicated.  This sequence, applicable both for adding and removing drugs, is why I use END_DATE+1 (instead of END_DATE) in the intermediate data set.  

 

data have;
input patient
start_date : date11. end_date : date9. drug :$8. ;
format start_date end_date date11.;
cards;
1 19-Mar-98 26-Jul-98 amox
1 19-Mar-98 18-Nov-98 cipro
1 19-Mar-98  1-Jun-05 zinc
1 19-Mar-98 15-Feb-09 flagyl
1 27-Jul-98 24-Aug-05 cef
2 19-Nov-98  9-Jul-03 amox
2 10-Jul-03 31-Jul-06 cipro
2  2-Jun-05 15-Feb-09 zinc
2  1-Aug-06 16-Jul-18 flagyl
3 16-Feb-09 16-Jul-18 cef
3 16-Feb-09 25-Aug-19 sec
run;
data vtemp (drop=start_date end_date) /view=vtemp ;
  set have (rename=(drug=_drg));
  _action=+1;  _date=start_date;   output;
  _action=-1;  _date=end_date+1;   output;
  format _date date9.;
run;
proc sort data=vtemp out=events;
  by patient _date;
run;
data want (drop=_:);
  if 0 then set have (drop=drug);
  set events ;
  by patient _date;

  length drug $40;
  retain drug ;
  if first._date then do;
    end_date=_date-1;
    if drug^=' ' then output;
    start_date=_date;
  end;

  array _drgs{20} $8 _temporary_;
  if _action=1 then _drgs{1}=_drg;
  else if _action=-1 then do _i=dim(_drgs) to 1 by -1 until(_drgs{_i}=' ');
    if _drgs{_i}=_drg then _drgs{_i}=' ';
  end;
  call sortc(of _drgs{*});
  drug=catx(',',of _drgs{*});
run;

I use the "if 0 then SET HAVE ..." statement to get all the formats (and labels) for variables in HAVE, even though the "obs=0" precludes actual reading of data from HAVE.  And because you want DRUG to have a list (I set to length $40), I prevent that variable from preserving the shorter length it probably has in HAVE.

 

Edited note: The reason the "if _action=-1 ..." (for the "start of disuse condition") do loop starts at the end of the _drgs array and works backwards looking for a match to the newly-disused _drg is because the array would have been previously sorted, meaning all the blanks would be first, followed by the list of _drg values in lexicographic order.  So all the potential matches for the _drg to be dropped from the list will be at the upper portion of the array.

 

Also be sure to make the _drgs array large enough to hold the longest possible list of drugs.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

3 REPLIES 3
Ksharp
Super User
data have;
input patient
start_date : date11.
end_date : date11.
drug $
;
format start_date end_date date11.;
cards;
1

19-Mar-98

26-Jul-98

amox

1

19-Mar-98

18-Nov-98

cipro

1

19-Mar-98

1-Jun-05

zinc

1

19-Mar-98

15-Feb-09

flagyl

1

27-Jul-98

24-Aug-05

cef

2

19-Nov-98

9-Jul-03

amox

2

10-Jul-03

31-Jul-06

cipro

2

2-Jun-05

15-Feb-09

zinc

2

1-Aug-06

16-Jul-18

flagyl

3

16-Feb-09

16-Jul-18

cef

3

16-Feb-09

25-Aug-19

sec
;

data temp;
 set have;
 do date=start_date to end_date;
  output;
 end;
 drop start_date end_date;
 format date date11.;
run;
proc sort data=temp out=temp1 nodupkey;
by patient date drug;
run;
data temp2;
 do until(last.date);
  set temp1;
  by patient date;
  length want $ 200;
  want=catx(',',want,drug);
 end;
 drop drug;
run;
data want;
do until(last.want);
 set temp2;
 by patient want notsorted;
 if first.want then start_date=date;
end;
 end_date=date;
 drop date;
 format start_date end_date date11.;
run;
mkeintz
PROC Star

Another approach to this task is to consider a series of dated events (with variable _DATE), where one type of event is the start of use for a drug (having _DATE=START_DATE), and the other type of event is the start of disuse for a drug (having _DATE=END_DATE+1).

 

From an intermediate dataset sorted by PATIENT/_DATE, every time a new _DATE is encountered, it will eventually become a START_DATE.  But before setting the START_DATE, output the previously accumulated list of DRUGs, with a previously established START_DATE, and from the current record END_DATE=_DATE-1

 

After the output statement, set START_DATE=_DATE (which will become the "previously established START_DATE for the upcoming data).  And update the drug list as by adding or deleting as indicated.  This sequence, applicable both for adding and removing drugs, is why I use END_DATE+1 (instead of END_DATE) in the intermediate data set.  

 

data have;
input patient
start_date : date11. end_date : date9. drug :$8. ;
format start_date end_date date11.;
cards;
1 19-Mar-98 26-Jul-98 amox
1 19-Mar-98 18-Nov-98 cipro
1 19-Mar-98  1-Jun-05 zinc
1 19-Mar-98 15-Feb-09 flagyl
1 27-Jul-98 24-Aug-05 cef
2 19-Nov-98  9-Jul-03 amox
2 10-Jul-03 31-Jul-06 cipro
2  2-Jun-05 15-Feb-09 zinc
2  1-Aug-06 16-Jul-18 flagyl
3 16-Feb-09 16-Jul-18 cef
3 16-Feb-09 25-Aug-19 sec
run;
data vtemp (drop=start_date end_date) /view=vtemp ;
  set have (rename=(drug=_drg));
  _action=+1;  _date=start_date;   output;
  _action=-1;  _date=end_date+1;   output;
  format _date date9.;
run;
proc sort data=vtemp out=events;
  by patient _date;
run;
data want (drop=_:);
  if 0 then set have (drop=drug);
  set events ;
  by patient _date;

  length drug $40;
  retain drug ;
  if first._date then do;
    end_date=_date-1;
    if drug^=' ' then output;
    start_date=_date;
  end;

  array _drgs{20} $8 _temporary_;
  if _action=1 then _drgs{1}=_drg;
  else if _action=-1 then do _i=dim(_drgs) to 1 by -1 until(_drgs{_i}=' ');
    if _drgs{_i}=_drg then _drgs{_i}=' ';
  end;
  call sortc(of _drgs{*});
  drug=catx(',',of _drgs{*});
run;

I use the "if 0 then SET HAVE ..." statement to get all the formats (and labels) for variables in HAVE, even though the "obs=0" precludes actual reading of data from HAVE.  And because you want DRUG to have a list (I set to length $40), I prevent that variable from preserving the shorter length it probably has in HAVE.

 

Edited note: The reason the "if _action=-1 ..." (for the "start of disuse condition") do loop starts at the end of the _drgs array and works backwards looking for a match to the newly-disused _drg is because the array would have been previously sorted, meaning all the blanks would be first, followed by the list of _drg values in lexicographic order.  So all the potential matches for the _drg to be dropped from the list will be at the upper portion of the array.

 

Also be sure to make the _drgs array large enough to hold the longest possible list of drugs.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
DathanMD
Obsidian | Level 7

Thanks so much. This works 

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1386 views
  • 1 like
  • 3 in conversation