BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
lillymaginta
Lapis Lazuli | Level 10
start date  drug end_Date 
01/01/2005 a     02/01/2005
02/02/2005 a    03/01/2005
03/08/2005  a      04/08/2005
06/01/2005  a     07/01/2005

I want to create a dataset were I would have one start date and end date based on a gap of no longer than 7 days between the end of the first period and the start of the second period. For the example above, I would merge all the first three and ignore the four

output 

start date  drug end_Date

01/01/2005    a     04/08/2005

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

data one;
informat start_date end_date mmddyy.;
input start_date drug $1. end_Date ;
datalines;
01/01/2005 a 02/01/2005
02/02/2005 a 03/01/2005
03/08/2005 a 04/08/2005
06/01/2005 a 07/01/2005
;
run;
data temp;
 set one;
 by drug;
 if first.drug then group=0;
 if first.drug or start_date-lag(end_date) gt 7 then group+1;
 if group=1;
run;
data want;
 set temp;
 by drug;
 retain start;
 if first.drug then start=start_date;
 if last.drug then output;
 drop group start_date;
 format start end_date date9.;
run;
  
 

View solution in original post

9 REPLIES 9
GreggB
Pyrite | Level 9

data one;
informat start_date end_date mmddyy.;
input start_date drug $1. end_Date ;
datalines;
01/01/2005 a 02/01/2005
02/02/2005 a 03/01/2005
03/08/2005 a 04/08/2005
06/01/2005 a 07/01/2005
;
run;
proc sort;
by drug start_date end_date;
run;
data two;
set one;
by drug;
if first.drug then newstartdate=start_date;
if last.drug then newenddate=end_date;
proc print;
run;
data _start;
set two;
if newstartdate ne .;
data _end;
set two;
if newenddate ne .;
run;
proc sql;
create table join1 as select
s.drug, s.newstartdate, e.newenddate from _start as s
join _end as e
on s.drug=e.drug;
quit;
proc print;
format newstartdate newenddate mmddyy.;
run;

lillymaginta
Lapis Lazuli | Level 10

Hi Thank you for the reply. Your code merge all the days together and does not produce the output above. The last entry for the date should not be counted because the end_Date for the period before and the start date of the second period have a gap of more than 7 days. so if the gap is more than 7 days, record the end date for the one before which is 04/08/2005. Can you please adjust your code to account for that? 

Shmuel
Garnet | Level 18

data have;

infile cards;

input start_date ddmmyy10. drug $1. end_date ddmmyy10.;

cards;

start_date drug end_Date 
01/01/2005 a 02/01/2005
02/02/2005 a 03/01/2005
03/08/2005 a 04/08/2005
06/01/2005 a 07/01/2005

; run; 

 

proc sort data=have; by drug start_date; run;

data want(rename = (date1 = start_date date2 = end_date));

  set have;  

    by drug;

       retain date1 date2;

       if first.drug the do;

          date1 = start_date;

          date2 = end_date;

      end; 

      else 

      if start_date = date2 + 1 then date2 = end_date;

      else do;              

              output;

              date1 = start_date;

              date2 = end_date;

              return;

      end; 

      if last.drug then output;

      keep date1 date2;

run;

 

I have not run my code. In case of any difficulty please post the log, to help you more. 

                 

lillymaginta
Lapis Lazuli | Level 10

Thank you for posting but you code does not produce the required output 

Ksharp
Super User

data one;
informat start_date end_date mmddyy.;
input start_date drug $1. end_Date ;
datalines;
01/01/2005 a 02/01/2005
02/02/2005 a 03/01/2005
03/08/2005 a 04/08/2005
06/01/2005 a 07/01/2005
;
run;
data temp;
 set one;
 by drug;
 if first.drug then group=0;
 if first.drug or start_date-lag(end_date) gt 7 then group+1;
 if group=1;
run;
data want;
 set temp;
 by drug;
 retain start;
 if first.drug then start=start_date;
 if last.drug then output;
 drop group start_date;
 format start end_date date9.;
run;
  
 

lillymaginta
Lapis Lazuli | Level 10

Thank you Ksharp, always a life saver! 

lillymaginta
Lapis Lazuli | Level 10
if you have multiple ids, and you want to run the above code and generate the above output at the same time for multiple ids, how do you adjust your code for that?
lillymaginta
Lapis Lazuli | Level 10
data one;
informat start_date end_date mmddyy.;
input id start_date drug $1. end_Date ;
datalines;
1  01/01/2005 a 02/01/2005
1  02/02/2005 a 03/01/2005
1  03/08/2005 a 04/08/2005
1  06/01/2005 a 07/01/2005
2  01/01/2005 a 02/01/2005
2  02/02/2005 a 03/01/2005
2  03/08/2005 a 04/08/2005
2 06/01/2005 a 07/01/2005
3  01/01/2005 a 02/02/2005


;
run;

also if the patient has one period with no other start date within 7 days (such as in patient 3), the code should assign that period as the start and end date for the output. 

Thank you! 

Ksharp
Super User

data one;
informat start_date end_date mmddyy.;
input id start_date drug $1. end_Date ;
datalines;
1  01/01/2005 a 02/01/2005
1  02/02/2005 a 03/01/2005
1  03/08/2005 a 04/08/2005
1  06/01/2005 a 07/01/2005
2  01/01/2005 a 02/01/2005
2  02/02/2005 a 03/01/2005
2  03/08/2005 a 04/08/2005
2 06/01/2005 a 07/01/2005
3  01/01/2005 a 02/02/2005
;
run;


data temp;
 set one;
 by drug id;
 if first.id then group=0;
 if first.id or start_date-lag(end_date) gt 7 then group+1;
 if group=1;
run;
data want;
 set temp;
 by drug id;
 retain start;
 if first.id then start=start_date;
 if last.id then output;
 drop group start_date;
 format start end_date date9.;
run;
  

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 3088 views
  • 1 like
  • 4 in conversation