start date drug end_Date
01/01/2005 a 02/01/2005
02/02/2005 a 03/01/2005
03/08/2005 a 04/08/2005
06/01/2005 a 07/01/2005
I want to create a dataset were I would have one start date and end date based on a gap of no longer than 7 days between the end of the first period and the start of the second period. For the example above, I would merge all the first three and ignore the four
output
start date drug end_Date
01/01/2005 a 04/08/2005
data one; informat start_date end_date mmddyy.; input start_date drug $1. end_Date ; datalines; 01/01/2005 a 02/01/2005 02/02/2005 a 03/01/2005 03/08/2005 a 04/08/2005 06/01/2005 a 07/01/2005 ; run; data temp; set one; by drug; if first.drug then group=0; if first.drug or start_date-lag(end_date) gt 7 then group+1; if group=1; run; data want; set temp; by drug; retain start; if first.drug then start=start_date; if last.drug then output; drop group start_date; format start end_date date9.; run;
data one;
informat start_date end_date mmddyy.;
input start_date drug $1. end_Date ;
datalines;
01/01/2005 a 02/01/2005
02/02/2005 a 03/01/2005
03/08/2005 a 04/08/2005
06/01/2005 a 07/01/2005
;
run;
proc sort;
by drug start_date end_date;
run;
data two;
set one;
by drug;
if first.drug then newstartdate=start_date;
if last.drug then newenddate=end_date;
proc print;
run;
data _start;
set two;
if newstartdate ne .;
data _end;
set two;
if newenddate ne .;
run;
proc sql;
create table join1 as select
s.drug, s.newstartdate, e.newenddate from _start as s
join _end as e
on s.drug=e.drug;
quit;
proc print;
format newstartdate newenddate mmddyy.;
run;
Hi Thank you for the reply. Your code merge all the days together and does not produce the output above. The last entry for the date should not be counted because the end_Date for the period before and the start date of the second period have a gap of more than 7 days. so if the gap is more than 7 days, record the end date for the one before which is 04/08/2005. Can you please adjust your code to account for that?
data have;
infile cards;
input start_date ddmmyy10. drug $1. end_date ddmmyy10.;
cards;
start_date drug end_Date
01/01/2005 a 02/01/2005
02/02/2005 a 03/01/2005
03/08/2005 a 04/08/2005
06/01/2005 a 07/01/2005
; run;
proc sort data=have; by drug start_date; run;
data want(rename = (date1 = start_date date2 = end_date));
set have;
by drug;
retain date1 date2;
if first.drug the do;
date1 = start_date;
date2 = end_date;
end;
else
if start_date = date2 + 1 then date2 = end_date;
else do;
output;
date1 = start_date;
date2 = end_date;
return;
end;
if last.drug then output;
keep date1 date2;
run;
I have not run my code. In case of any difficulty please post the log, to help you more.
Thank you for posting but you code does not produce the required output
data one; informat start_date end_date mmddyy.; input start_date drug $1. end_Date ; datalines; 01/01/2005 a 02/01/2005 02/02/2005 a 03/01/2005 03/08/2005 a 04/08/2005 06/01/2005 a 07/01/2005 ; run; data temp; set one; by drug; if first.drug then group=0; if first.drug or start_date-lag(end_date) gt 7 then group+1; if group=1; run; data want; set temp; by drug; retain start; if first.drug then start=start_date; if last.drug then output; drop group start_date; format start end_date date9.; run;
Thank you Ksharp, always a life saver!
data one;
informat start_date end_date mmddyy.;
input id start_date drug $1. end_Date ;
datalines;
1 01/01/2005 a 02/01/2005
1 02/02/2005 a 03/01/2005
1 03/08/2005 a 04/08/2005
1 06/01/2005 a 07/01/2005
2 01/01/2005 a 02/01/2005
2 02/02/2005 a 03/01/2005
2 03/08/2005 a 04/08/2005
2 06/01/2005 a 07/01/2005
3 01/01/2005 a 02/02/2005
;
run;
also if the patient has one period with no other start date within 7 days (such as in patient 3), the code should assign that period as the start and end date for the output.
Thank you!
data one; informat start_date end_date mmddyy.; input id start_date drug $1. end_Date ; datalines; 1 01/01/2005 a 02/01/2005 1 02/02/2005 a 03/01/2005 1 03/08/2005 a 04/08/2005 1 06/01/2005 a 07/01/2005 2 01/01/2005 a 02/01/2005 2 02/02/2005 a 03/01/2005 2 03/08/2005 a 04/08/2005 2 06/01/2005 a 07/01/2005 3 01/01/2005 a 02/02/2005 ; run; data temp; set one; by drug id; if first.id then group=0; if first.id or start_date-lag(end_date) gt 7 then group+1; if group=1; run; data want; set temp; by drug id; retain start; if first.id then start=start_date; if last.id then output; drop group start_date; format start end_date date9.; run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.