Hello,
I have a data set that looks like this:
ID Start_Date End_Date
1 01/01/2018 01/01/2018
1 01/07/2017 30/06/2018
2 01/07/2017 21/03/2018
2 01/09/2017 26/10/2017
2 03/10/2017 31/12/2017
2 01/04/2018 19/06/2018
I was wanting to remove the rows of data if the range of the dates fall within the larger range, so output from above will look like this:
ID Start_Date End_Date
1 01/07/2017 30/06/2018
2 01/07/2017 21/03/2018
2 01/04/2018 19/06/2018
Any help appreciated 🙂
Thank you!
Could you please suggest the range , as per the code below you can get the difference between the dates as below
if you know the range then you can output the rows as per diff.
data have;
input ID Start_Date:ddmmyy10. End_Date :ddmmyy10. ;
diff=End_Date-Start_Date;
format Start_Date End_Date date9.;
cards;
1 01/01/2018 01/01/2018
1 01/07/2017 30/06/2018
2 01/07/2017 21/03/2018
2 01/09/2017 26/10/2017
2 03/10/2017 31/12/2017
2 01/04/2018 19/06/2018
;
The range of the dates will always between 01/07/2017 and 30/06/2018.
But the start_date and end_date will fluctuate as many as 10 times per ID with overlapping ranges.
I have actually calculated the diff column in the exact same way you have (except I include the last day too). But I trying to find the total number of days for that ID within the start and end date. As you can see, the overlapping dates will produce incorrect results if I simply sum them all now.
Hence I will need to remove the ones that are within the other dates before summing them.
Eg: for ID 1, the total sum is 365 days (01/07/2017 - 30/06/2018) and ID 2, it will be 344 (01/07/2017 - 21/03/2018) + (01/04/2018 - 19/06/2018).
please try the below code
data have;
input ID Start_Date:ddmmyy10. End_Date :ddmmyy10. ;
diff=End_Date-Start_Date;
format Start_Date End_Date date9.;
cards;
1 01/01/2018 01/01/2018
1 01/07/2017 30/06/2018
2 01/07/2017 21/03/2018
2 01/09/2017 26/10/2017
2 03/10/2017 31/12/2017
2 01/04/2018 19/06/2018
;
proc sort data=have;
by id descending diff;
run;
data want;
set have;
by id descending diff;
retain Start_Date2 End_Date2 ;
if first.id then do;
Start_Date2=Start_Date;
End_Date2=End_Date;
end;
if Start_Date2<=Start_Date<=End_Date2 then flag1=1;
if Start_Date2<=End_Date<=End_Date2 then flag1=1;
if first.id or flag1=.;
run;
data have;
input ID Start_Date:ddmmyy10. End_Date :ddmmyy10. ;
format Start_Date End_Date date9.;
cards;
1 01/01/2018 01/01/2018
1 01/07/2017 30/06/2018
2 01/07/2017 21/03/2018
2 01/09/2017 26/10/2017
2 03/10/2017 31/12/2017
2 01/04/2018 19/06/2018
;
data temp;
set have;
do date= Start_Date to End_Date;
output;
end;
drop Start_Date End_Date;
format date ddmmyy10.;
run;
proc sort data=temp nodupkey;
by id date;
run;
data temp;
set temp;
by id;
if first.id or dif(date) ne 1 then group+1;
run;
proc summary data=temp ;
by id group;
var date;
output out=want(drop=_:) min=start_date max=end_date;
run;
P.S. If you have a big table, try split it into many small tables and run this code on each small table.
Hi @Bounce
If you have overlapping ranges, you can try the following code, which will retrieve the min(start_date) and the max(end_date) if the lag is strictly more than 1.
PS: I have deliberately modified the entry data to show different cases.
My best,
data have;
input ID Start_Date End_Date;
informat Start_Date End_Date ddmmyy10.;
format Start_Date End_Date date9.;
cards;
1 01/01/2019 04/01/2019
1 07/01/2019 12/01/2019
1 10/01/2019 18/01/2019
1 14/01/2019 16/01/2019
1 16/01/2019 22/01/2019
2 01/01/2019 04/01/2019
2 07/01/2019 12/01/2019
2 10/01/2019 18/01/2019
;
run;
proc sort data=have;
by ID Start_Date End_Date;
run;
data have2;
set have;
format _lag date9.;
by ID;
_lag = lag(End_Date) ;
if first.ID then do;
_lag = .;
flag = 0;
end;
if _lag + 1 < Start_Date then flag + 1;
run;
proc sql;
create table want as
select ID,
min(Start_Date) as Start_date format=date9.,
max(End_date) as End_date format=date9.
from have2
group by ID, flag;
quit;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.