Hello,
I have a data set that looks like this:
ID Start_Date End_Date
1 01/01/2018 01/01/2018
1 01/07/2017 30/06/2018
2 01/07/2017 21/03/2018
2 01/09/2017 26/10/2017
2 03/10/2017 31/12/2017
2 01/04/2018 19/06/2018
I was wanting to remove the rows of data if the range of the dates fall within the larger range, so output from above will look like this:
ID Start_Date End_Date
1 01/07/2017 30/06/2018
2 01/07/2017 21/03/2018
2 01/04/2018 19/06/2018
Any help appreciated 🙂
Thank you!
Could you please suggest the range , as per the code below you can get the difference between the dates as below
if you know the range then you can output the rows as per diff.
data have;
input ID Start_Date:ddmmyy10. End_Date :ddmmyy10. ;
diff=End_Date-Start_Date;
format Start_Date End_Date date9.;
cards;
1 01/01/2018 01/01/2018
1 01/07/2017 30/06/2018
2 01/07/2017 21/03/2018
2 01/09/2017 26/10/2017
2 03/10/2017 31/12/2017
2 01/04/2018 19/06/2018
;
The range of the dates will always between 01/07/2017 and 30/06/2018.
But the start_date and end_date will fluctuate as many as 10 times per ID with overlapping ranges.
I have actually calculated the diff column in the exact same way you have (except I include the last day too). But I trying to find the total number of days for that ID within the start and end date. As you can see, the overlapping dates will produce incorrect results if I simply sum them all now.
Hence I will need to remove the ones that are within the other dates before summing them.
Eg: for ID 1, the total sum is 365 days (01/07/2017 - 30/06/2018) and ID 2, it will be 344 (01/07/2017 - 21/03/2018) + (01/04/2018 - 19/06/2018).
please try the below code
data have;
input ID Start_Date:ddmmyy10. End_Date :ddmmyy10. ;
diff=End_Date-Start_Date;
format Start_Date End_Date date9.;
cards;
1 01/01/2018 01/01/2018
1 01/07/2017 30/06/2018
2 01/07/2017 21/03/2018
2 01/09/2017 26/10/2017
2 03/10/2017 31/12/2017
2 01/04/2018 19/06/2018
;
proc sort data=have;
by id descending diff;
run;
data want;
set have;
by id descending diff;
retain Start_Date2 End_Date2 ;
if first.id then do;
Start_Date2=Start_Date;
End_Date2=End_Date;
end;
if Start_Date2<=Start_Date<=End_Date2 then flag1=1;
if Start_Date2<=End_Date<=End_Date2 then flag1=1;
if first.id or flag1=.;
run;
data have;
input ID Start_Date:ddmmyy10. End_Date :ddmmyy10. ;
format Start_Date End_Date date9.;
cards;
1 01/01/2018 01/01/2018
1 01/07/2017 30/06/2018
2 01/07/2017 21/03/2018
2 01/09/2017 26/10/2017
2 03/10/2017 31/12/2017
2 01/04/2018 19/06/2018
;
data temp;
set have;
do date= Start_Date to End_Date;
output;
end;
drop Start_Date End_Date;
format date ddmmyy10.;
run;
proc sort data=temp nodupkey;
by id date;
run;
data temp;
set temp;
by id;
if first.id or dif(date) ne 1 then group+1;
run;
proc summary data=temp ;
by id group;
var date;
output out=want(drop=_:) min=start_date max=end_date;
run;
P.S. If you have a big table, try split it into many small tables and run this code on each small table.
Hi @Bounce
If you have overlapping ranges, you can try the following code, which will retrieve the min(start_date) and the max(end_date) if the lag is strictly more than 1.
PS: I have deliberately modified the entry data to show different cases.
My best,
data have;
input ID Start_Date End_Date;
informat Start_Date End_Date ddmmyy10.;
format Start_Date End_Date date9.;
cards;
1 01/01/2019 04/01/2019
1 07/01/2019 12/01/2019
1 10/01/2019 18/01/2019
1 14/01/2019 16/01/2019
1 16/01/2019 22/01/2019
2 01/01/2019 04/01/2019
2 07/01/2019 12/01/2019
2 10/01/2019 18/01/2019
;
run;
proc sort data=have;
by ID Start_Date End_Date;
run;
data have2;
set have;
format _lag date9.;
by ID;
_lag = lag(End_Date) ;
if first.ID then do;
_lag = .;
flag = 0;
end;
if _lag + 1 < Start_Date then flag + 1;
run;
proc sql;
create table want as
select ID,
min(Start_Date) as Start_date format=date9.,
max(End_date) as End_date format=date9.
from have2
group by ID, flag;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.