BookmarkSubscribeRSS Feed
Bounce
Fluorite | Level 6

Hello,

 

I have a data set that looks like this:

ID    Start_Date    End_Date    

1      01/01/2018   01/01/2018 

1      01/07/2017   30/06/2018

2      01/07/2017   21/03/2018

2      01/09/2017   26/10/2017

2      03/10/2017   31/12/2017

2      01/04/2018   19/06/2018

 

I was wanting to remove the rows of data if the range of the dates fall within the larger range, so output from above will look like this:

ID    Start_Date    End_Date     

1      01/07/2017   30/06/2018

2      01/07/2017   21/03/2018

2      01/04/2018   19/06/2018

 

Any help appreciated 🙂

Thank you!

5 REPLIES 5
Jagadishkatam
Amethyst | Level 16

Could you please suggest the range , as per the code below you can get the difference between the dates as below

if you know the range then you can output the rows as per diff.

 

data have;     
input ID    Start_Date:ddmmyy10.    End_Date :ddmmyy10. ;
diff=End_Date-Start_Date;
format Start_Date End_Date date9.;
cards;   
1 01/01/2018 01/01/2018 
1 01/07/2017 30/06/2018
2 01/07/2017 21/03/2018
2 01/09/2017 26/10/2017
2 03/10/2017 31/12/2017
2 01/04/2018 19/06/2018
;

image.png

Thanks,
Jag
Bounce
Fluorite | Level 6

The range of the dates will always between 01/07/2017 and 30/06/2018.

 

But the start_date and end_date will fluctuate as many as 10 times per ID with overlapping ranges.

 

I have actually calculated the diff column in the exact same way you have (except I include the last day too). But I trying to find the total number of days for that ID within the start and end date. As you can see, the overlapping dates will produce incorrect results if I simply sum them all now.

 

Hence I will need to remove the ones that are within the other dates before summing them.

 

Eg: for ID 1, the total sum is 365 days (01/07/2017 - 30/06/2018) and ID 2, it will be 344 (01/07/2017 - 21/03/2018) + (01/04/2018 - 19/06/2018).

Jagadishkatam
Amethyst | Level 16

please try the below code

 

data have;     
input ID    Start_Date:ddmmyy10.    End_Date :ddmmyy10. ;
diff=End_Date-Start_Date;
format Start_Date End_Date date9.;
cards;   
1 01/01/2018 01/01/2018 
1 01/07/2017 30/06/2018
2 01/07/2017 21/03/2018
2 01/09/2017 26/10/2017
2 03/10/2017 31/12/2017
2 01/04/2018 19/06/2018
;
proc sort data=have;
by id descending diff;
run;

data want;
set have;
by id descending diff;
retain Start_Date2 End_Date2 ;
if first.id then do;
Start_Date2=Start_Date;
End_Date2=End_Date;
end;
if Start_Date2<=Start_Date<=End_Date2 then flag1=1;
if Start_Date2<=End_Date<=End_Date2 then flag1=1;
if first.id or flag1=.;
run;
Thanks,
Jag
Ksharp
Super User
data have;     
input ID    Start_Date:ddmmyy10.    End_Date :ddmmyy10. ;
format Start_Date End_Date date9.;
cards;   
1 01/01/2018 01/01/2018 
1 01/07/2017 30/06/2018
2 01/07/2017 21/03/2018
2 01/09/2017 26/10/2017
2 03/10/2017 31/12/2017
2 01/04/2018 19/06/2018
;

data temp;
 set have;
 do date= Start_Date to End_Date;
  output;
 end;
 drop  Start_Date  End_Date;
 format date ddmmyy10.;
 run;
 proc sort data=temp nodupkey;
 by id date;
 run;
 data temp;
  set temp;
  by id;
  if first.id or dif(date) ne 1 then group+1;
run;
proc summary data=temp ;
by id group;
var date;
output out=want(drop=_:) min=start_date max=end_date;
run;

P.S. If you have a big table, try split it into many small tables and run this code on each small table.

ed_sas_member
Meteorite | Level 14

Hi @Bounce 

 

If you have overlapping ranges, you can try the following code, which will retrieve the min(start_date) and the max(end_date) if the lag is strictly more than 1.

PS: I have deliberately modified the entry data to show different cases.

 

My best,

 

data have;
	input ID Start_Date End_Date;
	informat Start_Date End_Date ddmmyy10.;
	format Start_Date End_Date date9.;
	cards;
1 01/01/2019 04/01/2019 
1 07/01/2019 12/01/2019
1 10/01/2019 18/01/2019
1 14/01/2019 16/01/2019
1 16/01/2019 22/01/2019
2 01/01/2019 04/01/2019
2 07/01/2019 12/01/2019
2 10/01/2019 18/01/2019 ; run; proc sort data=have; by ID Start_Date End_Date; run; data have2; set have; format _lag date9.; by ID; _lag = lag(End_Date) ; if first.ID then do; _lag = .; flag = 0; end; if _lag + 1 < Start_Date then flag + 1; run; proc sql; create table want as select ID, min(Start_Date) as Start_date format=date9., max(End_date) as End_date format=date9. from have2 group by ID, flag; quit;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 472 views
  • 0 likes
  • 4 in conversation