Hello, I have several datasets containing various start and end date information and need to remove overlapping dates (see sample code below). For instance, in the below sample, person A had a span of Jan 1 - Nov 30, 2016, and exclude from July 1 - Dec 31, 2016. The resulting start should be Jan 1, 2016 and the end should be June 30, 2016, a day before the exclude start date. Person B had a span of Jan 1 - Dec 2016 and an exclude of Sept 1 - Sept 30, 2016. So they would have two lines for two new start and end dates, the first from Jan 1 - Aug 31, 2016, and another from Oct 1 - Dec 31 2016. Currently, I am doing this by outputting one row per day and using PROC SQL to exclude where ID and date overlap, then I combine consecutive days back into one line. This works, but the problem is that my datasets are much larger--the largest data views I create have 4-6 billion observations and I need to repeat this procedure several times for different datasets. My question is whether there is a simple(ish) way that doesn't involve creating a row for each day of enrollment? data master;
informat id $1. start end mmddyy10.;
format start end mmddyy10.;
input id $ start end;
datalines;
A 01-01-2016 11-30-2016
B 01-01-2016 12-31-2016
;
run;
data exclude;
informat id $1. start end mmddyy10.;
format start end mmddyy10.;
input id $ start end;
datalines;
A 07-01-2016 12-31-2016
B 09-01-2016 09-30-2016
;run;
data want;
informat id $1. start end mmddyy10.;
format start end mmddyy10.;
input id $ start end;
datalines;
A 01-01-2016 06-30-2016
B 01-01-2016 08-31-2016
B 10-01-2016 12-31-2016
;run; Thank you, -Brian
... View more