I'm working on identifying events that happened in date ranges that aren't specifically covered by the data. To do this, I have a list of date ranges, some sequential, some with gaps. What I'm tyring to get is a list of date ranges that span the gaps. For example:
Cat1 | Cat2 | EffDt | TermDt |
A | 1 | 8/8/2016 | 10/16/2016 |
A | 1 | 11/1/2016 | 1/10/2017 |
A | 1 | 1/11/2017 | 4/23/2017 |
A | 1 | 4/24/2017 | 7/19/2017 |
A | 1 | 7/20/2017 | 9/30/2017 |
A | 1 | 10/1/2017 | 3/31/2018 |
A | 2 | 10/1/2017 | 4/29/2018 |
A | 2 | 10/1/2017 | 7/1/2018 |
In this case, A1 is covered from 8/8/2016 through 10/16/2016, followed by a gap from 10/17/2016 through 10/31/2016, and coevered again from 11/1/2016 through 3/31/2018. What I have been trying to get out of this is the 10/17/2016 amd 10/31/2016 values. In the event that there were multiple gaps, I'd want to identify each of them. I've been trying a few different loop permutations, but haven't been able to get anything to work quite right.
Any help would be greatly appreciated.
Thanks!
data have;
input Cat1 $ Cat2 $ (EffDt TermDt)(:mmddyy10.);
format EffDt TermDt mmddyy10.;
datalines;
A 1 8/8/2016 10/16/2016
A 1 11/1/2016 1/10/2017
A 1 1/11/2017 4/23/2017
A 1 4/24/2017 7/19/2017
A 1 7/20/2017 9/30/2017
A 1 10/1/2017 3/31/2018
A 2 10/1/2017 4/29/2018
A 2 10/1/2017 7/1/2018
;
data want;
set have;
by cat1 cat2;
lag=lag(TermDt);
d=EffDt-lag;
if d>1 and not first.cat2 then do;
TermDt=EffDt-1;EffDt=lag+1;output;
end;
drop d lag;
run;
Assuming data is ordered.
HI and welcome to the SAS Communities 🙂
It is not clear to me if you want both Cat1 and Cat2 to constitute a group here. However, i think this gives you what you want. It inserts a new line with the 'date gap' whenever a date gap is detected. Let me know if it works for you.
data have;
input Cat1 $ Cat2 $ (EffDt TermDt)(:mmddyy10.);
format EffDt TermDt mmddyy10.;
datalines;
A 1 8/8/2016 10/16/2016
A 1 11/1/2016 1/10/2017
A 1 1/11/2017 4/23/2017
A 1 4/24/2017 7/19/2017
A 1 7/20/2017 9/30/2017
A 1 10/1/2017 3/31/2018
A 2 10/1/2017 4/29/2018
A 2 10/1/2017 7/1/2018
;
data want(drop= lagTermDt _:);
set have;
by Cat2;
lagTermDt=lag1(TermDt);
if first.Cat2 then lagTermDt=.;
if intck('day', lagTermDt, EffDt)>1 & lagTermDt ne . then do;
_EffDt=EffDt; _TermDt=TermDt;
EffDt=lagTermDt+1;
TermDt=_EffDt-1;
output;
EffDt=_EffDt; TermDt=_TermDt;
output;return;
end;
output;
run;
Please post your code using a complete data step and datalines statement so we don't have to work so hard to help you out.
As to your problem, and only considering the pattern in the data you posted:
Use a surrogate key (record pointer) and SQL's Cartesian product to "look ahead" to the next record.
data have;
length cat1 $1 cat2 8 start end 8;
format start end date9.;
informat start end mmddyy10.;
input cat1 cat2 start end;
datalines;
A 1 8/8/2016 10/16/2016
A 1 11/1/2016 1/10/2017
A 1 1/11/2017 4/23/2017
A 1 4/24/2017 7/19/2017
A 1 7/20/2017 9/30/2017
A 1 10/1/2017 3/31/2018
A 2 10/1/2017 4/29/2018
A 2 10/1/2017 7/1/2018
;
run;
* create surrogate key ;
data sk / view=sk;
sk+1;
set have;
run;
proc sql;
* for illustration/debugging ;
select
a.*
,b.*
from
sk a
join
sk b
on
a.cat1=b.cat1
and
a.cat2=b.cat2
and
a.sk=b.sk-1
;
create table want as
select
a.cat1
,a.cat2
,a.start as old_start
,a.end as old_end
,b.start as new_start
,b.end as new_end
,intck('day',old_end,new_start) as gap
from
sk a
join
sk b
on
a.cat1=b.cat1
and
a.cat2=b.cat2
and
a.sk=b.sk-1
where
intck('day',old_end,new_start) > 1
;
quit;
You can tidy this up a bit, dropping old_start and new_end if you wish.
This doesn't cover say overlapping dates, where the new start date < old end date. Again, see bold type above.
Is there a typo in EffDt in your last dataline?
If you want to find events in another dataset that are not covered by date ranges, you can do that directly, something like
Proc sql;
create table uncovered as select * from events
where not exists(select * from date_ranges
where cat1=events.cat1
and cat2=events.cat2
and events.date>=EffDt
and events.date<=TermDt);
I used the two last conditions instead of just using a BETWEEN condition because SAS SQL is peculiar in its interpretation of BETWEEN - "X between A and B" in SAS SQL can be true also in cases where A>B, such as your last observation, then one which @ScottBass suggested could be a typo.
data have;
input Cat1 $ Cat2 $ (EffDt TermDt)(:mmddyy10.);
format EffDt TermDt mmddyy10.;
datalines;
A 1 8/8/2016 10/16/2016
A 1 11/1/2016 1/10/2017
A 1 1/11/2017 4/23/2017
A 1 4/24/2017 7/19/2017
A 1 7/20/2017 9/30/2017
A 1 10/1/2017 3/31/2018
A 2 10/1/2017 4/29/2018
A 2 10/1/2017 7/1/2018
;
data want;
set have;
by cat1 cat2;
lag=lag(TermDt);
d=EffDt-lag;
if d>1 and not first.cat2 then do;
TermDt=EffDt-1;EffDt=lag+1;output;
end;
drop d lag;
run;
Assuming data is ordered.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.