@tennis1:
The logically simplest way is to "paintbrush" the dates into an array or a hash table; then you need only simple by-group processing. For example, using a hash table as the medium of choice:
data have ;
input id drug $ (start_dt end_dt) (:mmddyy8.) ;
cards ;
1 A 10/14/19 11/14/19
1 B 11/06/19 12/06/19
1 A 12/09/19 01/09/20
2 A 10/01/19 11/01/19
2 B 10/25/19 11/25/19
2 B 12/01/19 12/31/19
3 A 10/06/19 11/06/19
3 B 10/01/19 11/01/19
4 A 11/01/19 11/30/19
4 B 11/03/19 12/03/19
4 B 12/15/19 01/15/20
5 B 10/05/19 11/05/19
5 A 10/01/19 11/01/19
;
data _null_ ;
dcl hash h (ordered:"a") ;
h.definekey ("id", "date") ;
h.definedata ("id", "date", "olap", "olap_ct") ;
h.definedone () ;
do until (z) ;
set have end = z ;
do date = start_dt to end_dt ;
if h.find() ne 0 then olap_ct = 1 ;
else olap_ct + 1 ;
olap = olap_ct > 1 ;
h.replace() ;
end ;
end ;
h.output (dataset:"hash") ;
run ;
data want (keep = id start_dt end_dt olap_ct) ;
do until (last.olap) ;
set hash ;
by id olap notsorted ;
if first.olap then start_dt = date ;
end ;
if olap ;
end_dt = date ;
format start_dt end_dt yymmdd10. ;
run ;
One caveat of the above is that it can create quite a sizable hash table, particularly if you process claims, which usually are big files. To avoid overtaxing memory in such a case, we can make use of the existing sorted order by ID, so that the largest chunk of data loaded into the hash is dictated by the largest BY group.
At the same time, rather than to concatenate the hashes from all BY groups into file HASH for subsequent BY processing, we can replace it with artificial BY processing on the fly using control-break logic while enumerating the hash table with an iterator. A dummy item can be added to each partial hash to simplify the control-break code (as done below). This way, everything is done in a single step, and memory usage is kept in check. E.g.:
data want (keep = id start_dt end_dt olap_ct) ;
if _n_ = 1 then do ;
dcl hash h (ordered:"a") ;
h.definekey ("id", "date") ;
h.definedata ("id", "date", "olap", "olap_ct") ;
h.definedone () ;
dcl hiter hi ("h") ;
end ;
do until (last.id) ;
set have ;
by id ;
do date = start_dt to end_dt ;
if h.find() ne 0 then olap_ct = 1 ;
else olap_ct + 1 ;
olap = olap_ct > 1 ;
h.replace() ;
end ;
end ;
hi.last() ;
date + 1 ;
olap = 0 ;
h.add() ;
format start_dt end_dt yymmdd10. ;
do _n_ = hi.first() by 0 while (_n_ = 0) ;
if olap and ^ _olap then start_dt = date ;
if ^ olap and _olap then do ;
end_dt = _date ;
olap_ct = _olap_ct ;
output ;
end ;
_olap = olap ;
_olap_ct = olap_ct ;
_date = date ;
_n_ = hi.next() ;
end ;
h.clear() ;
run ;
Note that in the output file WANT start_dt and end_dt represent the endpoints of the date intervals where the drugs overlap. OLAP_CT gives the number of drugs overlapping within the interval. So, if you had more than 2 drugs per member, and within some interval 3 or more drugs overlapped, you'd see the corresponding number. Of course, the OLAP_CT=1 case (no overlap) is filtered out.
Kind regards
Paul D.
... View more