Without a working sample data set that includes index dates, I could not completely test the code below.
First, I think generating a dataset with a single "continuous" time span for each PATID is a relatively straightforward task for one data step (I use the corrected sample data provided by @Ksharp):
data Test;
infile cards expandtabs;
input patid $ dtstart :YYMMDD10. dtend : YYMMDD10.;
format dtstart YYMMDD10. dtend YYMMDD10.;
cards;
001 2017-01-01 2017-01-31
001 2017-02-01 2017-02-28
001 2017-05-01 2017-05-31
002 2018-01-01 2018-01-31
002 2018-02-20 2018-04-30
003 2020-03-25 2020-12-31
003 2021-01-15 2021-08-31
;
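data single_spans (drop=first_: nxt_: label='Single "continuous" enrollment spans');
  /* Keep reading rows until the PATID ends or the next row starts more than one month after this DTEND */
  do until (last.patid or nxt_start>intnx('month',dtend,1,'same'));
    set test (keep=patid);   /* SET with BY supplies first.patid and last.patid */
    by patid;
    /* MERGE without BY pairs each row with the DTSTART of the following row (look-ahead) */
    merge test
          test (firstobs=2 keep=dtstart rename=(dtstart=nxt_start));
    if first.patid then first_dtstart=dtstart;
  end;
  if patid^=lag(patid);      /* output only the first span of each PATID */
  dtstart=first_dtstart;     /* span runs from the first DTSTART to the last DTEND read */
run;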
The "do until" loop reads observations until either the PATID is exhausted or the upcoming observation starts more than one month after the current observation ends. That builds one "continuous" span over a sequence of observations. The subsequent
if patid^=lag(patid);
guarantees that only the first such span for each PATID is output.
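To see how the gap rule treats the sample rows, it may help to print the cutoff the loop compares against. This is only a small sketch that reuses three DTEND / next-DTSTART pairs from the sample data; the names cutoff and gap_too_long are mine and do not appear in the step above.

```sas
/* Sketch: show the cutoff date used by the DO UNTIL condition */
data _null_;
  format dtend nxt_start cutoff yymmdd10.;
  input dtend :yymmdd10. nxt_start :yymmdd10.;
  /* one month after DTEND, same day of month where that day exists */
  cutoff = intnx('month', dtend, 1, 'same');
  /* 1 = the next row starts too late, so the current span ends here */
  gap_too_long = (nxt_start > cutoff);
  put dtend= nxt_start= cutoff= gap_too_long=;
  cards;
2017-02-28 2017-05-01
2018-01-31 2018-02-20
2020-12-31 2021-01-15
;
run;
```

Only the first pair (PATID 001) is flagged as too long a gap, which is why 001 ends its first span at 2017-02-28 while the two rows of 002 and of 003 are each combined into one span.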
If you have a dataset named INDEX_DATASET, sorted by PATID, and a variable INDEX_DATE, then (this is the untested portion):
data want;
merge single_spans index_dataset ;
by patid;
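/* Subsetting IF, not WHERE: a WHERE statement is applied to each input dataset separately, and neither dataset holds both the span dates and INDEX_DATE */
if dtstart <= intnx('year',index_date,-1,'same') and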
dtend <= intnx('year',index_date,+1,'same') and
dtend >= intnx('month',index_date,+1,'same') ;
run;
This is intended to select observations in which DTSTART is at least 12 months prior to INDEX_DATE and DTEND falls between 1 month and 1 year after INDEX_DATE.
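Since no index dates were posted, here is a rough way to exercise the three conditions. The index dates below are invented purely for illustration, and window_check, ok_lookback, ok_end_max, ok_end_min and keep_row are hypothetical names of mine. The step assumes SINGLE_SPANS has been created as above and shows each condition as a 0/1 flag so it is easy to see which one excludes a patient.

```sas
/* Hypothetical index dates: made up, not from the original question */
data index_dataset;
  input patid $ index_date :yymmdd10.;
  format index_date yymmdd10.;
  cards;
001 2018-01-15
002 2019-02-01
003 2021-04-01
;
run;

/* Diagnostic sketch: evaluate each condition separately */
data window_check;
  merge single_spans (in=in_span) index_dataset (in=in_idx);
  by patid;
  if in_span and in_idx;   /* only patients present in both datasets */
  ok_lookback = (dtstart <= intnx('year',  index_date, -1, 'same'));
  ok_end_max  = (dtend   <= intnx('year',  index_date, +1, 'same'));
  ok_end_min  = (dtend   >= intnx('month', index_date, +1, 'same'));
  keep_row    = (ok_lookback and ok_end_max and ok_end_min);
run;

proc print data=window_check noobs;
run;
```

With these made-up dates only PATID 003 gets keep_row=1; 001 and 002 fail the ok_end_min check because their spans end well before one month after the index date.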