Hi everyone,
I am using claims data for analysis and was stuck at one part in continuous enrollment. Any help would be appreciated. Thanks!
I need patients who are continuously enrolled for 12 months prior to the index date to at least one month and maximum of 12 months after the index date. A gap of 30 days is acceptable. Also, I'm interested in first enrollment period only.
Here's the data I have
Data Test;
input patid $ dtstart :YYMMDD10. dtend : YYMMDD10.;
format dtstart YYMMDD10. dtend YYMMDD10.;
cards;
001 2017-01-01 2017-01-31
001 2017-02-01 2017-02-31
001 2017-05-01 2017-05-31
002 2018-01-01 2018-01-31
002 2018-02-20 2018-04-31
003 2020-03-25 2020-12-31
003 2021-01-15 2021-08-31
Output (intermediate):
001 2017-01-01 2017-02-31
001 2017-05-01 2017-05-31
002 2018-01-01 2018-04-31
003 2020-03-25 2021-08-31
Now, patid 001 has two periods of continuous enrollment but I need only one enrollment period. So the desired output should be -
001 2017-01-01 2017-02-31
002 2018-01-01 2018-04-31
003 2020-03-25 2021-08-31
Following is the Code that I think would work. Not tried yet since data takes long time (~ a day) to process so I want to make sure I do it correctly. I just want to share my thought process
Data test1;
set test (rename = (dtstart = start dtend = end)); ;
by patid;
retain dtstart dtend enrolcnt;
label dtstart = "Enrollment Date Start"
dtend = "Enrollment Date End"
Enrolcnt = "Enrollment Period Count";
if first.patid then do;
dtstart = start;
dtend = end;
enrolcnt = 1;
end;
else do;
if dtend + 30 >=start then do;
if dtend < end then dtend = end;
end;
end;
if last.patid then output;
run;
Once I get the continuous enrollment period, I will then select patients with continuos enrollment with enrollment start date 12 months prior to the index date to at least one month and maximum of 12 months after the index date. What confuses me is how to code for "at least" part in the previous statement. Should I create a new date variable as (new = index_date + 30)?
SAS CODE (definitely incomplete) -
Proc sql;
create table test3 as
select a.*, b.dtstart, b.dtend
from data_index_date as a right join test1 as b
on a.patid = b.patid
where intnx ('day', a.index_date, -365) GE b.dtstart and a.index_date LT b.dtend
order by patid;
quit;
In the absence of sample data with index dates, in the form of a working data set, the below is not completely tested.
First I think the task of generating a dataset of single "continuous" time spans for each PATID, is a relatively straightforward single data step (I use the corrected sample data provided by @Ksharp :
data Test;
infile cards expandtabs;
input patid $ dtstart :YYMMDD10. dtend : YYMMDD10.;
format dtstart YYMMDD10. dtend YYMMDD10.;
cards;
001 2017-01-01 2017-01-31
001 2017-02-01 2017-02-28
001 2017-05-01 2017-05-31
002 2018-01-01 2018-01-31
002 2018-02-20 2018-04-30
003 2020-03-25 2020-12-31
003 2021-01-15 2021-08-31
;
data single_spans (drop=first_: nxt_: label='Single "continuous" enrollment spans');
do until (last.patid or nxt_start>intnx('month',dtend,1,'same'));
set test (keep=patid);
by patid;
merge test
test (firstobs=2 keep=dtstart rename=(dtstart=nxt_start));
if first.patid then first_dtstart=dtstart;
end;
if patid^=lag(patid);
dtstart=first_dtstart;
run;
The "do until" loop reads data until either the PATID is exhausted or the current obs is more than one month prior to the upcoming obs. That builds a "continuous" span over a sequence of obs. The subsequent
if patid^=lag(patid);
guarantees that only the first such span for each PATID is output.
If you have a dataset named INDEX_DATASET, sorted by PATID, and a variable INDEX_DATE, then (this is the untested portion):
data want;
merge single_spans index_dataset ;
by patid;
where dtstart <= intnx('year',index_date,-1,'same') and
dtend <= intnx('year',index_date,+1,'same') and
dtend >= intnx('month',index_date,+1,'same') ;
run;
This is intended to select observations in which DTSTART is at least 12 months prior to INDEX_DATE and DTEND falls between 1 month and 1 year after INDEX_DATE.
I don't see any example of data containing an index date.
You mention that you need "continuously enrolled for 12 months prior to the index date". How "prior"? What if the 12 months ends 4 years before the index date?
If your data set is large then use a subset to test data such as select 10 or 100 patients to test the code against.
If part of your concern about time is because you are reading from an external data base it may be appropriate to discuss how you are connecting and options, such as reducing the number of variables brought into SAS or other options to improve performance.
Why are you using
intnx ('day', a.index_date, -365)
instead of
intnx ('year', a.index_date, -1)
Does your process have a non-standard treatment of leap days when determining "year"?
Hi,
Thank you for your reply and advice to reduce processing times. That's helpful.
The data set containing index date is a different dataset derived from medical and pharmacy claims and was irrelevant to this code - so did not share that portion of the code.
To answer your other question - 12 months prior means -
if the index date is (MMDDYY) 1/1/2017 then the patient should be enrolled in insurance from 1/1/2016 to at least 2/1/2017
Data Test;
infile cards expandtabs;
input patid $ dtstart :YYMMDD10. dtend : YYMMDD10.;
format dtstart YYMMDD10. dtend YYMMDD10.;
cards;
001 2017-01-01 2017-01-31
001 2017-02-01 2017-02-28
001 2017-05-01 2017-05-31
002 2018-01-01 2018-01-31
002 2018-02-20 2018-04-30
003 2020-03-25 2020-12-31
003 2021-01-15 2021-08-31
;
data temp;
set test;
do date=dtstart to dtend;
output;
end;
format date yymmdd10.;
keep patid date;
run;
proc sort data=temp out=temp2 nodupkey;
by patid date;
run;
data temp3;
set temp2;
by patid;
if first.patid or dif(date) ne 1 then group+1;
run;
proc summary data=temp3 ;
by patid group;
var date;
output out=temp4 min=dtstart max=dtend;
run;
data want;
set temp4;
by patid;
if first.patid;
run;
In the absence of sample data with index dates, in the form of a working data set, the below is not completely tested.
First I think the task of generating a dataset of single "continuous" time spans for each PATID, is a relatively straightforward single data step (I use the corrected sample data provided by @Ksharp :
data Test;
infile cards expandtabs;
input patid $ dtstart :YYMMDD10. dtend : YYMMDD10.;
format dtstart YYMMDD10. dtend YYMMDD10.;
cards;
001 2017-01-01 2017-01-31
001 2017-02-01 2017-02-28
001 2017-05-01 2017-05-31
002 2018-01-01 2018-01-31
002 2018-02-20 2018-04-30
003 2020-03-25 2020-12-31
003 2021-01-15 2021-08-31
;
data single_spans (drop=first_: nxt_: label='Single "continuous" enrollment spans');
do until (last.patid or nxt_start>intnx('month',dtend,1,'same'));
set test (keep=patid);
by patid;
merge test
test (firstobs=2 keep=dtstart rename=(dtstart=nxt_start));
if first.patid then first_dtstart=dtstart;
end;
if patid^=lag(patid);
dtstart=first_dtstart;
run;
The "do until" loop reads data until either the PATID is exhausted or the current obs is more than one month prior to the upcoming obs. That builds a "continuous" span over a sequence of obs. The subsequent
if patid^=lag(patid);
guarantees that only the first such span for each PATID is output.
If you have a dataset named INDEX_DATASET, sorted by PATID, and a variable INDEX_DATE, then (this is the untested portion):
data want;
merge single_spans index_dataset ;
by patid;
where dtstart <= intnx('year',index_date,-1,'same') and
dtend <= intnx('year',index_date,+1,'same') and
dtend >= intnx('month',index_date,+1,'same') ;
run;
This is intended to select observations in which DTSTART is at least 12 months prior to INDEX_DATE and DTEND falls between 1 month and 1 year after INDEX_DATE.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Early bird rate extended! Save $200 when you sign up by March 31.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.