Solved: Re: Continuous enrollment claims data

SSK_011523 · Posted 05-21-2024 01:05 PM

Hi everyone,

I am using claims data for analysis and was stuck at one part in continuous enrollment. Any help would be appreciated. Thanks!

I need patients who are continuously enrolled for 12 months prior to the index date to at least one month and maximum of 12 months after the index date. A gap of 30 days is acceptable. Also, I'm interested in first enrollment period only.

Here's the data I have

Data Test;

input patid $ dtstart :YYMMDD10. dtend : YYMMDD10.;

format dtstart YYMMDD10. dtend YYMMDD10.;

cards;

001 2017-01-01 2017-01-31

001 2017-02-01 2017-02-31

001 2017-05-01 2017-05-31

002 2018-01-01 2018-01-31

002 2018-02-20 2018-04-31

003 2020-03-25 2020-12-31

003 2021-01-15 2021-08-31

Output (intermediate):

001 2017-01-01 2017-02-31

001 2017-05-01 2017-05-31

002 2018-01-01 2018-04-31

003 2020-03-25 2021-08-31

Now, patid 001 has two periods of continuous enrollment but I need only one enrollment period. So the desired output should be -

001 2017-01-01 2017-02-31

002 2018-01-01 2018-04-31

003 2020-03-25 2021-08-31

Following is the Code that I think would work. Not tried yet since data takes long time (~ a day) to process so I want to make sure I do it correctly. I just want to share my thought process

Data test1;

set test (rename = (dtstart = start dtend = end)); ;

by patid;

retain dtstart dtend enrolcnt;

label dtstart = "Enrollment Date Start"

dtend = "Enrollment Date End"

Enrolcnt = "Enrollment Period Count";

if first.patid then do;

dtstart = start;

dtend = end;

enrolcnt = 1;

end;

else do;

if dtend + 30 >=start then do;

if dtend < end then dtend = end;

end;

if last.patid then output;

run;

Once I get the continuous enrollment period, I will then select patients with continuos enrollment with enrollment start date 12 months prior to the index date to at least one month and maximum of 12 months after the index date. What confuses me is how to code for "at least" part in the previous statement. Should I create a new date variable as (new = index_date + 30)?

SAS CODE (definitely incomplete) -

Proc sql;

create table test3 as

select a.*, b.dtstart, b.dtend

from data_index_date as a right join test1 as b

on a.patid = b.patid

where intnx ('day', a.index_date, -365) GE b.dtstart and a.index_date LT b.dtend

order by patid;

quit;

mkeintz · Posted 05-23-2024 01:13 PM

In the absence of sample data with index dates, in the form of a working data set, the below is not completely tested.

First I think the task of generating a dataset of single "continuous" time spans for each PATID, is a relatively straightforward single data step (I use the corrected sample data provided by @Ksharp :

data Test; 
infile cards expandtabs;
  input patid $  dtstart :YYMMDD10. dtend : YYMMDD10.;
 format  dtstart YYMMDD10. dtend YYMMDD10.; 
cards;
001 2017-01-01 2017-01-31
001 2017-02-01 2017-02-28
001 2017-05-01 2017-05-31
002 2018-01-01 2018-01-31
002 2018-02-20 2018-04-30
003 2020-03-25 2020-12-31
003 2021-01-15 2021-08-31
;

data single_spans (drop=first_: nxt_:  label='Single "continuous" enrollment spans');
  do until (last.patid or nxt_start>intnx('month',dtend,1,'same'));
    set test (keep=patid);
    by patid;
    merge test
          test (firstobs=2 keep=dtstart rename=(dtstart=nxt_start));
    if first.patid then first_dtstart=dtstart;
  end;
  if patid^=lag(patid);
  dtstart=first_dtstart;
run;

The "do until" loop reads data until either the PATID is exhausted or the current obs is more than one month prior to the upcoming obs. That builds a "continuous" span over a sequence of obs. The subsequent

  if patid^=lag(patid);

guarantees that only the first such span for each PATID is output.

If you have a dataset named INDEX_DATASET, sorted by PATID, and a variable INDEX_DATE, then (this is the untested portion):

data want;
  merge single_spans index_dataset ;
  by patid;
  where dtstart <= intnx('year',index_date,-1,'same')   and
        dtend   <= intnx('year',index_date,+1,'same')   and
        dtend   >= intnx('month',index_date,+1,'same') ;
run;

This is intended to select observations in which DTSTART is at least 12 months prior to INDEX_DATE and DTEND falls between 1 month and 1 year after INDEX_DATE.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

ballardw · Posted 05-21-2024 01:33 PM

I don't see any example of data containing an index date.

You mention that you need "continuously enrolled for 12 months prior to the index date". How "prior"? What if the 12 months ends 4 years before the index date?

If your data set is large then use a subset to test data such as select 10 or 100 patients to test the code against.

If part of your concern about time is because you are reading from an external data base it may be appropriate to discuss how you are connecting and options, such as reducing the number of variables brought into SAS or other options to improve performance.

Why are you using

intnx ('day', a.index_date, -365)

instead of

intnx ('year', a.index_date, -1)

Does your process have a non-standard treatment of leap days when determining "year"?

SSK_011523 · Posted 05-21-2024 01:46 PM

Hi,

Thank you for your reply and advice to reduce processing times. That's helpful.

The data set containing index date is a different dataset derived from medical and pharmacy claims and was irrelevant to this code - so did not share that portion of the code.

To answer your other question - 12 months prior means -

if the index date is (MMDDYY) 1/1/2017 then the patient should be enrolled in insurance from 1/1/2016 to at least 2/1/2017

Ksharp · Posted 05-21-2024 10:23 PM

Data Test; 
infile cards expandtabs;
  input patid $  dtstart :YYMMDD10. dtend : YYMMDD10.;
 format  dtstart YYMMDD10. dtend YYMMDD10.; 
cards;
001 2017-01-01 2017-01-31
001 2017-02-01 2017-02-28
001 2017-05-01 2017-05-31
002 2018-01-01 2018-01-31
002 2018-02-20 2018-04-30
003 2020-03-25 2020-12-31
003 2021-01-15 2021-08-31
;

data temp;
 set test;
 do date=dtstart to dtend;
   output;
 end;
 format date yymmdd10.;
 keep patid date;
run;
proc sort data=temp out=temp2 nodupkey;
by patid date;
run;
data temp3;
 set temp2;
 by patid;
 if first.patid or dif(date) ne 1 then group+1;
run;
proc summary data=temp3 ;
by patid group;
var date;
output out=temp4 min=dtstart  max=dtend;
run;
data want;
 set temp4;
 by patid;
 if first.patid;
run;

mkeintz · Posted 05-23-2024 01:13 PM

In the absence of sample data with index dates, in the form of a working data set, the below is not completely tested.

First I think the task of generating a dataset of single "continuous" time spans for each PATID, is a relatively straightforward single data step (I use the corrected sample data provided by @Ksharp :

data Test; 
infile cards expandtabs;
  input patid $  dtstart :YYMMDD10. dtend : YYMMDD10.;
 format  dtstart YYMMDD10. dtend YYMMDD10.; 
cards;
001 2017-01-01 2017-01-31
001 2017-02-01 2017-02-28
001 2017-05-01 2017-05-31
002 2018-01-01 2018-01-31
002 2018-02-20 2018-04-30
003 2020-03-25 2020-12-31
003 2021-01-15 2021-08-31
;

data single_spans (drop=first_: nxt_:  label='Single "continuous" enrollment spans');
  do until (last.patid or nxt_start>intnx('month',dtend,1,'same'));
    set test (keep=patid);
    by patid;
    merge test
          test (firstobs=2 keep=dtstart rename=(dtstart=nxt_start));
    if first.patid then first_dtstart=dtstart;
  end;
  if patid^=lag(patid);
  dtstart=first_dtstart;
run;

The "do until" loop reads data until either the PATID is exhausted or the current obs is more than one month prior to the upcoming obs. That builds a "continuous" span over a sequence of obs. The subsequent

  if patid^=lag(patid);

guarantees that only the first such span for each PATID is output.

If you have a dataset named INDEX_DATASET, sorted by PATID, and a variable INDEX_DATE, then (this is the untested portion):

data want;
  merge single_spans index_dataset ;
  by patid;
  where dtstart <= intnx('year',index_date,-1,'same')   and
        dtend   <= intnx('year',index_date,+1,'same')   and
        dtend   >= intnx('month',index_date,+1,'same') ;
run;

This is intended to select observations in which DTSTART is at least 12 months prior to INDEX_DATE and DTEND falls between 1 month and 1 year after INDEX_DATE.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Continuous enrollment claims data

Re: Continuous enrollment claims data

Re: Continuous enrollment claims data

Re: Continuous enrollment claims data

Re: Continuous enrollment claims data

Re: Continuous enrollment claims data

Click image to register for webinar

Classroom Training Available!