BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
SSK_011523
Calcite | Level 5

Hi everyone, 

 

I am using claims data for analysis and was stuck at one part in continuous enrollment. Any help would be appreciated. Thanks!

 

I need patients who are continuously enrolled for 12 months prior to the index date to at least one month and maximum of 12 months after the index date. A gap of 30 days is acceptable. Also, I'm interested in first enrollment period only. 

 

Here's the data I have

Data Test; 

  input patid $  dtstart :YYMMDD10. dtend : YYMMDD10.;

 format  dtstart YYMMDD10. dtend YYMMDD10.; 

 

cards;

 

001 2017-01-01 2017-01-31

001 2017-02-01 2017-02-31

001 2017-05-01 2017-05-31

002 2018-01-01 2018-01-31

002 2018-02-20 2018-04-31

003 2020-03-25 2020-12-31

003 2021-01-15 2021-08-31

 

Output (intermediate): 

001 2017-01-01 2017-02-31

001 2017-05-01 2017-05-31

002 2018-01-01 2018-04-31

003 2020-03-25 2021-08-31

 

Now, patid 001 has two periods of continuous enrollment but I need only one enrollment period. So the desired output should be - 

001 2017-01-01 2017-02-31

002 2018-01-01 2018-04-31

003 2020-03-25 2021-08-31

 

Following is the Code that I think would work. Not tried yet since data takes long time (~ a day) to process so I want to make sure I do it correctly. I just want to share my thought process

 

Data test1; 

   set test (rename = (dtstart = start dtend = end)); ;

   by patid; 

 

retain dtstart dtend enrolcnt; 

label dtstart = "Enrollment Date Start"

          dtend = "Enrollment Date End"

         Enrolcnt = "Enrollment Period Count";

 

if first.patid then do; 

  dtstart = start; 

 dtend = end; 

enrolcnt = 1; 

end; 

else do; 

 

if dtend + 30 >=start then do; 

if dtend < end then dtend = end; 

end; 

end;

 

if last.patid then output; 

run; 

 

Once I get the continuous enrollment period, I will then select patients with continuos enrollment with enrollment start date  12 months prior to the index date   to at least one month and maximum of 12 months after the index date.  What confuses me is how to code for "at least" part in the previous statement. Should I create a new date variable as (new = index_date + 30)?

 

SAS CODE (definitely incomplete) - 

       

Proc sql; 

  create table test3 as 

 select a.*, b.dtstart, b.dtend 

from data_index_date as a right join test1 as b

on a.patid = b.patid

where intnx ('day', a.index_date, -365) GE b.dtstart and a.index_date LT b.dtend

order by patid; 

quit;

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

In the absence of sample data with index dates, in the form of a working data set, the below is not completely tested.

 

First I think the task of generating a dataset of single "continuous" time spans for each PATID, is a relatively straightforward single data step (I use the corrected sample data provided by @Ksharp    :

 

data Test; 
infile cards expandtabs;
  input patid $  dtstart :YYMMDD10. dtend : YYMMDD10.;
 format  dtstart YYMMDD10. dtend YYMMDD10.; 
cards;
001 2017-01-01 2017-01-31
001 2017-02-01 2017-02-28
001 2017-05-01 2017-05-31
002 2018-01-01 2018-01-31
002 2018-02-20 2018-04-30
003 2020-03-25 2020-12-31
003 2021-01-15 2021-08-31
;

data single_spans (drop=first_: nxt_:  label='Single "continuous" enrollment spans');
  do until (last.patid or nxt_start>intnx('month',dtend,1,'same'));
    set test (keep=patid);
    by patid;
    merge test
          test (firstobs=2 keep=dtstart rename=(dtstart=nxt_start));
    if first.patid then first_dtstart=dtstart;
  end;
  if patid^=lag(patid);
  dtstart=first_dtstart;
run;

The "do until" loop reads data until either the PATID is exhausted or the current obs is more than one month prior to the upcoming obs.  That builds a "continuous" span over a sequence of obs.  The subsequent

  if patid^=lag(patid);

guarantees that only the first such span for each PATID is output.  

 

If you have a dataset named INDEX_DATASET, sorted by PATID, and a variable INDEX_DATE, then (this is the untested portion):

 

data want;
  merge single_spans index_dataset ;
  by patid;
  where dtstart <= intnx('year',index_date,-1,'same')   and
        dtend   <= intnx('year',index_date,+1,'same')   and
        dtend   >= intnx('month',index_date,+1,'same') ;
run;

This is intended to select observations in which DTSTART is at least 12 months prior to INDEX_DATE  and DTEND falls between 1 month and 1 year after INDEX_DATE.  

 

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

4 REPLIES 4
ballardw
Super User

I don't see any example of data containing an index date.

 

You mention that you need "continuously enrolled for 12 months prior to the index date". How "prior"? What if the 12 months ends 4 years before the index date?

 

If your data set is large then use a subset to test data such as select 10 or 100 patients to test the code against.

If part of your concern about time is because you are reading from an external data base it may be appropriate to discuss how you are connecting and options, such as reducing the number of variables brought into SAS or other options to improve performance.

 

Why are  you using

intnx ('day', a.index_date, -365) 

instead of

intnx ('year', a.index_date, -1) 

Does your process have a non-standard treatment of leap days when determining "year"?

SSK_011523
Calcite | Level 5

Hi, 

 

Thank you for your reply and advice to reduce processing times. That's helpful. 

 

The data set containing index date is a different dataset derived from medical and pharmacy claims and was irrelevant to this code - so did not share that portion of the code. 

 

To answer your other question - 12 months prior means -

if the index date is (MMDDYY) 1/1/2017 then the patient should be enrolled in insurance from 1/1/2016 to at least 2/1/2017

 

 

Ksharp
Super User
Data Test; 
infile cards expandtabs;
  input patid $  dtstart :YYMMDD10. dtend : YYMMDD10.;
 format  dtstart YYMMDD10. dtend YYMMDD10.; 
cards;
001 2017-01-01 2017-01-31
001 2017-02-01 2017-02-28
001 2017-05-01 2017-05-31
002 2018-01-01 2018-01-31
002 2018-02-20 2018-04-30
003 2020-03-25 2020-12-31
003 2021-01-15 2021-08-31
;

data temp;
 set test;
 do date=dtstart to dtend;
   output;
 end;
 format date yymmdd10.;
 keep patid date;
run;
proc sort data=temp out=temp2 nodupkey;
by patid date;
run;
data temp3;
 set temp2;
 by patid;
 if first.patid or dif(date) ne 1 then group+1;
run;
proc summary data=temp3 ;
by patid group;
var date;
output out=temp4 min=dtstart  max=dtend;
run;
data want;
 set temp4;
 by patid;
 if first.patid;
run;
mkeintz
PROC Star

In the absence of sample data with index dates, in the form of a working data set, the below is not completely tested.

 

First I think the task of generating a dataset of single "continuous" time spans for each PATID, is a relatively straightforward single data step (I use the corrected sample data provided by @Ksharp    :

 

data Test; 
infile cards expandtabs;
  input patid $  dtstart :YYMMDD10. dtend : YYMMDD10.;
 format  dtstart YYMMDD10. dtend YYMMDD10.; 
cards;
001 2017-01-01 2017-01-31
001 2017-02-01 2017-02-28
001 2017-05-01 2017-05-31
002 2018-01-01 2018-01-31
002 2018-02-20 2018-04-30
003 2020-03-25 2020-12-31
003 2021-01-15 2021-08-31
;

data single_spans (drop=first_: nxt_:  label='Single "continuous" enrollment spans');
  do until (last.patid or nxt_start>intnx('month',dtend,1,'same'));
    set test (keep=patid);
    by patid;
    merge test
          test (firstobs=2 keep=dtstart rename=(dtstart=nxt_start));
    if first.patid then first_dtstart=dtstart;
  end;
  if patid^=lag(patid);
  dtstart=first_dtstart;
run;

The "do until" loop reads data until either the PATID is exhausted or the current obs is more than one month prior to the upcoming obs.  That builds a "continuous" span over a sequence of obs.  The subsequent

  if patid^=lag(patid);

guarantees that only the first such span for each PATID is output.  

 

If you have a dataset named INDEX_DATASET, sorted by PATID, and a variable INDEX_DATE, then (this is the untested portion):

 

data want;
  merge single_spans index_dataset ;
  by patid;
  where dtstart <= intnx('year',index_date,-1,'same')   and
        dtend   <= intnx('year',index_date,+1,'same')   and
        dtend   >= intnx('month',index_date,+1,'same') ;
run;

This is intended to select observations in which DTSTART is at least 12 months prior to INDEX_DATE  and DTEND falls between 1 month and 1 year after INDEX_DATE.  

 

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 360 views
  • 1 like
  • 4 in conversation