Solved: Check if date points fall in the time intervals for eligibility assess...

Cruise · Posted 04-05-2019 07:20 PM

Hi guys,

I'd like to mark the patients who had medical insurance when he/she was diagnosed (date_diagnosis) with the disease under the study (KEEP2=YES). All patients have insurance on and off basis, however, some patients had no insurance coverage when the disease was diagnosed (KEEP2=NO). My data has date pairs for the intervals of insurance coverage., ie., start1-end1 and patient lost the insurance and got it back during start2-end2 ...so forth so on. Desired output is the KEEP_CORRECT.

This problem was resolved by @mkeintz and worked on the data I had back then. However, the algorithm makes wrong choice as I

have shown in the image attached. The red stars point out where date_diagnosis actually fell in the intervals of insurance coverage

that I would mark KEEP2=YES but algorithm creates KEEP2=NO.

Could you please help fix the algorithm? or suggest alternative solutions, if any?

https://communities.sas.com/t5/SAS-Programming/Select-the-cases-which-included-the-full-calendar-yea...

Another wrinkle is the repeated data. The algorithm works well with unique rows. However, my actual data has duplicates. I think I should create another forum once algorithm works for the data with distinct rows.

DATA ELIGIBILITY; 
INPUT ID start1 end1 start2 end2 start3 end3 start4 end4 start5 end5 DATE_DIAGNOSIS DATE_DEATH YEAR KEEP_CORRECT $;
Informat start1 end1 start2 end2 start3 end3 start4 end4 start5 end5 DATE_DIAGNOSIS DATE_DEATH anydtdte.;
format start1 end1 start2 end2 start3 end3 start4 end4 start5 end5 DATE_DIAGNOSIS DATE_DEATH date9.;
CARDS;
1	1-Feb-04	28-Feb-06	1-Apr-07	31-May-10	1-Nov-10	31-Aug-13	1-Oct-14	30-Sep-15	1-Mar-16	31-Dec-16	4-Jan-06	.	2006	YES
2	1-Jan-02	31-Dec-04	1-Feb-09	30-Apr-10	1-Jul-10	31-Oct-10	1-Nov-12	31-Dec-16	1-Jul-17	30-Nov-17	23-Feb-09	.	2009	YES
3	1-May-05	31-Jul-05	.	.	.	.	.	.	.	.	1-Mar-05	16633	2005	NO
4	1-Sep-05	30-Sep-05	.	.	.	.	.	.	.	.	29-Jun-05	16701	2005	NO
5	1-Apr-05	30-Apr-07	.	.	.	.	.	.	.	.	3-Jan-05	18629	2005	NO
6	1-Feb-05	31-Oct-05	.	.	.	.	.	.	.	.	3-Jan-05	.	2005	NO
7	1-Apr-04	31-Jul-04	1-Jan-05	30-Jun-05	1-Aug-05	31-Dec-06	.	.	.	.	4-Jan-05	16901	2005	YES
8	1-Sep-05	30-Nov-05	.	.	.	.	.	.	.	.	5-Jan-05	16716	2005	NO
9	1-Jul-05	31-Dec-05	.	.	.	.	.	.	.	.	7-Jan-05	17332	2005	NO
10	1-Jun-04	30-Jun-05	.	.	.	.	.	.	.	.	13-Jan-05	16827	2005	YES
;

data want (drop=_:);
  set eligibility;
  array st {*} start: ;
  array en {*} end:   ;

  _np=n(of st{*});                              /* N of start/end pairs */
  _upper_date=min(mdy(12,31,year),date_death);  /* Ins must cover through this date */

  keep2='YES';                                  /* Assume eligiblity */
  if st{1} > mdy(1,1,year) then keep2='NO ';       /*minimum start is too late ...*/
  else if en{_np} < _upper_date then keep2='NO ';  /*max end precedes upper date*/

  /* Check every gap (i.e. end1:start2,  end2:start3) preceding upper date */
  else if _np>1 then do _i=1 to _np-1 while (en{_i}<_upper_date and keep2='YES'); 
    if year(en{_i}+1)=year then keep2='NO';
    else if year(st{_i+1}-1)=year then keep2='NO';
  end;

run;

PROC PRINT DATA=WANT; RUN;

Patrick · Posted 04-05-2019 11:31 PM

@Cruise

When working with date ranges using SQL with data organized in a long format makes things often much simpler.

Here how this could look like.

DATA wide;
  INPUT ID start1 end1 start2 end2 start3 end3 start4 end4 start5 end5 DATE_DIAGNOSIS DATE_DEATH YEAR KEEP_CORRECT $;
  Informat start1 end1 start2 end2 start3 end3 start4 end4 start5 end5 DATE_DIAGNOSIS DATE_DEATH anydtdte.;
  format start1 end1 start2 end2 start3 end3 start4 end4 start5 end5 DATE_DIAGNOSIS DATE_DEATH date9.;
  CARDS;
1 1-Feb-04  28-Feb-06 1-Apr-07  31-May-10 1-Nov-10  31-Aug-13 1-Oct-14  30-Sep-15 1-Mar-16  31-Dec-16 4-Jan-06  . 2006  YES
2 1-Jan-02  31-Dec-04 1-Feb-09  30-Apr-10 1-Jul-10  31-Oct-10 1-Nov-12  31-Dec-16 1-Jul-17  30-Nov-17 23-Feb-09 . 2009  YES
3 1-May-05  31-Jul-05 . . . . . . . . 1-Mar-05  16633 2005  NO
4 1-Sep-05  30-Sep-05 . . . . . . . . 29-Jun-05 16701 2005  NO
5 1-Apr-05  30-Apr-07 . . . . . . . . 3-Jan-05  18629 2005  NO
6 1-Feb-05  31-Oct-05 . . . . . . . . 3-Jan-05  . 2005  NO
7 1-Apr-04  31-Jul-04 1-Jan-05  30-Jun-05 1-Aug-05  31-Dec-06 . . . . 4-Jan-05  16901 2005  YES
8 1-Sep-05  30-Nov-05 . . . . . . . . 5-Jan-05  16716 2005  NO
9 1-Jul-05  31-Dec-05 . . . . . . . . 7-Jan-05  17332 2005  NO
10  1-Jun-04  30-Jun-05 . . . . . . . . 13-Jan-05 16827 2005  YES
;
run;

data long(drop=_: start: end:);
  set wide;
  array _start {*} start:;
  array _end {*} end:;
  attrib insurance_start_dt insurance_end_dt format=date9.;
  do _i=1 to dim(_start);
    if missing(_start[_i]) then continue;
    insurance_start_dt=_start[_i];
    insurance_end_dt=_end[_i];
    output;
  end;
run;

proc sql;
/*  create table want as*/
    select 
      *,
      case
        when 
          (select count(*) from long i where i.id=o.id and i.DATE_DIAGNOSIS between i.insurance_start_dt and i.insurance_end_dt) >0 then 'YES'
          else 'NO'
        end length=3 
        as KEEP_CORRECT2,
      case
        when 
          (select count(*) from long i where i.id=o.id and i.DATE_DIAGNOSIS between o.insurance_start_dt and o.insurance_end_dt) >0 then '1'
          else '0'
        end length=1 
        as match_flag
    from long o
  ;
quit;

N.B: A look-up implemented within a SQL Select clause is from a performance perspective not ideal and you will always have to find the right balance between performance and simple, maintainable code.

View solution in original post

Patrick · Posted 04-05-2019 11:31 PM

@Cruise

When working with date ranges using SQL with data organized in a long format makes things often much simpler.

Here how this could look like.

DATA wide;
  INPUT ID start1 end1 start2 end2 start3 end3 start4 end4 start5 end5 DATE_DIAGNOSIS DATE_DEATH YEAR KEEP_CORRECT $;
  Informat start1 end1 start2 end2 start3 end3 start4 end4 start5 end5 DATE_DIAGNOSIS DATE_DEATH anydtdte.;
  format start1 end1 start2 end2 start3 end3 start4 end4 start5 end5 DATE_DIAGNOSIS DATE_DEATH date9.;
  CARDS;
1 1-Feb-04  28-Feb-06 1-Apr-07  31-May-10 1-Nov-10  31-Aug-13 1-Oct-14  30-Sep-15 1-Mar-16  31-Dec-16 4-Jan-06  . 2006  YES
2 1-Jan-02  31-Dec-04 1-Feb-09  30-Apr-10 1-Jul-10  31-Oct-10 1-Nov-12  31-Dec-16 1-Jul-17  30-Nov-17 23-Feb-09 . 2009  YES
3 1-May-05  31-Jul-05 . . . . . . . . 1-Mar-05  16633 2005  NO
4 1-Sep-05  30-Sep-05 . . . . . . . . 29-Jun-05 16701 2005  NO
5 1-Apr-05  30-Apr-07 . . . . . . . . 3-Jan-05  18629 2005  NO
6 1-Feb-05  31-Oct-05 . . . . . . . . 3-Jan-05  . 2005  NO
7 1-Apr-04  31-Jul-04 1-Jan-05  30-Jun-05 1-Aug-05  31-Dec-06 . . . . 4-Jan-05  16901 2005  YES
8 1-Sep-05  30-Nov-05 . . . . . . . . 5-Jan-05  16716 2005  NO
9 1-Jul-05  31-Dec-05 . . . . . . . . 7-Jan-05  17332 2005  NO
10  1-Jun-04  30-Jun-05 . . . . . . . . 13-Jan-05 16827 2005  YES
;
run;

data long(drop=_: start: end:);
  set wide;
  array _start {*} start:;
  array _end {*} end:;
  attrib insurance_start_dt insurance_end_dt format=date9.;
  do _i=1 to dim(_start);
    if missing(_start[_i]) then continue;
    insurance_start_dt=_start[_i];
    insurance_end_dt=_end[_i];
    output;
  end;
run;

proc sql;
/*  create table want as*/
    select 
      *,
      case
        when 
          (select count(*) from long i where i.id=o.id and i.DATE_DIAGNOSIS between i.insurance_start_dt and i.insurance_end_dt) >0 then 'YES'
          else 'NO'
        end length=3 
        as KEEP_CORRECT2,
      case
        when 
          (select count(*) from long i where i.id=o.id and i.DATE_DIAGNOSIS between o.insurance_start_dt and o.insurance_end_dt) >0 then '1'
          else '0'
        end length=1 
        as match_flag
    from long o
  ;
quit;

N.B: A look-up implemented within a SQL Select clause is from a performance perspective not ideal and you will always have to find the right balance between performance and simple, maintainable code.

Cruise · Posted 04-06-2019 11:40 AM

@Patrick Thanks a lot. What is match_flag here? do we need it?

Patrick · Posted 04-06-2019 03:36 PM

@Cruise wrote:

@Patrick Thanks a lot. What is match_flag here? do we need it?

I've just added match_flag to mark the row with the matching time period and also to demonstrate how a small change in the SQL logic will yield a different result. If you don't need this information then just remove the logic from the SQL.

Kurt_Bremser · Posted 04-06-2019 10:45 AM

Using @Patrick 's example data (your step as posted does not work because of tabs, see the log), tranpose and then it's a very simple data step:

proc transpose
  data=wide
  out=start (
    rename=(col1=start)
    where=(start ne .)
  )
;
by id;
var start:;
run;

data start1;
set start;
period = input(substr(_name_,6),best.);
drop _name_;
run;

proc transpose
  data=wide
  out=end (
    rename=(col1=end)
    where=(end ne .)
  )
;
by id;
var end:;
run;

data end1;
set end;
period = input(substr(_name_,4),best.);
drop _name_;
run;

data trans;
merge
  start1
  end1
;
by id period;
run;

data want;
merge
  wide (drop=start: end:)
  trans
;
by id;
if start <= date_diagnosis <= end;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Cruise · Posted 04-06-2019 10:53 AM

Thanks a lot Kurt. I'm trying to understand that you said the code didn't work because of tabs?

Kurt_Bremser · Posted 04-06-2019 10:59 AM

The data eligibility step contains tabs in the datalines, and that's a delimiter that does not work if it is not set in an infile statement.

Hint from me: disable the use of tabs in the Enhanced Editor, and also set it to convert tabs to blanks on saving and loading.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Cruise · Posted 04-06-2019 05:26 PM

@Kurt_Bremser @Patrick both approaches worked on my analytical data just so perfectly and solutions agreed. Thank you so much for your help.

Check if date points fall in the time intervals for eligibility assessment

Re: Check if date points fall in the time intervals for eligibility assessment

Re: Check if date points fall in the time intervals for eligibility assessment

Re: Check if date points fall in the time intervals for eligibility assessment

Re: Check if date points fall in the time intervals for eligibility assessment

Re: Check if date points fall in the time intervals for eligibility assessment

Re: Check if date points fall in the time intervals for eligibility assessment

Re: Check if date points fall in the time intervals for eligibility assessment

Re: Check if date points fall in the time intervals for eligibility assessment

Registration is open

SAS Training: Just a Click Away