BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Cruise
Ammonite | Level 13

Hi guys,

 

I'd like to mark the patients who had medical insurance when he/she was diagnosed (date_diagnosis) with the disease under the study (KEEP2=YES). All patients have insurance on and off basis, however, some patients had no insurance coverage when the disease was diagnosed (KEEP2=NO). My data has date pairs for the intervals of insurance coverage., ie., start1-end1 and patient lost the insurance and got it back during start2-end2 ...so forth so on.  Desired output is the KEEP_CORRECT.

 

This problem was resolved by @mkeintz and worked on the data I had back then. However, the algorithm makes wrong choice as I

have shown in the image attached. The red stars point out where date_diagnosis actually fell in the intervals of insurance coverage

that I would mark KEEP2=YES but algorithm creates KEEP2=NO.

CORRECT ELIG ALGORITHM.png

 

Could you please help fix the algorithm? or suggest alternative solutions, if any?

 

https://communities.sas.com/t5/SAS-Programming/Select-the-cases-which-included-the-full-calendar-yea...

 

Another wrinkle is the repeated data. The algorithm works well with unique rows. However, my actual data has duplicates. I think I should create another forum once algorithm works for the data with distinct rows.

DATA ELIGIBILITY; 
INPUT ID start1 end1 start2 end2 start3 end3 start4 end4 start5 end5 DATE_DIAGNOSIS DATE_DEATH YEAR KEEP_CORRECT $;
Informat start1 end1 start2 end2 start3 end3 start4 end4 start5 end5 DATE_DIAGNOSIS DATE_DEATH anydtdte.;
format start1 end1 start2 end2 start3 end3 start4 end4 start5 end5 DATE_DIAGNOSIS DATE_DEATH date9.;
CARDS;
1	1-Feb-04	28-Feb-06	1-Apr-07	31-May-10	1-Nov-10	31-Aug-13	1-Oct-14	30-Sep-15	1-Mar-16	31-Dec-16	4-Jan-06	.	2006	YES
2	1-Jan-02	31-Dec-04	1-Feb-09	30-Apr-10	1-Jul-10	31-Oct-10	1-Nov-12	31-Dec-16	1-Jul-17	30-Nov-17	23-Feb-09	.	2009	YES
3	1-May-05	31-Jul-05	.	.	.	.	.	.	.	.	1-Mar-05	16633	2005	NO
4	1-Sep-05	30-Sep-05	.	.	.	.	.	.	.	.	29-Jun-05	16701	2005	NO
5	1-Apr-05	30-Apr-07	.	.	.	.	.	.	.	.	3-Jan-05	18629	2005	NO
6	1-Feb-05	31-Oct-05	.	.	.	.	.	.	.	.	3-Jan-05	.	2005	NO
7	1-Apr-04	31-Jul-04	1-Jan-05	30-Jun-05	1-Aug-05	31-Dec-06	.	.	.	.	4-Jan-05	16901	2005	YES
8	1-Sep-05	30-Nov-05	.	.	.	.	.	.	.	.	5-Jan-05	16716	2005	NO
9	1-Jul-05	31-Dec-05	.	.	.	.	.	.	.	.	7-Jan-05	17332	2005	NO
10	1-Jun-04	30-Jun-05	.	.	.	.	.	.	.	.	13-Jan-05	16827	2005	YES
;

data want (drop=_:);
  set eligibility;
  array st {*} start: ;
  array en {*} end:   ;

  _np=n(of st{*});                              /* N of start/end pairs */
  _upper_date=min(mdy(12,31,year),date_death);  /* Ins must cover through this date */

  keep2='YES';                                  /* Assume eligiblity */
  if st{1} > mdy(1,1,year) then keep2='NO ';       /*minimum start is too late ...*/
  else if en{_np} < _upper_date then keep2='NO ';  /*max end precedes upper date*/

  /* Check every gap (i.e. end1:start2,  end2:start3) preceding upper date */
  else if _np>1 then do _i=1 to _np-1 while (en{_i}<_upper_date and keep2='YES'); 
    if year(en{_i}+1)=year then keep2='NO';
    else if year(st{_i+1}-1)=year then keep2='NO';
  end;

run;

PROC PRINT DATA=WANT; RUN; 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

@Cruise 

When working with date ranges using SQL with data organized in a long format makes things often much simpler. 

Here how this could look like.

DATA wide;
  INPUT ID start1 end1 start2 end2 start3 end3 start4 end4 start5 end5 DATE_DIAGNOSIS DATE_DEATH YEAR KEEP_CORRECT $;
  Informat start1 end1 start2 end2 start3 end3 start4 end4 start5 end5 DATE_DIAGNOSIS DATE_DEATH anydtdte.;
  format start1 end1 start2 end2 start3 end3 start4 end4 start5 end5 DATE_DIAGNOSIS DATE_DEATH date9.;
  CARDS;
1 1-Feb-04  28-Feb-06 1-Apr-07  31-May-10 1-Nov-10  31-Aug-13 1-Oct-14  30-Sep-15 1-Mar-16  31-Dec-16 4-Jan-06  . 2006  YES
2 1-Jan-02  31-Dec-04 1-Feb-09  30-Apr-10 1-Jul-10  31-Oct-10 1-Nov-12  31-Dec-16 1-Jul-17  30-Nov-17 23-Feb-09 . 2009  YES
3 1-May-05  31-Jul-05 . . . . . . . . 1-Mar-05  16633 2005  NO
4 1-Sep-05  30-Sep-05 . . . . . . . . 29-Jun-05 16701 2005  NO
5 1-Apr-05  30-Apr-07 . . . . . . . . 3-Jan-05  18629 2005  NO
6 1-Feb-05  31-Oct-05 . . . . . . . . 3-Jan-05  . 2005  NO
7 1-Apr-04  31-Jul-04 1-Jan-05  30-Jun-05 1-Aug-05  31-Dec-06 . . . . 4-Jan-05  16901 2005  YES
8 1-Sep-05  30-Nov-05 . . . . . . . . 5-Jan-05  16716 2005  NO
9 1-Jul-05  31-Dec-05 . . . . . . . . 7-Jan-05  17332 2005  NO
10  1-Jun-04  30-Jun-05 . . . . . . . . 13-Jan-05 16827 2005  YES
;
run;

data long(drop=_: start: end:);
  set wide;
  array _start {*} start:;
  array _end {*} end:;
  attrib insurance_start_dt insurance_end_dt format=date9.;
  do _i=1 to dim(_start);
    if missing(_start[_i]) then continue;
    insurance_start_dt=_start[_i];
    insurance_end_dt=_end[_i];
    output;
  end;
run;

proc sql;
/*  create table want as*/
    select 
      *,
      case
        when 
          (select count(*) from long i where i.id=o.id and i.DATE_DIAGNOSIS between i.insurance_start_dt and i.insurance_end_dt) >0 then 'YES'
          else 'NO'
        end length=3 
        as KEEP_CORRECT2,
      case
        when 
          (select count(*) from long i where i.id=o.id and i.DATE_DIAGNOSIS between o.insurance_start_dt and o.insurance_end_dt) >0 then '1'
          else '0'
        end length=1 
        as match_flag
    from long o
  ;
quit;

N.B: A look-up implemented within a SQL Select clause is from a performance perspective not ideal and you will always have to find the right balance between performance and simple, maintainable code.

View solution in original post

7 REPLIES 7
Patrick
Opal | Level 21

@Cruise 

When working with date ranges using SQL with data organized in a long format makes things often much simpler. 

Here how this could look like.

DATA wide;
  INPUT ID start1 end1 start2 end2 start3 end3 start4 end4 start5 end5 DATE_DIAGNOSIS DATE_DEATH YEAR KEEP_CORRECT $;
  Informat start1 end1 start2 end2 start3 end3 start4 end4 start5 end5 DATE_DIAGNOSIS DATE_DEATH anydtdte.;
  format start1 end1 start2 end2 start3 end3 start4 end4 start5 end5 DATE_DIAGNOSIS DATE_DEATH date9.;
  CARDS;
1 1-Feb-04  28-Feb-06 1-Apr-07  31-May-10 1-Nov-10  31-Aug-13 1-Oct-14  30-Sep-15 1-Mar-16  31-Dec-16 4-Jan-06  . 2006  YES
2 1-Jan-02  31-Dec-04 1-Feb-09  30-Apr-10 1-Jul-10  31-Oct-10 1-Nov-12  31-Dec-16 1-Jul-17  30-Nov-17 23-Feb-09 . 2009  YES
3 1-May-05  31-Jul-05 . . . . . . . . 1-Mar-05  16633 2005  NO
4 1-Sep-05  30-Sep-05 . . . . . . . . 29-Jun-05 16701 2005  NO
5 1-Apr-05  30-Apr-07 . . . . . . . . 3-Jan-05  18629 2005  NO
6 1-Feb-05  31-Oct-05 . . . . . . . . 3-Jan-05  . 2005  NO
7 1-Apr-04  31-Jul-04 1-Jan-05  30-Jun-05 1-Aug-05  31-Dec-06 . . . . 4-Jan-05  16901 2005  YES
8 1-Sep-05  30-Nov-05 . . . . . . . . 5-Jan-05  16716 2005  NO
9 1-Jul-05  31-Dec-05 . . . . . . . . 7-Jan-05  17332 2005  NO
10  1-Jun-04  30-Jun-05 . . . . . . . . 13-Jan-05 16827 2005  YES
;
run;

data long(drop=_: start: end:);
  set wide;
  array _start {*} start:;
  array _end {*} end:;
  attrib insurance_start_dt insurance_end_dt format=date9.;
  do _i=1 to dim(_start);
    if missing(_start[_i]) then continue;
    insurance_start_dt=_start[_i];
    insurance_end_dt=_end[_i];
    output;
  end;
run;

proc sql;
/*  create table want as*/
    select 
      *,
      case
        when 
          (select count(*) from long i where i.id=o.id and i.DATE_DIAGNOSIS between i.insurance_start_dt and i.insurance_end_dt) >0 then 'YES'
          else 'NO'
        end length=3 
        as KEEP_CORRECT2,
      case
        when 
          (select count(*) from long i where i.id=o.id and i.DATE_DIAGNOSIS between o.insurance_start_dt and o.insurance_end_dt) >0 then '1'
          else '0'
        end length=1 
        as match_flag
    from long o
  ;
quit;

N.B: A look-up implemented within a SQL Select clause is from a performance perspective not ideal and you will always have to find the right balance between performance and simple, maintainable code.

Cruise
Ammonite | Level 13

@Patrick Thanks a lot. What is match_flag here? do we need it?

Patrick
Opal | Level 21

@Cruise wrote:

@Patrick Thanks a lot. What is match_flag here? do we need it?


I've just added match_flag to mark the row with the matching time period and also to demonstrate how a small change in the SQL logic will yield a different result. If you don't need this information then just remove the logic from the SQL.

Kurt_Bremser
Super User

Using @Patrick 's example data (your step as posted does not work because of tabs, see the log), tranpose and then it's a very simple data step:

proc transpose
  data=wide
  out=start (
    rename=(col1=start)
    where=(start ne .)
  )
;
by id;
var start:;
run;

data start1;
set start;
period = input(substr(_name_,6),best.);
drop _name_;
run;

proc transpose
  data=wide
  out=end (
    rename=(col1=end)
    where=(end ne .)
  )
;
by id;
var end:;
run;

data end1;
set end;
period = input(substr(_name_,4),best.);
drop _name_;
run;

data trans;
merge
  start1
  end1
;
by id period;
run;

data want;
merge
  wide (drop=start: end:)
  trans
;
by id;
if start <= date_diagnosis <= end;
run;
Cruise
Ammonite | Level 13
Thanks a lot Kurt. I'm trying to understand that you said the code didn't work because of tabs?
Kurt_Bremser
Super User

The data eligibility step contains tabs in the datalines, and that's a delimiter that does not work if it is not set in an infile statement.

 

Hint from me: disable the use of tabs in the Enhanced Editor, and also set it to convert tabs to blanks on saving and loading.

Cruise
Ammonite | Level 13

@Kurt_Bremser @Patrick both approaches worked on my analytical data just so perfectly and solutions agreed. Thank you so much for your help.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 853 views
  • 3 likes
  • 3 in conversation