BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Cruise
Ammonite | Level 13

Hi folks,

 

I'm trying to figure out: How many days in the year of diagnosis was covered by the insurance? 

 

I'd like to create coverage and N-days variables as shown in the image below. n_days is not precise accounting for the varying 28,30,31 end points across different months. However, the idea is to count the number of days covered by the insurance based on the eligibility data identified by the startN-endN interval pairs. Let me explain a little bit. 

 

Full coverage for ID1 because diagnosis year (dx_year) is contained in the start1-end1 interval. 

Partial coverage for ID2  because month of November is not covered in 2016. 

Partial coverage for ID5 because start1-end1 covers only Jan thru Aug. Apparently missing coverage in Sep, Oct, Nov until it starts at startN which covers the month of Dec of 2015. In total then the number of days covered is 271. Again, I manually calculated N-days. Please excuse me for the rough values. My actual data set has 50K patients with time intervals each. There might be different scenarios that I haven't thought through yet. 

 

WANTED_DATE_CHECK.png

 

 

DATA COVERAGE; 
INPUT ID start1	end1 start2	end2	start3	end3	start4	end4	start5	end5 dx_year coverage $ N_days;
Informat start1 end1 start2 end2 start3 end3 start4 end4 start5 end5 anydtdte.;
format start1 end1 start2 end2 start3 end3 start4 end4 start5 end5 date9.;
CARDS;
1	1-Jan-02	30-Apr-14	1-Jun-14	31-Jul-15	.	.	.	.	.	.	2010	full	365
2	1-Oct-16	31-Dec-16	.	.	.	.	.	.	.	.	2016	partial	90
3	1-Jan-14	28-Feb-14	1-Apr-14	31-Jul-14	.	.	.	.	.	.	2014	partial	178
4	1-Jan-02	31-Dec-16	.	.	.	.	.	.	.	.	2017	none	
5	1-Feb-04	28-Feb-06	1-Apr-07	31-May-10	1-Nov-10	31-Aug-13	1-Oct-14	30-Sep-15	1-Mar-16	31-Dec-16	2016	partial	301
6	1-Jan-02	31-Aug-15	1-Dec-15	31-Dec-16	.	.	.	.	.	.	2015	partial	271
7	1-Mar-04	28-Feb-05	1-Apr-06	31-May-06	1-Jan-13	30-Jun-16	.	.	.	.	2005	partial	58
8	1-Jan-08	31-Oct-08	1-Dec-08	31-Jan-09	1-May-09	31-Oct-13	.	.	.	.	2013	partial	300
;

The story behind this question. Feel free to skip if not necessary. 

 

Similar but slightly different version of this problem was solved by @mkeintz previously. 

https://communities.sas.com/t5/SAS-Programming/Select-the-cases-which-included-the-full-calendar-yea...

 

Below is how problem was solved to calculate the total number of days with insurance coverage in the year of diagnosis and one year after to capture Jan-Dec transition months. 

 

DATA HAVE; SET HAVE; 
FORMAT DATE_DIAGNOSIS BEST5.;
RUN; 

DATA HAVE1(DROP=DATE_DEATH); SET HAVE; 
y=year(DATE_DIAGNOSIS);
m=month(DATE_DIAGNOSIS);
d=day(DATE_DIAGNOSIS);
y1=y+1;
year_later=mdy(m,d,y1);
drop y y1 m d;
format year_later mmddyy10.;
run;

data HAVE2; set HAVE1;
array START(35);
array END(35);
array elig_days_(35);
do i=1 to 35;
elig_days_(i)=END(i) - START(i);
if END(i)< DATE_DIAGNOSIS  then elig_days_(i)=0;
else if START(i)< DATE_DIAGNOSIS and START(i)< year_later then elig_days_(i)=END(i) - DATE_DIAGNOSIS;
end;
elig_total=sum(of elig_days_1 - elig_days_35);
drop i;
run;

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

How do you get for Id=4 ndays=5.

 

Changing your example data variables is not optimal to having a good response.

Just posted this in the OTHER thread on the same topic

data example;
   set have;
   array s start: ;
   array e end: ;
   do i= 1 to dim(s);
      if year(s[i]) le year(date_diagnosis) le year(e[i]) then do;
         /* some values of the interval are within the year of diagnosis so do this*/
         /* the 1 at the end of the sum is because the subtraction between dates gives the number of day intervales
            and need to add 1 to get the actual number of Days*/
         yeardays = sum (yeardays,min(e[i],mdy(12,31,year(date_diagnosis)) ) - max(s[i],mdy(1,1,year(date_diagnosis)) ),1);
      end;
   end;
run;

 

 

 

View solution in original post

6 REPLIES 6
ballardw
Super User

How do you get for Id=4 ndays=5.

 

Changing your example data variables is not optimal to having a good response.

Just posted this in the OTHER thread on the same topic

data example;
   set have;
   array s start: ;
   array e end: ;
   do i= 1 to dim(s);
      if year(s[i]) le year(date_diagnosis) le year(e[i]) then do;
         /* some values of the interval are within the year of diagnosis so do this*/
         /* the 1 at the end of the sum is because the subtraction between dates gives the number of day intervales
            and need to add 1 to get the actual number of Days*/
         yeardays = sum (yeardays,min(e[i],mdy(12,31,year(date_diagnosis)) ) - max(s[i],mdy(1,1,year(date_diagnosis)) ),1);
      end;
   end;
run;

 

 

 

Cruise
Ammonite | Level 13

@ballardw 

It appears that your algorithm worked well! Thank you so much. Date intervals are in right order and I'll make sure no missing. Other than that, please let me know if there's any alternative scenarios where algorithm may fail.

worked_ballard.png

ballardw
Super User

@Cruise wrote:

@ballardw 

It appears that your algorithm worked well! Thank you so much. Date intervals are in right order and I'll make sure no missing. Other than that, please let me know if there's any alternative scenarios where algorithm may fail.

worked_ballard.png


Depends on what you may mean by "fail".

It totals number of days in the provided intervals. If intervals overlap you will get some days counted twice as each start/end interval is added separately. Multiple entries of the same interval count as overlaps. If there a gaps within a year such as one interval ends in Feb and another later starts in June then no information about that gap is available. So if you need information about contiguous coverage that is another question.

I don't think there is even an issue with missing all intervals, you just get messages about operations on missing values. If you don't want that a check in the start of the loop such as "If missing(s[I]) then leave; before any of the interval checking is done would terminate the loop for a record.

Cruise
Ammonite | Level 13

Thanks for thinking through. there are no overlapping intervals, multiple entries of same intervals and no list-wise missing. The only nuance is my data is not a distinct ID per row. IDs are repeated so are interval datas. However, it seems that doesn't affect the algorithm at all. 

Cruise
Ammonite | Level 13

Hi folks and @ballardw ,

would you be interested in extending your help in additionally defining an insurance coverage whether

-terminated by death preceded with interrupted coverage;

-terminated by death preceded with continuous coverage in the year of diagnosis? Shall I create a new forum on this question?

 

DATA HAVE; 
INPUT ID date_death start1	end1 start2	end2	start3	end3	start4	end4	start5	end5 dx_year coverage $ N_days;
Informat date_death start1 end1 start2 end2 start3 end3 start4 end4 start5 end5 anydtdte.;
format date_death start1 end1 start2 end2 start3 end3 start4 end4 start5 end5 date9.;
CARDS;
1 .	1-Jan-02	30-Apr-14	1-Jun-14	31-Jul-15	.	.	.	.	.	.	2010	full	365
2 .	1-Oct-16	31-Dec-16	.	.	.	.	.	.	.	.	2016	partial	90
3 .	1-Jan-14	28-Feb-14	1-Apr-14	31-Jul-14	.	.	.	.	.	.	2014	partial	178
4 .	1-Jan-02	31-Dec-16	.	.	.	.	.	.	.	.	2017	none .	
5 16-Jan-17 	1-Feb-04	28-Feb-06	1-Apr-07	31-May-10	1-Nov-10	31-Aug-13	1-Oct-14	30-Sep-15	1-Mar-16	31-Dec-16	2016	full	301
6 15-Dec-16 	1-Jan-02	31-Aug-15	1-Dec-15	31-Dec-16	.	.	.	.	.	.	2015	full	271
7 10-Jun-16	    1-Mar-04	28-Feb-05	1-Apr-06	31-May-06	1-Jan-13	30-Jun-16	.	.	.	.	2005	full	58
8 31-Nov-13     1-Jan-08	31-Oct-08	1-Dec-08	31-Jan-09	1-May-09	31-Oct-13	.	.	.	.	2013	full	300
;
ballardw
Super User

@Cruise wrote:

Hi folks and @ballardw ,

would you be interested in extending your help in additionally defining an insurance coverage whether

-terminated by death preceded with interrupted coverage;

-terminated by death preceded with continuous coverage in the year of diagnosis? Shall I create a new forum on this question?

 


It would probably be appropriate to start another thread though including a reference to this one might be useful.

 

You would have to describe the rules in more detail as to what exactly determines the condition and what the output data set should look like.

Definitions of things like "interrupted coverage" and "continuous coverage" should be carefully defined as my interpretation might not match yours. Also "in the year" may need clarification as within the same calendar year or 365/366 days prior or even a fiscal or planning year between specific dates.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1712 views
  • 2 likes
  • 2 in conversation