Hi folks,
I'm trying to figure out: How many days in the year of diagnosis was covered by the insurance?
I'd like to create coverage and N-days variables as shown in the image below. n_days is not precise accounting for the varying 28,30,31 end points across different months. However, the idea is to count the number of days covered by the insurance based on the eligibility data identified by the startN-endN interval pairs. Let me explain a little bit.
Full coverage for ID1 because diagnosis year (dx_year) is contained in the start1-end1 interval.
Partial coverage for ID2 because month of November is not covered in 2016.
Partial coverage for ID5 because start1-end1 covers only Jan thru Aug. Apparently missing coverage in Sep, Oct, Nov until it starts at startN which covers the month of Dec of 2015. In total then the number of days covered is 271. Again, I manually calculated N-days. Please excuse me for the rough values. My actual data set has 50K patients with time intervals each. There might be different scenarios that I haven't thought through yet.
DATA COVERAGE;
INPUT ID start1 end1 start2 end2 start3 end3 start4 end4 start5 end5 dx_year coverage $ N_days;
Informat start1 end1 start2 end2 start3 end3 start4 end4 start5 end5 anydtdte.;
format start1 end1 start2 end2 start3 end3 start4 end4 start5 end5 date9.;
CARDS;
1 1-Jan-02 30-Apr-14 1-Jun-14 31-Jul-15 . . . . . . 2010 full 365
2 1-Oct-16 31-Dec-16 . . . . . . . . 2016 partial 90
3 1-Jan-14 28-Feb-14 1-Apr-14 31-Jul-14 . . . . . . 2014 partial 178
4 1-Jan-02 31-Dec-16 . . . . . . . . 2017 none
5 1-Feb-04 28-Feb-06 1-Apr-07 31-May-10 1-Nov-10 31-Aug-13 1-Oct-14 30-Sep-15 1-Mar-16 31-Dec-16 2016 partial 301
6 1-Jan-02 31-Aug-15 1-Dec-15 31-Dec-16 . . . . . . 2015 partial 271
7 1-Mar-04 28-Feb-05 1-Apr-06 31-May-06 1-Jan-13 30-Jun-16 . . . . 2005 partial 58
8 1-Jan-08 31-Oct-08 1-Dec-08 31-Jan-09 1-May-09 31-Oct-13 . . . . 2013 partial 300
;
The story behind this question. Feel free to skip if not necessary.
Similar but slightly different version of this problem was solved by @mkeintz previously.
Below is how problem was solved to calculate the total number of days with insurance coverage in the year of diagnosis and one year after to capture Jan-Dec transition months.
DATA HAVE; SET HAVE;
FORMAT DATE_DIAGNOSIS BEST5.;
RUN;
DATA HAVE1(DROP=DATE_DEATH); SET HAVE;
y=year(DATE_DIAGNOSIS);
m=month(DATE_DIAGNOSIS);
d=day(DATE_DIAGNOSIS);
y1=y+1;
year_later=mdy(m,d,y1);
drop y y1 m d;
format year_later mmddyy10.;
run;
data HAVE2; set HAVE1;
array START(35);
array END(35);
array elig_days_(35);
do i=1 to 35;
elig_days_(i)=END(i) - START(i);
if END(i)< DATE_DIAGNOSIS then elig_days_(i)=0;
else if START(i)< DATE_DIAGNOSIS and START(i)< year_later then elig_days_(i)=END(i) - DATE_DIAGNOSIS;
end;
elig_total=sum(of elig_days_1 - elig_days_35);
drop i;
run;
How do you get for Id=4 ndays=5.
Changing your example data variables is not optimal to having a good response.
Just posted this in the OTHER thread on the same topic
data example; set have; array s start: ; array e end: ; do i= 1 to dim(s); if year(s[i]) le year(date_diagnosis) le year(e[i]) then do; /* some values of the interval are within the year of diagnosis so do this*/ /* the 1 at the end of the sum is because the subtraction between dates gives the number of day intervales and need to add 1 to get the actual number of Days*/ yeardays = sum (yeardays,min(e[i],mdy(12,31,year(date_diagnosis)) ) - max(s[i],mdy(1,1,year(date_diagnosis)) ),1); end; end; run;
How do you get for Id=4 ndays=5.
Changing your example data variables is not optimal to having a good response.
Just posted this in the OTHER thread on the same topic
data example; set have; array s start: ; array e end: ; do i= 1 to dim(s); if year(s[i]) le year(date_diagnosis) le year(e[i]) then do; /* some values of the interval are within the year of diagnosis so do this*/ /* the 1 at the end of the sum is because the subtraction between dates gives the number of day intervales and need to add 1 to get the actual number of Days*/ yeardays = sum (yeardays,min(e[i],mdy(12,31,year(date_diagnosis)) ) - max(s[i],mdy(1,1,year(date_diagnosis)) ),1); end; end; run;
It appears that your algorithm worked well! Thank you so much. Date intervals are in right order and I'll make sure no missing. Other than that, please let me know if there's any alternative scenarios where algorithm may fail.
@Cruise wrote:
It appears that your algorithm worked well! Thank you so much. Date intervals are in right order and I'll make sure no missing. Other than that, please let me know if there's any alternative scenarios where algorithm may fail.
Depends on what you may mean by "fail".
It totals number of days in the provided intervals. If intervals overlap you will get some days counted twice as each start/end interval is added separately. Multiple entries of the same interval count as overlaps. If there a gaps within a year such as one interval ends in Feb and another later starts in June then no information about that gap is available. So if you need information about contiguous coverage that is another question.
I don't think there is even an issue with missing all intervals, you just get messages about operations on missing values. If you don't want that a check in the start of the loop such as "If missing(s[I]) then leave; before any of the interval checking is done would terminate the loop for a record.
Thanks for thinking through. there are no overlapping intervals, multiple entries of same intervals and no list-wise missing. The only nuance is my data is not a distinct ID per row. IDs are repeated so are interval datas. However, it seems that doesn't affect the algorithm at all.
Hi folks and @ballardw ,
would you be interested in extending your help in additionally defining an insurance coverage whether
-terminated by death preceded with interrupted coverage;
-terminated by death preceded with continuous coverage in the year of diagnosis? Shall I create a new forum on this question?
DATA HAVE;
INPUT ID date_death start1 end1 start2 end2 start3 end3 start4 end4 start5 end5 dx_year coverage $ N_days;
Informat date_death start1 end1 start2 end2 start3 end3 start4 end4 start5 end5 anydtdte.;
format date_death start1 end1 start2 end2 start3 end3 start4 end4 start5 end5 date9.;
CARDS;
1 . 1-Jan-02 30-Apr-14 1-Jun-14 31-Jul-15 . . . . . . 2010 full 365
2 . 1-Oct-16 31-Dec-16 . . . . . . . . 2016 partial 90
3 . 1-Jan-14 28-Feb-14 1-Apr-14 31-Jul-14 . . . . . . 2014 partial 178
4 . 1-Jan-02 31-Dec-16 . . . . . . . . 2017 none .
5 16-Jan-17 1-Feb-04 28-Feb-06 1-Apr-07 31-May-10 1-Nov-10 31-Aug-13 1-Oct-14 30-Sep-15 1-Mar-16 31-Dec-16 2016 full 301
6 15-Dec-16 1-Jan-02 31-Aug-15 1-Dec-15 31-Dec-16 . . . . . . 2015 full 271
7 10-Jun-16 1-Mar-04 28-Feb-05 1-Apr-06 31-May-06 1-Jan-13 30-Jun-16 . . . . 2005 full 58
8 31-Nov-13 1-Jan-08 31-Oct-08 1-Dec-08 31-Jan-09 1-May-09 31-Oct-13 . . . . 2013 full 300
;
@Cruise wrote:
Hi folks and @ballardw ,
would you be interested in extending your help in additionally defining an insurance coverage whether
-terminated by death preceded with interrupted coverage;
-terminated by death preceded with continuous coverage in the year of diagnosis? Shall I create a new forum on this question?
It would probably be appropriate to start another thread though including a reference to this one might be useful.
You would have to describe the rules in more detail as to what exactly determines the condition and what the output data set should look like.
Definitions of things like "interrupted coverage" and "continuous coverage" should be carefully defined as my interpretation might not match yours. Also "in the year" may need clarification as within the same calendar year or 365/366 days prior or even a fiscal or planning year between specific dates.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.