Hi folks,
My data has
- Date pairs (elig_beg and elig_end) for varying length of time intervals to mark insurance coverage periods. Everybody in this dataset has insurance, but, on and off basis. For example: 01/05/2005(elig_beg1)-01/08/2005(elig_end1) and 01/01/2007(elig_beg2)-01/01/2010(elig_end2). This patient had two coverage intervals. First coverage lasted 3 months and the last coverage lasted 3 years.
- Date_diagnosis : date of diagnosis of a disease
- ID: unique patient identifier
- Eligibility data in separate columns (elig_beg1 thru 50 and elig_end1 thru 50) and unique for each patients.
My goal is to output the date pair that included date_diagnosis as shown in the image below for the first row of "have" demo data.
For instance: 10DEC2004 fell in the date pair: elig_beg/elig_end1, thus, desired output is two variables such as "cov_beg/cov_end". If I identify this date pair (cov_beg and cov-end) that included the date of diagnosis I will be able to calculate the length of the coverage period of insurance that included date_diagnosis in it.
DATA HAVE;
INPUT ID DATE_DIAGNOSIS elig_beg1 elig_end1 elig_beg2 elig_end2;
CARDS;
1 16415 16376 17713 17806 18808
2 16419 16376 17836 . .
3 16428 16376 17836 17867 20819
4 16429 16376 16617 . .
5 16771 16741 20850 . .
6 16792 16741 20819 . .
7 17153 17106 17286 . .
8 17157 17106 17256 . .
9 18235 18202 18352 . .
10 18234 18202 20819 21124 21184
11 18597 18567 18839 19663 20027
12 18624 18567 18992 . .
13 18605 18567 18961 19175 19205
14 18974 18932 19144 . .
15 18963 18932 19631 . .
16 18991 18932 19052 . .
17 19339 19298 21184 . .
18 19353 19298 19389 . .
19 19331 19298 19631 . .
20 19716 19663 20819 . .
21 19345 19298 19601 . .
22 20075 20028 20819 20851 20909
23 20065 20028 20361 . .
;
Hi @Cruise
I think it can be done in the first step with a slight modification of the loop, thus eliminating the intermediate data set and the 3 following steps. It seems that all diagnosis dates fall within the first interval in your example data, so I took the liberty to change a couple of diagnosis dates to test the code.
DATA have;
format id 8. DATE_DIAGNOSIS elig_beg1 elig_end1 elig_beg2 elig_end2 date9.;
informat DATE_DIAGNOSIS elig_beg1 elig_end1 elig_beg2 elig_end2 date9.;
input ID DATE_DIAGNOSIS elig_beg1 elig_end1 elig_beg2 elig_end2;
cards;
1 10DEC2004 01NOV2004 30JUN2008 01OCT2008 30JUN2011
2 14DEC2004 01NOV2004 31OCT2008 . .
3 01DEC2008 01NOV2004 31OCT2008 01DEC2008 31DEC2016
4 24DEC2004 01NOV2004 30JUN2005 . .
5 01DEC2004 01NOV2005 31JAN2017 . .
6 22DEC2005 01NOV2005 31DEC2016 . .
7 18DEC2006 01NOV2006 30APR2007 . .
8 22DEC2006 01NOV2006 31MAR2007 . .
9 04DEC2009 01NOV2009 31MAR2010 . .
10 03DEC2009 01NOV2009 31DEC2016 01NOV2017 31DEC2017
11 01DEC2010 01NOV2010 31JUL2011 01NOV2013 31OCT2014
12 28DEC2010 01NOV2010 31DEC2011 . .
13 09DEC2010 01NOV2010 30NOV2011 01JUL2012 31JUL2012
14 13DEC2011 01NOV2011 31MAY2012 . .
15 02DEC2011 01NOV2011 30SEP2013 . .
16 30DEC2011 01NOV2011 29FEB2012 . .
17 12DEC2012 01NOV2012 31DEC2017 . .
18 26DEC2012 01NOV2012 31JAN2013 . .
19 04DEC2012 01NOV2012 30SEP2013 . .
20 24DEC2013 01NOV2013 31DEC2016 . .
21 18DEC2012 01NOV2012 31AUG2013 . .
22 18DEC2014 01NOV2014 31DEC2016 01FEB2017 31MAR2017
23 08DEC2014 01NOV2014 30SEP2015 . .
;
run;
data want (drop=i); set have;
array beg {*} elig_beg:;
array end {*} elig_end:;
format cov_beg cov_end date9.;
do i = 1 to dim(beg);
if missing(beg{i}) then leave;
if beg{i} <= DATE_DIAGNOSIS <= end{i} then do;
cov_beg = beg{i};
cov_end = end{i};
cov_days = cov_end - cov_beg + 1;
leave;
end;
end;
run;
What happens in leap years?
"Approximate rounding" isn't a sufficient definition to code from. To be more precise you could ignore 29 Feb if it falls within any of your date pairs.
I think, I solved the problem using Patrick's solution provided in the earlier forum. @Patrick
data P.long(drop=_: elig_beg: elig_end:); /*NROW=19,931*/ set MY_ACTUAL_DATA; /*NROW=19,931, N_SUBJ=10,341*/
array _elig_beg {*} elig_beg:;
array _end {*} elig_end:;
attrib insurance_elig_beg_dt insurance_end_dt format=date9.;
do _i=1 to dim(_elig_beg);
if missing(_elig_beg[_i]) then continue;
insurance_elig_beg_dt=_elig_beg[_i];
insurance_end_dt=_end[_i];
output;
end;
run;
DATA P.long1; SET P.long; /*NROW=19,931, N_SUBJ=10,341*/
IF insurance_elig_beg_dt<=DATE_DIAGNOSIS<=insurance_end_dt THEN COV=1; ELSE COV=0;
RUN;
Proc sort data=P.long1;
By DISPLAY_ID COV;
Run;
Data P.long2; Set P.long1;
By DISPLAY_ID;
If last.DISPLAY_ID;
Run;
Hi @Cruise
I think it can be done in the first step with a slight modification of the loop, thus eliminating the intermediate data set and the 3 following steps. It seems that all diagnosis dates fall within the first interval in your example data, so I took the liberty to change a couple of diagnosis dates to test the code.
DATA have;
format id 8. DATE_DIAGNOSIS elig_beg1 elig_end1 elig_beg2 elig_end2 date9.;
informat DATE_DIAGNOSIS elig_beg1 elig_end1 elig_beg2 elig_end2 date9.;
input ID DATE_DIAGNOSIS elig_beg1 elig_end1 elig_beg2 elig_end2;
cards;
1 10DEC2004 01NOV2004 30JUN2008 01OCT2008 30JUN2011
2 14DEC2004 01NOV2004 31OCT2008 . .
3 01DEC2008 01NOV2004 31OCT2008 01DEC2008 31DEC2016
4 24DEC2004 01NOV2004 30JUN2005 . .
5 01DEC2004 01NOV2005 31JAN2017 . .
6 22DEC2005 01NOV2005 31DEC2016 . .
7 18DEC2006 01NOV2006 30APR2007 . .
8 22DEC2006 01NOV2006 31MAR2007 . .
9 04DEC2009 01NOV2009 31MAR2010 . .
10 03DEC2009 01NOV2009 31DEC2016 01NOV2017 31DEC2017
11 01DEC2010 01NOV2010 31JUL2011 01NOV2013 31OCT2014
12 28DEC2010 01NOV2010 31DEC2011 . .
13 09DEC2010 01NOV2010 30NOV2011 01JUL2012 31JUL2012
14 13DEC2011 01NOV2011 31MAY2012 . .
15 02DEC2011 01NOV2011 30SEP2013 . .
16 30DEC2011 01NOV2011 29FEB2012 . .
17 12DEC2012 01NOV2012 31DEC2017 . .
18 26DEC2012 01NOV2012 31JAN2013 . .
19 04DEC2012 01NOV2012 30SEP2013 . .
20 24DEC2013 01NOV2013 31DEC2016 . .
21 18DEC2012 01NOV2012 31AUG2013 . .
22 18DEC2014 01NOV2014 31DEC2016 01FEB2017 31MAR2017
23 08DEC2014 01NOV2014 30SEP2015 . .
;
run;
data want (drop=i); set have;
array beg {*} elig_beg:;
array end {*} elig_end:;
format cov_beg cov_end date9.;
do i = 1 to dim(beg);
if missing(beg{i}) then leave;
if beg{i} <= DATE_DIAGNOSIS <= end{i} then do;
cov_beg = beg{i};
cov_end = end{i};
cov_days = cov_end - cov_beg + 1;
leave;
end;
end;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.