Hi folks,
I'd like to create 12 dummy variables each indicates the total number of months of insurance coverage. I greatly appreciate your time in advance. My eligibility data is organized in pair columns where each intervals are indicated by startn1-endn1, startn2-endn2.and forth. My actual data has extended intervals of startn75-endn75.
month_coverage12 (0/1)
month_coverage11 (0/1)
month_coverage10 (0/1)
month_coverage9 (0/1)
month_coverage8 (0/1)
month_coverage7 (0/1)
month_coverage6 (0/1)
month_coverage5 (0/1)
month_coverage4 (0/1)
month_coverage3 (0/1)
month_coverage2 (0/1)
month_coverage1 (0/1)
3 month coverage (0/1)
2 month coverage (0/1)
1 month coverage (0/1)
DATA P.ELIGIBILITY;
INPUT ID start1 end1 start2 end2 start3 end3 start4 end4 start5 end5 DATE_DIAGNOSIS DATE_DEATH YEAR KEEP $;
Informat start1 end1 start2 end2 start3 end3 start4 end4 start5 end5 DATE_DIAGNOSIS DATE_DEATH anydtdte.;
format start1 end1 start2 end2 start3 end3 start4 end4 start5 end5 DATE_DIAGNOSIS DATE_DEATH date9.;
CARDS;
1 1-Jan-02 30-Apr-14 1-Jun-14 31-Jul-15 . . . . . . 1-Dec-10 25-Jul-15 2010 1
2 1-Oct-05 31-Dec-16 . . . . . . . . 14-Jan-08 . 2008 1
3 1-Dec-11 28-Feb-14 . . . . . . . . 24-Oct-13 22-Feb-14 2013 1
4 1-Jan-02 31-Dec-16 . . . . . . . . 15-Nov-06 . 2006 1
5 1-Feb-04 28-Feb-06 1-Apr-07 31-May-10 1-Nov-10 31-Aug-13 1-Oct-14 30-Sep-15 1-Mar-16 31-Dec-16 4-Jan-06 . 2006 0
6 1-Jan-02 31-Aug-15 1-Dec-15 31-Dec-16 . . . . . . 28-Apr-11 . 2011 1
7 1-Mar-04 28-Feb-05 1-Apr-06 31-Mar-08 1-Jan-13 30-Jun-16 . . . . 18-Aug-14 . 2014 1
8 1-Jan-02 31-Oct-08 1-Dec-08 31-Jan-09 1-May-09 31-Oct-13 . . . . 7-Mar-13 23-Oct-13 2013 1
9 1-Jan-02 31-Dec-16 . . . . . . . . 27-Feb-09 . 2009 1
10 1-Jan-02 31-Dec-11 . . . . . . . . 10-Sep-10 17-Dec-11 2010 1
11 1-Sep-12 30-Jun-15 . . . . . . . . 7-Jun-13 . 2013 1
12 1-Jan-02 30-Sep-08 . . . . . . . . 11-Nov-05 18-Sep-08 2005 1
13 1-Jan-02 31-May-09 . . . . . . . . 7-Jun-07 8-Dec-08 2007 1
14 1-Jan-02 31-May-10 . . . . . . . . 10-Oct-06 10-May-10 2006 1
15 1-Jan-02 31-Dec-16 . . . . . . . . 21-Jan-14 . 2014 1
16 1-Jan-02 31-Dec-04 1-Feb-09 30-Apr-10 1-Jul-10 31-Oct-10 1-Nov-12 31-Dec-16 . . 23-Feb-09 . 2009 0
17 1-May-05 31-Jul-05 . . . . . . . . 1-Mar-05 16-Jul-05 2005 0
18 1-Sep-05 30-Sep-05 . . . . . . . . 29-Jun-05 22-Sep-05 2005 0
19 1-Dec-04 30-Sep-06 . . . . . . . . 2-Jan-05 16-Sep-06 2005 1
20 1-Apr-05 30-Apr-07 . . . . . . . . 3-Jan-05 2-Jan-11 2005 0
21 1-Feb-05 31-Oct-05 . . . . . . . . 3-Jan-05 . 2005 0
22 1-Jan-05 30-Sep-05 . . . . . . . . 3-Jan-05 20-Sep-05 2005 1
23 1-Apr-04 31-Jul-04 1-Jan-05 30-Jun-05 1-Aug-05 31-Dec-06 . . . . 4-Jan-05 10-Apr-06 2005 0
24 1-Sep-05 30-Nov-05 . . . . . . . . 5-Jan-05 7-Oct-05 2005 0
25 1-Jul-05 31-Dec-05 . . . . . . . . 7-Jan-05 15-Jun-07 2005 0
26 1-Apr-02 30-Sep-06 1-Dec-14 29-Feb-16 . . . . . . 7-Jan-05 17-Oct-15 2005 1
27 1-Apr-03 31-Aug-07 . . . . . . . . 10-Jan-05 25-Aug-07 2005 1
28 1-Jan-05 31-Mar-11 . . . . . . . . 10-Jan-05 5-Feb-11 2005 1
29 1-Jan-04 28-Feb-07 1-May-07 28-Feb-13 . . . . . . 10-Jan-05 . 2005 1
30 1-Dec-04 28-Feb-07 . . . . . . . . 11-Jan-05 9-Feb-07 2005 1
31 1-Jan-02 31-May-02 1-Jul-02 28-Feb-03 1-May-03 31-Dec-03 1-Mar-04 30-Nov-04 1-Jan-05 31-Mar-06 12-Jan-05 14-Jan-06 2005 1
32 1-Jan-02 31-Oct-06 1-May-08 31-Jul-10 1-Nov-10 31-Dec-16 . . . . 12-Jan-05 . 2005 1
33 1-Jan-02 31-Oct-08 . . . . . . . . 12-Jan-05 9-Oct-08 2005 1
34 1-Jan-02 31-Dec-16 . . . . . . . . 13-Jan-05 . 2005 1
35 1-Jan-05 30-Jun-12 . . . . . . . . 1-Feb-05 28-Jun-12 2005 1
36 1-Jun-04 30-Jun-05 . . . . . . . . 13-Jan-05 26-Jan-06 2005 0
37 1-Nov-02 31-Jan-07 1-Mar-07 31-Dec-16 . . . . . . 13-Jan-05 . 2005 1
38 1-Jun-02 31-Dec-16 . . . . . . . . 15-Jan-05 . 2005 1
39 1-Jan-02 31-Aug-06 . . . . . . . . 15-Jan-05 24-Jul-06 2005 1
40 1-Jun-05 31-May-06 1-Dec-13 31-Jan-16 . . . . . . 15-Jan-05 . 2005 0
;
See if this comes close to what you need for the number of days.
data example; set have; array s start: ; array e end: ; do i= 1 to dim(s); if year(s[i]) le year(date_diagnosis) le year(e[i]) then do; /* some values of the interval are within the year of diagnosis so do this*/ /* the 1 at the end of the sum is because the subtraction between dates gives the number of day intervales and need to add 1 to get the actual number of Days*/ yeardays = sum (yeardays,min(e[i],mdy(12,31,year(date_diagnosis)) ) - max(s[i],mdy(1,1,year(date_diagnosis)) ),1); end; end; run;
And actually this would not care if the sequence of dates is out of order or not. Some efficiency could be gained if we know that the data is filled left to right and the first missing start date could end the loop.
@Cruise wrote:
Hi folks,
I'd like to create 12 dummy variables each indicates the total number of months of insurance coverage. I greatly appreciate your time in advance. My eligibility data is organized in pair columns where each intervals are indicated by startn1-endn1, startn2-endn2.and forth. My actual data has extended intervals of startn75-endn75.
month_coverage12 (0/1)
month_coverage11 (0/1)
month_coverage10 (0/1)
month_coverage9 (0/1)
month_coverage8 (0/1)
month_coverage7 (0/1)
month_coverage6 (0/1)
month_coverage5 (0/1)
month_coverage4 (0/1)
month_coverage3 (0/1)
month_coverage2 (0/1)
month_coverage1 (0/1)
3 month coverage (0/1)
2 month coverage (0/1)
1 month coverage (0/1)
And what are the rules for assigning any of those indicator variables? Does month_coverage12 (0/1) mean that at least one of the pairs has an interval of at least 12 months? The last defined pair has an interval of 12 months? The shortest interval of any of the pairs has an interval of 12 months? I suspect I could come up with a few more interpretations.
And how will month_coverage3 (0/1) differ from "3 month coverage"? and what actual variable name is going to hold "3 month coverage" as that isn't a normally valid SAS variable name?
Since you say you have 75 pairs of these please provide some hint as to how to use the multiple pairs.
And do the other variables in your example such as Date_diagnosis, Date_death or Year have any involvement in the assignment?
The question must be asked ... do you really need dummy variables at all? Normally, life is so much simpler with you programming them, most SAS procedures can create the dummy variables internally so you don't have to create them yourself.
your question leads me to my current solution for creating a continuous variable which is the total n of days that each subjects had an insurance coverage. I wasn't sure though, whether this solution helps consider transition intervals adequately , ie., 2005 Dec to 2006 Jan. Can anybody review this solution given that I'm only interested in knowing the duration of health insurance without a concern of gaps in between. I mean, Jan covered, Mar covered. But Feb was not covered. That's fine. Just give me 60 days then. That's what I intended by the last column created in this code below.
DATA HAVE;
INPUT ID start1 end1 start2 end2 start3 end3 start4 end4 start5 end5 DATE_DIAGNOSIS DATE_DEATH YEAR KEEP $;
Informat start1 end1 start2 end2 start3 end3 start4 end4 start5 end5 DATE_DIAGNOSIS DATE_DEATH anydtdte.;
format start1 end1 start2 end2 start3 end3 start4 end4 start5 end5 DATE_DIAGNOSIS DATE_DEATH date9.;
CARDS;
1 1-Jan-02 30-Apr-14 1-Jun-14 31-Jul-15 . . . . . . 1-Dec-10 25-Jul-15 2010 1
2 1-Oct-05 31-Dec-16 . . . . . . . . 14-Jan-08 . 2008 1
3 1-Dec-11 28-Feb-14 . . . . . . . . 24-Oct-13 22-Feb-14 2013 1
4 1-Jan-02 31-Dec-16 . . . . . . . . 15-Nov-06 . 2006 1
5 1-Feb-04 28-Feb-06 1-Apr-07 31-May-10 1-Nov-10 31-Aug-13 1-Oct-14 30-Sep-15 1-Mar-16 31-Dec-16 4-Jan-06 . 2006 0
6 1-Jan-02 31-Aug-15 1-Dec-15 31-Dec-16 . . . . . . 28-Apr-11 . 2011 1
7 1-Mar-04 28-Feb-05 1-Apr-06 31-Mar-08 1-Jan-13 30-Jun-16 . . . . 18-Aug-14 . 2014 1
8 1-Jan-02 31-Oct-08 1-Dec-08 31-Jan-09 1-May-09 31-Oct-13 . . . . 7-Mar-13 23-Oct-13 2013 1
9 1-Jan-02 31-Dec-16 . . . . . . . . 27-Feb-09 . 2009 1
10 1-Jan-02 31-Dec-11 . . . . . . . . 10-Sep-10 17-Dec-11 2010 1
11 1-Sep-12 30-Jun-15 . . . . . . . . 7-Jun-13 . 2013 1
12 1-Jan-02 30-Sep-08 . . . . . . . . 11-Nov-05 18-Sep-08 2005 1
13 1-Jan-02 31-May-09 . . . . . . . . 7-Jun-07 8-Dec-08 2007 1
14 1-Jan-02 31-May-10 . . . . . . . . 10-Oct-06 10-May-10 2006 1
15 1-Jan-02 31-Dec-16 . . . . . . . . 21-Jan-14 . 2014 1
16 1-Jan-02 31-Dec-04 1-Feb-09 30-Apr-10 1-Jul-10 31-Oct-10 1-Nov-12 31-Dec-16 . . 23-Feb-09 . 2009 0
17 1-May-05 31-Jul-05 . . . . . . . . 1-Mar-05 16-Jul-05 2005 0
18 1-Sep-05 30-Sep-05 . . . . . . . . 29-Jun-05 22-Sep-05 2005 0
19 1-Dec-04 30-Sep-06 . . . . . . . . 2-Jan-05 16-Sep-06 2005 1
20 1-Apr-05 30-Apr-07 . . . . . . . . 3-Jan-05 2-Jan-11 2005 0
21 1-Feb-05 31-Oct-05 . . . . . . . . 3-Jan-05 . 2005 0
22 1-Jan-05 30-Sep-05 . . . . . . . . 3-Jan-05 20-Sep-05 2005 1
23 1-Apr-04 31-Jul-04 1-Jan-05 30-Jun-05 1-Aug-05 31-Dec-06 . . . . 4-Jan-05 10-Apr-06 2005 0
24 1-Sep-05 30-Nov-05 . . . . . . . . 5-Jan-05 7-Oct-05 2005 0
25 1-Jul-05 31-Dec-05 . . . . . . . . 7-Jan-05 15-Jun-07 2005 0
26 1-Apr-02 30-Sep-06 1-Dec-14 29-Feb-16 . . . . . . 7-Jan-05 17-Oct-15 2005 1
27 1-Apr-03 31-Aug-07 . . . . . . . . 10-Jan-05 25-Aug-07 2005 1
28 1-Jan-05 31-Mar-11 . . . . . . . . 10-Jan-05 5-Feb-11 2005 1
29 1-Jan-04 28-Feb-07 1-May-07 28-Feb-13 . . . . . . 10-Jan-05 . 2005 1
30 1-Dec-04 28-Feb-07 . . . . . . . . 11-Jan-05 9-Feb-07 2005 1
31 1-Jan-02 31-May-02 1-Jul-02 28-Feb-03 1-May-03 31-Dec-03 1-Mar-04 30-Nov-04 1-Jan-05 31-Mar-06 12-Jan-05 14-Jan-06 2005 1
32 1-Jan-02 31-Oct-06 1-May-08 31-Jul-10 1-Nov-10 31-Dec-16 . . . . 12-Jan-05 . 2005 1
33 1-Jan-02 31-Oct-08 . . . . . . . . 12-Jan-05 9-Oct-08 2005 1
34 1-Jan-02 31-Dec-16 . . . . . . . . 13-Jan-05 . 2005 1
35 1-Jan-05 30-Jun-12 . . . . . . . . 1-Feb-05 28-Jun-12 2005 1
36 1-Jun-04 30-Jun-05 . . . . . . . . 13-Jan-05 26-Jan-06 2005 0
37 1-Nov-02 31-Jan-07 1-Mar-07 31-Dec-16 . . . . . . 13-Jan-05 . 2005 1
38 1-Jun-02 31-Dec-16 . . . . . . . . 15-Jan-05 . 2005 1
39 1-Jan-02 31-Aug-06 . . . . . . . . 15-Jan-05 24-Jul-06 2005 1
40 1-Jun-05 31-May-06 1-Dec-13 31-Jan-16 . . . . . . 15-Jan-05 . 2005 0
;
DATA HAVE; SET HAVE;
FORMAT DATE_DIAGNOSIS BEST5.;
RUN;
DATA HAVE1(DROP=DATE_DEATH); SET HAVE;
y=year(DATE_DIAGNOSIS);
m=month(DATE_DIAGNOSIS);
d=day(DATE_DIAGNOSIS);
y1=y+1;
year_later=mdy(m,d,y1);
drop y y1 m d;
format year_later mmddyy10.;
run;
data HAVE2; set HAVE1;
array START(35);
array END(35);
array elig_days_(35);
do i=1 to 35;
elig_days_(i)=END(i) - START(i);
if END(i)< DATE_DIAGNOSIS then elig_days_(i)=0;
else if START(i)< DATE_DIAGNOSIS and START(i)< year_later then elig_days_(i)=END(i) - DATE_DIAGNOSIS;
end;
elig_total=sum(of elig_days_1 - elig_days_35);
drop i;
run;
Jan covered, Mar covered. But Feb was not covered. That's fine. Just give me 60 days then. That's what I intended by the last column created in this code below.
Hard to see where dummy variables would be useful in this case.
I apologize for creating a forum before I understand what I want. Questions asked really pointed me to the right direction. Now, I understand what I really need was: number of days with insurance coverage in the year of diagnosis (DATE_DIAGNOSIS). Shall I create a separate forum on this?
Please let me know your suggestion.
See if this comes close to what you need for the number of days.
data example; set have; array s start: ; array e end: ; do i= 1 to dim(s); if year(s[i]) le year(date_diagnosis) le year(e[i]) then do; /* some values of the interval are within the year of diagnosis so do this*/ /* the 1 at the end of the sum is because the subtraction between dates gives the number of day intervales and need to add 1 to get the actual number of Days*/ yeardays = sum (yeardays,min(e[i],mdy(12,31,year(date_diagnosis)) ) - max(s[i],mdy(1,1,year(date_diagnosis)) ),1); end; end; run;
And actually this would not care if the sequence of dates is out of order or not. Some efficiency could be gained if we know that the data is filled left to right and the first missing start date could end the loop.
Panel histograms?
You mean like a panel for 30 days and a panel for 60 days and a panel for 90 days and so on? I don't see how dummy variables help in this case.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.