Hi All,
I am trying to count the number of enrollment months based on enrollment days and deceased dates. The example data is as attached, and the last column Enrolled_Months is the target field to construct. Basically, I need to count in any month with at least one enrollment day, but exclude the month a person dies and any subsequent months. Generally the enrollment days are consistent with decease date, however, there could be some data entry error due to time lag, such as Person20.
Any suggestion would be greatly appreciate!
Thank you!
Lichee
Wouldn't person 20 have 5 enrolled months because their deceased date is June 7, 2018?
Let's talk about Maxim 19. Long data sets are easier to work with and easier to program than wide data sets, where month information are included in the variable name. So first I create a long data set via PROC TRANSPOSE. Then counting is done via simple IF statements and you can then use PROC SUMMARY to add up the number of months enrolled. (Lesson: don't create wide data sets with calendar information in the variable name. Assuming you create the data as a long data set, there is no need for PROC TRANSPOSE and the month information is a variable in your data set, and this is much easier to work with.)
proc transpose data=have(drop=enrolled_months) out=have1 prefix=enrolledDays;
by dummyid notsorted decease_date;
var enrolled:;
run;
data have2;
set have1;
month_enrolled=input(scan(_name_,2,'_'),yymmn6.);
if missing(decease_date) and enrolleddays1>0 then enrolled=1;
else if not missing(decease_date) and month(decease_date)>month(month_enrolled) and enrolleddays1>0 then enrolled=1;
else enrolled=0;
format month_enrolled yymms7.;
drop _name_;
run;
proc summary data=have2 nway;
class dummyid;
var enrolled;
output out=want sum=enrolled_months;
run;
Also, the above would need to be modified if the data collected crossed year boundaries, but you didn't give us data that exists in more than one years, so the above does work.
Many of us will not download Excel files or other Microsoft Office files as they are a security risk. Please provide (a portion of) your data as working SAS data step code, which you can type in yourself, or follow these instructions, and include this code in your reply.
Thanks for pointing that out! Here is the code for the example data. Again, the last column Enrolled_Months is the target to construct. Thank you!
data have;
infile datalines truncover dsd;
input DummyID $ Decease_Date :mmddyy. EnrolledDays_201801 EnrolledDays_201802 EnrolledDays_201803 EnrolledDays_201804 EnrolledDays_201805 EnrolledDays_201806 EnrolledDays_201807
EnrolledDays_201808 EnrolledDays_201809 EnrolledDays_201810 EnrolledDays_201811 EnrolledDays_201812 Enrolled_Months
;
format Decease_Date date9.;
datalines;
Person1,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person2,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person3,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person4,2/25/2018,31,25,0,0,0,0,0,0,0,0,0,0,1
Person5,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person6,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person7,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person8,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person9,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person10,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person11,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person12,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person13,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person14,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person15,10/5/2018,0,0,14,30,31,30,31,31,30,5,0,0,7
Person16,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person17,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person18,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person19,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person20,6/7/2018,31,28,31,30,31,30,7,0,0,0,0,0,6
;
run;
Is your last record correct, or should it be 5?
Person20,6/7/2018,31,28,31,30,31,30,7,0,0,0,0,0,6
The person dies in June so according to the criteria originally stated:
Basically, I need to count in any month with at least one enrollment day, but exclude the month a person dies and any subsequent months. Generally the enrollment days are consistent with decease date, however, there could be some data entry error due to time lag, such as Person20.
If 5 is correct for the last record this should work:
data want;
set have;
array _enr(12) EnrolledDays_2018:;
if not missing(decease_date) then death_month = month(decease_date);
enrolled_months_calc =0;
do i=1 to min(death_month-1, 12);
if _enr(i) ne 0 then enrolled_months_calc+1;
end;
check= enrolled_months - enrolled_months_calc;
run;
@lichee wrote:
Thanks for pointing that out! Here is the code for the example data. Again, the last column Enrolled_Months is the target to construct. Thank you!
data have;
infile datalines truncover dsd;
input DummyID $ Decease_Date :mmddyy. EnrolledDays_201801 EnrolledDays_201802 EnrolledDays_201803 EnrolledDays_201804 EnrolledDays_201805 EnrolledDays_201806 EnrolledDays_201807
EnrolledDays_201808 EnrolledDays_201809 EnrolledDays_201810 EnrolledDays_201811 EnrolledDays_201812 Enrolled_Months
;
format Decease_Date date9.;
datalines;
Person1,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person2,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person3,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person4,2/25/2018,31,25,0,0,0,0,0,0,0,0,0,0,1
Person5,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person6,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person7,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person8,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person9,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person10,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person11,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person12,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person13,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person14,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person15,10/5/2018,0,0,14,30,31,30,31,31,30,5,0,0,7
Person16,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person17,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person18,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person19,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person20,6/7/2018,31,28,31,30,31,30,7,0,0,0,0,0,6
;
run;
Wouldn't person 20 have 5 enrolled months because their deceased date is June 7, 2018?
Let's talk about Maxim 19. Long data sets are easier to work with and easier to program than wide data sets, where month information are included in the variable name. So first I create a long data set via PROC TRANSPOSE. Then counting is done via simple IF statements and you can then use PROC SUMMARY to add up the number of months enrolled. (Lesson: don't create wide data sets with calendar information in the variable name. Assuming you create the data as a long data set, there is no need for PROC TRANSPOSE and the month information is a variable in your data set, and this is much easier to work with.)
proc transpose data=have(drop=enrolled_months) out=have1 prefix=enrolledDays;
by dummyid notsorted decease_date;
var enrolled:;
run;
data have2;
set have1;
month_enrolled=input(scan(_name_,2,'_'),yymmn6.);
if missing(decease_date) and enrolleddays1>0 then enrolled=1;
else if not missing(decease_date) and month(decease_date)>month(month_enrolled) and enrolleddays1>0 then enrolled=1;
else enrolled=0;
format month_enrolled yymms7.;
drop _name_;
run;
proc summary data=have2 nway;
class dummyid;
var enrolled;
output out=want sum=enrolled_months;
run;
Also, the above would need to be modified if the data collected crossed year boundaries, but you didn't give us data that exists in more than one years, so the above does work.
Actually, I do want to explore and learn the code for the data crossing year boundaries. So if I have the data below, how would the code look like? Thanks again!
data have;
infile datalines truncover dsd;
input DummyID $ Decease_Date :mmddyy. EnrolledDays_201707 EnrolledDays_201708 EnrolledDays_201709 EnrolledDays_201710
EnrolledDays_201711 EnrolledDays_201712 EnrolledDays_201801 EnrolledDays_201802 EnrolledDays_201803 EnrolledDays_201804
EnrolledDays_201805 EnrolledDays_201806
Enrolled_Months
;
format Decease_Date date9.;
datalines;
Person1,,31,31,30,31,30,31,31,28,31,30,31,30,12
Person2,,31,31,30,31,30,31,31,28,31,30,31,30,12
Person3,,31,31,30,31,30,31,31,28,31,30,31,30,12
Person4,2/25/2018,0,0,0,0,0,0,31,25,0,0,0,0,1
Person5,,31,31,30,31,30,31,31,28,31,30,31,30,12
Person6,,31,31,30,31,30,31,31,28,31,30,31,30,12
Person7,,31,31,30,31,30,31,31,28,31,30,31,30,12
Person8,,31,31,30,31,30,31,31,28,31,30,31,30,12
Person9,,31,31,30,31,30,31,31,28,31,30,31,30,12
Person10,,31,31,30,31,30,31,31,28,31,30,31,30,12
Person11,,31,31,30,31,30,31,31,28,31,30,31,30,12
Person12,,31,31,30,31,30,31,31,28,31,30,31,30,12
Person13,,31,31,30,31,30,31,31,28,31,30,31,30,12
Person14,,31,31,30,31,30,31,31,28,31,30,31,30,12
Person15,10/5/2018,31,31,30,31,30,31,31,28,31,30,31,30,12
Person16,,31,31,30,31,30,31,31,28,31,30,31,30,12
Person17,,31,31,30,31,30,31,31,28,31,30,31,30,12
Person18,,31,31,30,31,30,31,31,28,31,30,31,30,12
Person19,,31,31,30,31,30,31,31,28,31,30,31,30,12
Person20,6/7/2018,0,0,0,0,0,0,31,28,31,30,31,30,5
;
run;
I believe the only change would be to this line
else if not missing(decease_date) and month(decease_date)>month(month_enrolled) and enrolleddays1>0 then enrolled=1;
I think this following code works, but I will let you do the testing:
else if not missing(decease_date) and intnx('month',decease_date,0,'b')>month_enrolled and enrolleddays1>0 then enrolled=1;
This should work because we are using actual SAS date values and not character strings or variable names to represent the month_enrolled. Since you're really just looking at months, we use the first day of each month to do the comparisons of decease_date to month_enrolled.
As stated above, I think you would be wise to not create wide data sets with calendar information in the variable name, this is not a good format for your data set. Wide data sets like the one you have means that every month, you have to modify your code to accommodate the new month. Having a long data set, with a variable named month_enrolled, is a better approach in the long run, no modification to the code is needed as the months change.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.