BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
lichee
Quartz | Level 8

Hi All,

I am trying to count the number of enrollment months based on enrollment days and deceased dates. The example data is as attached, and the last column Enrolled_Months is the target field to construct. Basically, I need to count in any month with at least one enrollment day, but exclude the month a person dies and any subsequent months. Generally the enrollment days are consistent with decease date, however, there could be some data entry error due to time lag, such as Person20.

Any suggestion would be greatly appreciate!

 

Thank you!

Lichee

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

Wouldn't person 20 have 5 enrolled months because their deceased date is June 7, 2018?

 

Let's talk about Maxim 19. Long data sets are easier to work with and easier to program than wide data sets, where month information are included in the variable name. So first I create a long data set via PROC TRANSPOSE. Then counting is done via simple IF statements and you can then use PROC SUMMARY to add up the number of months enrolled. (Lesson: don't create wide data sets with calendar information in the variable name. Assuming you create the data as a long data set, there is no need for PROC TRANSPOSE and the month information is a variable in your data set, and this is much easier to work with.)

 

proc transpose data=have(drop=enrolled_months) out=have1 prefix=enrolledDays;
    by dummyid notsorted decease_date;
    var enrolled:;
run;
data have2;
    set have1;
    month_enrolled=input(scan(_name_,2,'_'),yymmn6.);
    if missing(decease_date) and enrolleddays1>0 then enrolled=1;
    else if not missing(decease_date) and month(decease_date)>month(month_enrolled) and enrolleddays1>0 then enrolled=1;
    else enrolled=0;
    format month_enrolled yymms7.;
    drop _name_;
run;
proc summary data=have2 nway;
    class dummyid;
    var enrolled;
    output out=want sum=enrolled_months;
run;

 

Also, the above would need to be modified if the data collected crossed year boundaries, but you didn't give us data that exists in more than one years, so the above does work.

--
Paige Miller

View solution in original post

7 REPLIES 7
PaigeMiller
Diamond | Level 26

Many of us will not download Excel files or other Microsoft Office files as they are a security risk. Please provide (a portion of) your data as working SAS data step code, which you can type in yourself, or follow these instructions, and include this code in your reply.

--
Paige Miller
lichee
Quartz | Level 8

Thanks for pointing that out! Here is the code for the example data. Again, the last column Enrolled_Months is the target to construct. Thank you!

data have;
infile datalines truncover dsd;
input DummyID $ Decease_Date :mmddyy. EnrolledDays_201801 EnrolledDays_201802 EnrolledDays_201803 EnrolledDays_201804 EnrolledDays_201805 EnrolledDays_201806 EnrolledDays_201807
EnrolledDays_201808 EnrolledDays_201809 EnrolledDays_201810 EnrolledDays_201811 EnrolledDays_201812 Enrolled_Months
;
format Decease_Date date9.;
datalines;
Person1,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person2,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person3,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person4,2/25/2018,31,25,0,0,0,0,0,0,0,0,0,0,1
Person5,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person6,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person7,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person8,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person9,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person10,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person11,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person12,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person13,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person14,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person15,10/5/2018,0,0,14,30,31,30,31,31,30,5,0,0,7
Person16,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person17,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person18,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person19,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person20,6/7/2018,31,28,31,30,31,30,7,0,0,0,0,0,6
;
run;

Reeza
Super User

Is your last record correct, or should it be 5?

 

 

Person20,6/7/2018,31,28,31,30,31,30,7,0,0,0,0,0,6

The person dies in June so according to the criteria originally stated:

 

Basically, I need to count in any month with at least one enrollment day, but exclude the month a person dies and any subsequent months. Generally the enrollment days are consistent with decease date, however, there could be some data entry error due to time lag, such as Person20.

 

If 5 is correct for the last record this should work:

data want;
set have;
array _enr(12) EnrolledDays_2018:;
if not missing(decease_date) then death_month = month(decease_date);

enrolled_months_calc =0;

do i=1 to min(death_month-1, 12);
if _enr(i) ne 0 then enrolled_months_calc+1;
end;

check= enrolled_months - enrolled_months_calc;
run;

@lichee wrote:

Thanks for pointing that out! Here is the code for the example data. Again, the last column Enrolled_Months is the target to construct. Thank you!

data have;
infile datalines truncover dsd;
input DummyID $ Decease_Date :mmddyy. EnrolledDays_201801 EnrolledDays_201802 EnrolledDays_201803 EnrolledDays_201804 EnrolledDays_201805 EnrolledDays_201806 EnrolledDays_201807
EnrolledDays_201808 EnrolledDays_201809 EnrolledDays_201810 EnrolledDays_201811 EnrolledDays_201812 Enrolled_Months
;
format Decease_Date date9.;
datalines;
Person1,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person2,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person3,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person4,2/25/2018,31,25,0,0,0,0,0,0,0,0,0,0,1
Person5,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person6,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person7,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person8,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person9,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person10,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person11,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person12,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person13,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person14,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person15,10/5/2018,0,0,14,30,31,30,31,31,30,5,0,0,7
Person16,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person17,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person18,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person19,,31,28,31,30,31,30,31,31,30,31,30,31,12
Person20,6/7/2018,31,28,31,30,31,30,7,0,0,0,0,0,6
;
run;


 

lichee
Quartz | Level 8
You are right. The last one should have Enrolled_Months=5. Thanks a lot!
PaigeMiller
Diamond | Level 26

Wouldn't person 20 have 5 enrolled months because their deceased date is June 7, 2018?

 

Let's talk about Maxim 19. Long data sets are easier to work with and easier to program than wide data sets, where month information are included in the variable name. So first I create a long data set via PROC TRANSPOSE. Then counting is done via simple IF statements and you can then use PROC SUMMARY to add up the number of months enrolled. (Lesson: don't create wide data sets with calendar information in the variable name. Assuming you create the data as a long data set, there is no need for PROC TRANSPOSE and the month information is a variable in your data set, and this is much easier to work with.)

 

proc transpose data=have(drop=enrolled_months) out=have1 prefix=enrolledDays;
    by dummyid notsorted decease_date;
    var enrolled:;
run;
data have2;
    set have1;
    month_enrolled=input(scan(_name_,2,'_'),yymmn6.);
    if missing(decease_date) and enrolleddays1>0 then enrolled=1;
    else if not missing(decease_date) and month(decease_date)>month(month_enrolled) and enrolleddays1>0 then enrolled=1;
    else enrolled=0;
    format month_enrolled yymms7.;
    drop _name_;
run;
proc summary data=have2 nway;
    class dummyid;
    var enrolled;
    output out=want sum=enrolled_months;
run;

 

Also, the above would need to be modified if the data collected crossed year boundaries, but you didn't give us data that exists in more than one years, so the above does work.

--
Paige Miller
lichee
Quartz | Level 8

Actually, I do want to explore and learn the code for the data crossing year boundaries. So if I have the data below, how would the code look like? Thanks again!

 

data have;
infile datalines truncover dsd;
input DummyID $ Decease_Date :mmddyy. EnrolledDays_201707 EnrolledDays_201708 EnrolledDays_201709 EnrolledDays_201710
EnrolledDays_201711 EnrolledDays_201712 EnrolledDays_201801 EnrolledDays_201802 EnrolledDays_201803 EnrolledDays_201804
EnrolledDays_201805 EnrolledDays_201806
Enrolled_Months
;
format Decease_Date date9.;
datalines;
Person1,,31,31,30,31,30,31,31,28,31,30,31,30,12
Person2,,31,31,30,31,30,31,31,28,31,30,31,30,12
Person3,,31,31,30,31,30,31,31,28,31,30,31,30,12
Person4,2/25/2018,0,0,0,0,0,0,31,25,0,0,0,0,1
Person5,,31,31,30,31,30,31,31,28,31,30,31,30,12
Person6,,31,31,30,31,30,31,31,28,31,30,31,30,12
Person7,,31,31,30,31,30,31,31,28,31,30,31,30,12
Person8,,31,31,30,31,30,31,31,28,31,30,31,30,12
Person9,,31,31,30,31,30,31,31,28,31,30,31,30,12
Person10,,31,31,30,31,30,31,31,28,31,30,31,30,12
Person11,,31,31,30,31,30,31,31,28,31,30,31,30,12
Person12,,31,31,30,31,30,31,31,28,31,30,31,30,12
Person13,,31,31,30,31,30,31,31,28,31,30,31,30,12
Person14,,31,31,30,31,30,31,31,28,31,30,31,30,12
Person15,10/5/2018,31,31,30,31,30,31,31,28,31,30,31,30,12
Person16,,31,31,30,31,30,31,31,28,31,30,31,30,12
Person17,,31,31,30,31,30,31,31,28,31,30,31,30,12
Person18,,31,31,30,31,30,31,31,28,31,30,31,30,12
Person19,,31,31,30,31,30,31,31,28,31,30,31,30,12
Person20,6/7/2018,0,0,0,0,0,0,31,28,31,30,31,30,5
;
run;

PaigeMiller
Diamond | Level 26

I believe the only change would be to this line

 

else if not missing(decease_date) and month(decease_date)>month(month_enrolled) and enrolleddays1>0 then enrolled=1;

I think this following code works, but I will let you do the testing:

 

else if not missing(decease_date) and intnx('month',decease_date,0,'b')>month_enrolled and enrolleddays1>0 then enrolled=1;

 

 

This should work because we are using actual SAS date values and not character strings or variable names to represent the month_enrolled. Since you're really just looking at months, we use the first day of each month to do the comparisons of decease_date to month_enrolled.

 

As stated above, I think you would be wise to not create wide data sets with calendar information in the variable name, this is not a good format for your data set. Wide data sets like the one you have means that every month, you have to modify your code to accommodate the new month. Having a long data set, with a variable named month_enrolled, is a better approach in the long run, no modification to the code is needed as the months change.  

--
Paige Miller

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1892 views
  • 3 likes
  • 3 in conversation