Hi I would need your help with my loop and lag gap, Loop is something I am not proficient on, so I haven't make my code work for my purpose.
I am trying to count days of enrollment of each person in 2015, and my problem is with those who have more than one row of entries (they dropped and re-enrolled). My continuous enrollment standard is that as long as no single enrollment gap > 45 days, this person is continued enrolled (continue_enroll_flag = 1)
My attempt method:
Loop is something I've always messed up with, so I would like someone to help me with it... (won't post my code because I kinda don't know how to start, I am thinking something like if first.enrollment_start, maybe wrong)
My data looks like this:
person_id | enrollment_start | Enrollment_end |
1 | 1/1/2015 | 3/1/2015 |
1 | 6/2/2015 | 12/31/2015 |
2 | 2/1/2015 | 12/31/2015 |
3 | 1/1/2015 | 12/31/2015 |
4 | 3/1/2015 | 12/31/2015 |
5 | 1/1/2015 | 7/1/2015 |
5 | 8/1/2015 | 12/31/2015 |
6 | 1/1/2015 | 10/25/2015 |
7 | 4/6/2015 | 12/1/2015 |
my desired output looks like this:
person_id | enrollment_start | Enrollment_end | EnrollmentDays | EnrollmentGap | Continue_enroll_Flag |
1 | 1/1/2015 | 3/1/2015 | 59 | 93 | 0 |
1 | 6/2/2015 | 12/31/2015 | 212 | 93 | 0 |
2 | 2/1/2015 | 12/31/2015 | 333 | 32 | 1 |
3 | 1/1/2015 | 12/31/2015 | 364 | 1 | 1 |
4 | 3/1/2015 | 12/31/2015 | 305 | 60 | 0 |
5 | 1/1/2015 | 7/1/2015 | 181 | 31 | 1 |
5 | 8/1/2015 | 12/31/2015 | 152 | 31 | 1 |
6 | 1/1/2015 | 10/25/2015 | 297 | 68 | 0 |
7 | 4/6/2015 | 12/1/2015 | 239 | 126 | 0 |
Appreciate your help!
You'll need multiple steps to do this, not one.
First find the lags, then find the IDs you need to erase. Then select the people you want included.
proc sql;
create table want as
select *
from YourData
where ID not in (Select id from YourData where gap>45);
quit;
A data set implicitly loops so there's no need for loops here at all. Use LAG() to get the previous value when needed.
Thank you @Reeza for your prompt reply, could you specify what do you mean by using lag?
I did this
I've created this multiple_enroll flag, if it ne 1, it means this person has multiple rows
data Step6_I4;
set Step6_I3;
if multiple_enroll ne 1 then
gap = enr_start_date - lag(enr_end_date);
run;
It gives me this:
How can I let SAS only calculate the gap starting from the second row of each person with multiple entries? So it won't have values like -364 shows up (it used person 1's end date and person 2's start date to calculate).
Thank you
LAG() cannot be calculated conditionally, or at least then it doesn't behave the way you expect.
See this example here to get you started:
data have;
informat person_id $1. date_enroll_start date_enroll_end mmddyy10.;
format date_: date9.;
input person_id date_enroll_start date_enroll_end;
cards;
1 1/1/2015 3/1/2015
1 6/2/2015 12/31/2015
2 2/1/2015 12/31/2015
3 1/1/2015 12/31/2015
4 3/1/2015 12/31/2015
5 1/1/2015 7/1/2015
5 8/1/2015 12/31/2015
6 1/1/2015 10/25/2015
7 4/6/2015 12/1/2015
;
run;
data want;
set have;
by person_id;
lag_enroll_end = lag(date_enroll_end);
if first.person_id then gap = 365 - date_enroll_end + date_enroll_start;
else gap = date_enroll_start - lag_enroll_end;
format lag_enroll_end date9.;
run;
Thank you @Reeza. this code creates the exact same problem as my old code gives me, instead of giving a negative values derived from using previous person's enrollment end date, it gives a positive value, but the same thing.
I'm not seeing those numbers based on the sample data provided so can please explain the issue in more detail?
I'm not seeing an issue, so you may need to expand your sample data to better reflect your actual data.
The IF/BY processing prevents some one elses data from rolling over, ie the calculation varies depending on the person's record.
Thank you for asking @Reeza, sorry I should have been more clear on it. You are right, the calculation is based on person_id, so no other person's data is rolled into the calculation.
My problem is that I want to delete a person whose real largest single gap > 45, which means if a person has three rows, all three rows will be deleted. this won't work with this output :
because the first gap number of this person, 214, is not the real gap.
i.e. (I made this up) A person's real largest single gap is 31, this person shouldn't be deleted, how should I manage it?
1/1/2015 | 7/1/2015 | 20089 | 20453 | 31-Dec-15 | 183 |
8/1/2015 | 11/30/2015 | 20089 | 20453 | 31-Dec-15 | 31 |
Is there any way to make something like:
1/1/2015 | 7/1/2015 | 20089 | 20453 | 31-Dec-15 | 31 |
8/1/2015 | 11/30/2015 | 20089 | 20453 | 31-Dec-15 | 31 |
Hope this is clear.
Thank you for constantly helping out on this!
You'll need multiple steps to do this, not one.
First find the lags, then find the IDs you need to erase. Then select the people you want included.
proc sql;
create table want as
select *
from YourData
where ID not in (Select id from YourData where gap>45);
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.