I have dataset which has 3 variables which includes ID, date and enrol, and it is sorted by ID and date. My goal is to create a new dataset which has another variable say enrol_flag which should be 1 once the enrol is 1 but it should become 0 once enrol is 0 and retains that 0 value.
My dataset looks like:
ID Date enrol
1 01JAN21 0
1 01FEB21 1
1 01MAR21 1
1 01APR21 1
1 01MAY21 0
1 01JUN21 0
1 01JUL21 1
1 01AUG21 1
2 01JAN21 1
2 01FEB21 1
2 01MAR21 1
2 01APR21 1
2 01MAY21 1
2 01JUN21 0
2 01JUL21 0
2 01AUG21 0
3 01FEB21 0
3 01MAR21 0
3 01APR21 0
3 01MAY21 0
3 01JUN21 0
3 01JUL21 1
Output I want
ID Date enrol enrol_flag
1 01JAN21 0 0
1 01FEB21 1 1
1 01MAR21 1 1
1 01APR21 1 1
1 01MAY21 0 0
1 01JUN21 0 0
1 01JUL21 1 0
1 01AUG21 1 0
2 01JAN21 1 1
2 01FEB21 1 1
2 01MAR21 1 1
2 01APR21 1 1
2 01MAY21 1 1
2 01JUN21 0 0
2 01JUL21 0 0
2 01AUG21 0 0
3 01FEB21 0 0
3 01MAR21 0 0
3 01APR21 0 0
3 01MAY21 0 0
3 01JUN21 0 0
3 01JUL21 1 1
Thank you in advance for your help.
Just count how many enrollments each ID has had. Then it is easy to create the new_flag (if you need it).
First let's convert your listing back into an actual dataset.
data have;
input ID Date :date. enrol;
format date date9.;
cards;
1 01JAN2021 0
1 01FEB2021 1
1 01MAR2021 1
1 01APR2021 1
1 01MAY2021 0
1 01JUN2021 0
1 01JUL2021 1
1 01AUG2021 1
2 01JAN2021 1
2 01FEB2021 1
2 01MAR2021 1
2 01APR2021 1
2 01MAY2021 1
2 01JUN2021 0
2 01JUL2021 0
2 01AUG2021 0
3 01FEB2021 0
3 01MAR2021 0
3 01APR2021 0
3 01MAY2021 0
3 01JUN2021 0
3 01JUL2021 1
;
Now just use the NOTSORTED keyword on a BY statement to able to find when a new enrollment starts.
data want;
set have;
by id enrol notsorted;
if first.id then period=0;
if enrol and first.enrol then period+1;
enrol_flag = enrol and period=1;
run;
Result
enrol_ OBS ID Date enrol period flag 1 1 01JAN2021 0 0 0 2 1 01FEB2021 1 1 1 3 1 01MAR2021 1 1 1 4 1 01APR2021 1 1 1 5 1 01MAY2021 0 1 0 6 1 01JUN2021 0 1 0 7 1 01JUL2021 1 2 0 8 1 01AUG2021 1 2 0 9 2 01JAN2021 1 1 1 10 2 01FEB2021 1 1 1 11 2 01MAR2021 1 1 1 12 2 01APR2021 1 1 1 13 2 01MAY2021 1 1 1 14 2 01JUN2021 0 1 0 15 2 01JUL2021 0 1 0 16 2 01AUG2021 0 1 0 17 3 01FEB2021 0 0 0 18 3 01MAR2021 0 0 0 19 3 01APR2021 0 0 0 20 3 01MAY2021 0 0 0 21 3 01JUN2021 0 0 0 22 3 01JUL2021 1 1 1
Just count how many enrollments each ID has had. Then it is easy to create the new_flag (if you need it).
First let's convert your listing back into an actual dataset.
data have;
input ID Date :date. enrol;
format date date9.;
cards;
1 01JAN2021 0
1 01FEB2021 1
1 01MAR2021 1
1 01APR2021 1
1 01MAY2021 0
1 01JUN2021 0
1 01JUL2021 1
1 01AUG2021 1
2 01JAN2021 1
2 01FEB2021 1
2 01MAR2021 1
2 01APR2021 1
2 01MAY2021 1
2 01JUN2021 0
2 01JUL2021 0
2 01AUG2021 0
3 01FEB2021 0
3 01MAR2021 0
3 01APR2021 0
3 01MAY2021 0
3 01JUN2021 0
3 01JUL2021 1
;
Now just use the NOTSORTED keyword on a BY statement to able to find when a new enrollment starts.
data want;
set have;
by id enrol notsorted;
if first.id then period=0;
if enrol and first.enrol then period+1;
enrol_flag = enrol and period=1;
run;
Result
enrol_ OBS ID Date enrol period flag 1 1 01JAN2021 0 0 0 2 1 01FEB2021 1 1 1 3 1 01MAR2021 1 1 1 4 1 01APR2021 1 1 1 5 1 01MAY2021 0 1 0 6 1 01JUN2021 0 1 0 7 1 01JUL2021 1 2 0 8 1 01AUG2021 1 2 0 9 2 01JAN2021 1 1 1 10 2 01FEB2021 1 1 1 11 2 01MAR2021 1 1 1 12 2 01APR2021 1 1 1 13 2 01MAY2021 1 1 1 14 2 01JUN2021 0 1 0 15 2 01JUL2021 0 1 0 16 2 01AUG2021 0 1 0 17 3 01FEB2021 0 0 0 18 3 01MAR2021 0 0 0 19 3 01APR2021 0 0 0 20 3 01MAY2021 0 0 0 21 3 01JUN2021 0 0 0 22 3 01JUL2021 1 1 1
Thanks a lot. It worked
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.