Hi,
Really annoyed with myself that I can't figure this out (or find the answer online). I have a dataset that covers the last four years (a single month end entry for each account) of accounts moving through arrears cycles and I am trying to identify the first and last occurrence of each account being at each level of arrears cycle so I can calculate how long an account has spent at a particular level of arrears cycles.
I have sorted the dataset by account_no, month and cycle and thought I would be able to use a by statement in my data step to say
by account_no date cycle and then say if first.cycle or last.cycle then cycle_flag = 1. When I've run this, every entry has a cycle_flag of 1 which I assume is because each month entry is unique so it is the first. combination of the three variables. If I try to say just by account_no and cycle, then I get an error message saying the by variables are not properly sorted.
proc sort data=all_roller;
by account_no month current_cycle;
run;
data all_roller_1;
set all_roller;
by account_no month current_cycle;
if first.current_cycle or last.current_cycle then cyc_flag = 1;
else cyc_flag = 0;
run;
I do need the data to be sorted by date so am looking for help in how to pull out the first and last entries of each cycle amount. If an account has been at cycle level 3 then goes to 4 and then back to 3 at a later date, I would be looking to flag up the cycle level three on more than one occasion.
As always, thanks in advance for any assistance,
Rob
If you want to use FIRST./LAST. processing on data that is not sorted then add the NOTSORTED keyword to the BY statement.
This will treat the data as grouped, but not check if the groups are ordered.
data all_roller_1;
set all_roller;
by account_no current_cycle notsorted;
if first.current_cycle or last.current_cycle then cyc_flag = 1;
else cyc_flag = 0;
run;
Note that if the group only has only one or two observations then all of the observations in that group will have CYR_FLAG=1. When the group has more than two observations then the "middle" observations will have CYR_FLAG=0.
Please show us a representative portion of your data, as SAS data step code (and not in some other format).
Apologies Paige,
so in the example below, I'd be looking for the flag variable to be created for JAN2015 (first), MAR2015 (last), APR2015(first & last), MAY2015 (first),DEC2016 (last), JAN2017(first & last)
data WORK.ALL_ROLLER_1;
infile datalines dsd truncover;
input month:MONYY7. account_no:32. current_cycle:32.;
format month MONYY7.;
datalines;
JAN2015 8552 1
FEB2015 8552 1
MAR2015 8552 1
APR2015 8552 0
JUL2015 8552 1
AUG2015 8552 1
SEP2015 8552 1
OCT2015 8552 1
NOV2015 8552 1
DEC2015 8552 1
JAN2016 8552 1
FEB2016 8552 1
MAR2016 8552 1
APR2016 8552 1
MAY2016 8552 1
JUN2016 8552 1
JUL2016 8552 1
AUG2016 8552 1
SEP2016 8552 1
OCT2016 8552 1
NOV2016 8552 1
DEC2016 8552 1
JAN2017 8552 0
;;;;
If you want to use FIRST./LAST. processing on data that is not sorted then add the NOTSORTED keyword to the BY statement.
This will treat the data as grouped, but not check if the groups are ordered.
data all_roller_1;
set all_roller;
by account_no current_cycle notsorted;
if first.current_cycle or last.current_cycle then cyc_flag = 1;
else cyc_flag = 0;
run;
Note that if the group only has only one or two observations then all of the observations in that group will have CYR_FLAG=1. When the group has more than two observations then the "middle" observations will have CYR_FLAG=0.
That's perfect, thanks Tom.
I can't believe I've never come across notsorted before, definitely will be a lot of use to me.
Thanks again
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.