Hi,
I'm trying to calculate a value (final_bal) using a sorted dataset I have. The dataset exposure has a structure like the following:
Accno Date Def_type Adj_bal
1001 2008/01 Loss $3500
1001 2008/06 Loss $1100
1001 2008/09 Fee $260
1002 2008/03 Fee $2200
1002 2008/09 Loss $800
1002 2009/01 Fee $260
1003 2016/11 Fee $15000
1003 2017/01 Loss 8200
1003 2017/03 Loss 40
The dataset is sorted by account-number, then by date. What I'm to do is to come up with a new column that's called final_bal, and it will be the adj_bal value dependent on the first Def_type is. Here is the logic I'm trying to implement into my code:
If the first Def_type of the account is called 'Loss', then I'll pick the value of that date (ex. $3500 for account 1001) regardless what status the later dates have. However if the first value of the account is called 'Fee', then I'll pick the last value (ex. $40 for account 1003) regardless what status the later dates have. How do I get this done? I tried following approaches:
data select;
set exposure;
retain final_bal;
if first.accno then do;
if def_type = 'Loss' then final_bal = first.Adj_Bal;
else if def_type = 'Fee' then final_bal = last.Adj_Bal;
end;
run;
That didn't work. Also I tried something like the following but it doesn't work (desperate attempt) the logic also failed.
data test (keep = accno date def_type Adj_Bal);
set exposure;
by accno date;
if first.accno or last.accno;
run;
data test2;
set test;
retain EAD;
Bal_prev = lag(Adj_Bal);
Type_prev = lag(def_type);
if first.accno and def_type = 'Loss'
then final_bal = Adj_Bal;
if not first.accno and Type_prev = 'Loss'
then final_bal = Bal_prev;
else if not first.accno and Type_prev = 'Fee'
then final_bal = Adj_Bal;
else if first.accno and def_type = 'Fee'
then final_bal = .;
run;
What am I doing wrong? Most importantly, what's the best way to get this done, whether in a data step or proc sql?
I hope I made this explanation clear. Thank you for your contribution to this question I have.
I would do this:
data HAVE;
input ACCNO DATE $ DEF_TYPE $ ADJ_BAL dollar8.;
cards;
1001 2008/01 Loss $3500
1001 2008/06 Loss $1100
1001 2008/09 Fee $260
1002 2008/03 Fee $2200
1002 2008/09 Loss $800
1002 2009/01 Fee $260
1003 2016/11 Fee $15000
1003 2017/01 Loss 8200
1003 2017/03 Loss 40
;
data want;
do until (last.accno);
set have; by accno;
if first.accno then first_type = def_type;
if first_type = "Loss" and first.accno then final_bal = adj_bal;
if first_type = "Fee" and last.accno then final_bal = adj_bal;
end;
do until (last.accno);
set have; by accno;
output;
end;
drop first_type;
run;
I suspect that's not what you want, but since you haven't given the output you want, here is what you asked for.
data HAVE;
input ACCNO DATE $ DEF_TYPE $ ADJ_BAL dollar8.;
cards;
1001 2008/01 Loss $3500
1001 2008/06 Loss $1100
1001 2008/09 Fee $260
1002 2008/03 Fee $2200
1002 2008/09 Loss $800
1002 2009/01 Fee $260
1003 2016/11 Fee $15000
1003 2017/01 Loss 8200
1003 2017/03 Loss 40
run;
data WANT;
set HAVE;
by ACCNO;
retain FINAL_BAL;
if first.ACCNO then FINAL_BAL=ifn(DEF_TYPE='Loss', ADJ_BAL,.);
if last.ACCNO & FINAL_BAL=. then FINAL_BAL=ADJ_BAL;
run;
ACCNO | DATE | DEF_TYPE | ADJ_BAL | FINAL_BAL |
1001 | 2008/01 | Loss | 3500 | 3500 |
1001 | 2008/06 | Loss | 1100 | 3500 |
1001 | 2008/09 | Fee | 260 | 3500 |
1002 | 2008/03 | Fee | 2200 | . |
1002 | 2008/09 | Loss | 800 | . |
1002 | 2009/01 | Fee | 260 | 260 |
1003 | 2016/11 | Fee | 1500 | . |
1003 | 2017/01 | Loss | 8200 | . |
1003 | 2017/03 | Loss | 40 | 40 |
To use first.<variable> or last.<variable> yo need a BY statement.
The outcome value of either first.<var> or last.<var> is either 1 or 0, depending on the observation in group,
it does not contain the value of the variable.
In your code you defined retain EAD statement but never used it.
What was your intention by it ? Didn't you got EAD missing in all observations?
In the 2nd step there are at most 2 obs per ACCNO, so any obs may be either first.accno or last.accno or both at same time
in case there is only 1 obs per accno.
data test (keep = accno date def_type Adj_Bal);
set exposure;
by accno date;
if first.accno or last.accno;
run;
/* the outcome of 1st step is the most 2 obs per accno */
data test2;
set test;
by accno; /* <<< line added */
retain EAD;
Bal_prev = lag(Adj_Bal);
Type_prev = lag(def_type);
if first.accno and def_type = 'Loss'
then final_bal = Adj_Bal;
if last.accno and Type_prev = 'Loss'
then final_bal = Bal_prev;
else if last.accno and Type_prev = 'Fee'
then final_bal = Adj_Bal;
else if first.accno and def_type = 'Fee'
then final_bal = .;
run;
/** GROUP BASED ON VALUES **/
data have;
set have;
by accno;
if first.accno and substr(def_type,1,1) eq "L" then grp="F"; /** L denotes Loss**/
else if first.accno and substr(def_type,1,1) eq "F" then grp="L"; /** F detones Fees **/
else grp=" ";
run;
/** LOCF CONCEPT ***/
data have;
set have;
retain locf;/** LAST OBS CARRY FORWARD **/
by accno;
if first.accno then locf=grp;
if locf eq " " then grp=locf;
drop grp;
rename locf=grp;
run;
/** DIVIDED IN TO TWO SEPARATE DATASET FOR FURTHER PROCESS **/
data first last;
set have;
if grp eq "F" then output first;
else output last;
run;
/** AS PER NEED FIRST.ACCNO**/
data first;
set first;
by accno;
if first.accno;
run;
/** AS PER NEED LAST.ACCNO ***/
data last;
set last;
by accno;
if last.accno;
run;
/** SET ALL TOGHER AND FINAL OUTPUT **/
data final;
set first last;
by accno;
run;
@Shmuel wrote:To use first.<variable> or last.<variable> yo need a BY statement.
The outcome value of either first.<var> or last.<var> is either 1 or 0, depending on the observation in group,
it does not contain the value of the variable.
In your code you defined retain EAD statement but never used it.
What was your intention by it ? Didn't you got EAD missing in all observations?
In the 2nd step there are at most 2 obs per ACCNO, so any obs may be either first.accno or last.accno or both at same time
in case there is only 1 obs per accno.
data test (keep = accno date def_type Adj_Bal); set exposure; by accno date; if first.accno or last.accno; run; /* the outcome of 1st step is the most 2 obs per accno */ data test2; set test; by accno; /* <<< line added */ retain EAD; Bal_prev = lag(Adj_Bal); Type_prev = lag(def_type); if first.accno and def_type = 'Loss' then final_bal = Adj_Bal; if last.accno and Type_prev = 'Loss' then final_bal = Bal_prev; else if last.accno and Type_prev = 'Fee' then final_bal = Adj_Bal; else if first.accno and def_type = 'Fee' then final_bal = .; run;
There should many methods do it but this seems fair easy and very quick to understand at any level your experience.
I hope it helps
I would do this:
data HAVE;
input ACCNO DATE $ DEF_TYPE $ ADJ_BAL dollar8.;
cards;
1001 2008/01 Loss $3500
1001 2008/06 Loss $1100
1001 2008/09 Fee $260
1002 2008/03 Fee $2200
1002 2008/09 Loss $800
1002 2009/01 Fee $260
1003 2016/11 Fee $15000
1003 2017/01 Loss 8200
1003 2017/03 Loss 40
;
data want;
do until (last.accno);
set have; by accno;
if first.accno then first_type = def_type;
if first_type = "Loss" and first.accno then final_bal = adj_bal;
if first_type = "Fee" and last.accno then final_bal = adj_bal;
end;
do until (last.accno);
set have; by accno;
output;
end;
drop first_type;
run;
Thank you PGStats! This worked like a charm!
One more follow-up question, just for the sake of my personal understanding, why did you put
set test; by accno;
within the do loop rather than outside of the loop? Also, why divide the work into two loops? In another word, how come combining the two doesn't work?
data test;
do until (last.accno);
set have; by accno;
if first.accno then first_type = def_type;
if first_type = 'Loss' and first.accno then final_bal= adj_bal;
if first_type = 'Waiver' and last.accno then final_bal = adj_bal;
output;
end;
drop first_type;
run;
@Bankshot, The do until(last...) loop is a very useful technique for gathering information about by-groups. If you only need the summary, you can use a single loop. If you want to redistribute the summary information on the original data, you need a second loop, the first one to gather the summary, the second to associate it with the original data.
The set...; by...; statements must reside inside the loop to change the value of the last.xxx variable before it is tested at the end of the loop against the until() condition.
This is very helpful. Thanks again
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.