Good evening from England! I am using DI Studio and I am facing the following issue. I have a table named TRANSACTIONS that shows all the transactions of several bank accounts within 2021. For simplicity, let's assume I have only one account and its transactions throughout January 2021. Assume also that the account holder performed transactions only on 3rd, 10th, 17th and 22nd of January. The TRANSACTIONS table is the following:
date | account_n | start_balance | trans_cr | trans_dr |
01/01/2021 | 123456 | 300 | . | . |
02/01/2021 | 123456 | . | . | . |
03/01/2021 | 123456 | . | 200 | 90 |
04/01/2021 | 123456 | . | . | . |
05/01/2021 | 123456 | . | . | . |
06/01/2021 | 123456 | . | . | . |
07/01/2021 | 123456 | . | . | . |
08/01/2021 | 123456 | . | . | . |
09/01/2021 | 123456 | . | . | . |
10/01/2021 | 123456 | . | 30 | 10 |
11/01/2021 | 123456 | . | . | . |
12/01/2021 | 123456 | . | . | . |
13/01/2021 | 123456 | . | . | . |
14/01/2021 | 123456 | . | . | . |
15/01/2021 | 123456 | . | . | . |
16/01/2021 | 123456 | . | . | . |
17/01/2021 | 123456 | . | 30 | 70 |
18/01/2021 | 123456 | . | . | . |
19/01/2021 | 123456 | . | . | . |
20/01/2021 | 123456 | . | . | . |
21/01/2021 | 123456 | . | . | . |
22/01/2021 | 123456 | . | 20 | 10 |
23/01/2021 | 123456 | . | . | . |
24/01/2021 | 123456 | . | . | . |
25/01/2021 | 123456 | . | . | . |
26/01/2021 | 123456 | . | . | . |
27/01/2021 | 123456 | . | . | . |
28/01/2021 | 123456 | . | . | . |
29/01/2021 | 123456 | . | . | . |
30/01/2021 | 123456 | . | . | . |
31/01/2021 | 123456 | . | . | . |
Where date: the date of transaction in DDMMYY10. format
account_n: the account number
start_balance: the starting balance in the beginning of the year, i.e. 01/01/2021
trans_cr: the total credit transactions of the day
trans_dr: the total debit transactions of the day
From the TRANSACTIONS table above I want to create the following table, named CALC_BALS:
date | account_n | endday_balance | trans_cr | trans_dr |
01/01/2021 | 123456 | 300 | 0 | 0 |
02/01/2021 | 123456 | 300 | 0 | 0 |
03/01/2021 | 123456 | 410 | 200 | 90 |
04/01/2021 | 123456 | 410 | 0 | 0 |
05/01/2021 | 123456 | 410 | 0 | 0 |
06/01/2021 | 123456 | 410 | 0 | 0 |
07/01/2021 | 123456 | 410 | 0 | 0 |
08/01/2021 | 123456 | 410 | 0 | 0 |
09/01/2021 | 123456 | 410 | 0 | 0 |
10/01/2021 | 123456 | 430 | 30 | 10 |
11/01/2021 | 123456 | 430 | 0 | 0 |
12/01/2021 | 123456 | 430 | 0 | 0 |
13/01/2021 | 123456 | 430 | 0 | 0 |
14/01/2021 | 123456 | 430 | 0 | 0 |
15/01/2021 | 123456 | 430 | 0 | 0 |
16/01/2021 | 123456 | 430 | 0 | 0 |
17/01/2021 | 123456 | 390 | 30 | 70 |
18/01/2021 | 123456 | 390 | 0 | 0 |
19/01/2021 | 123456 | 390 | 0 | 0 |
20/01/2021 | 123456 | 390 | 0 | 0 |
21/01/2021 | 123456 | 390 | 0 | 0 |
22/01/2021 | 123456 | 400 | 20 | 10 |
23/01/2021 | 123456 | 400 | 0 | 0 |
24/01/2021 | 123456 | 400 | 0 | 0 |
25/01/2021 | 123456 | 400 | 0 | 0 |
26/01/2021 | 123456 | 400 | 0 | 0 |
27/01/2021 | 123456 | 400 | 0 | 0 |
28/01/2021 | 123456 | 400 | 0 | 0 |
29/01/2021 | 123456 | 400 | 0 | 0 |
30/01/2021 | 123456 | 400 | 0 | 0 |
31/01/2021 | 123456 | 400 | 0 | 0 |
The created variable here is endday_balance, which is the balance at the end of the day, and it is defined as:
start of day balance (ie previous endday_balance)+ trans_cr - trans_dr
As it is shown, I want to fill in the missing values of endday_balance with its latest value. I am struggling with these 2 problems of filling values and using a lagged value. Do you have any idea of how to do it? I would prefer to use DI transformations, but if not possible, I can use manual code. Thank you in advance.
data want;
set have;
if not missing(start_balance) or not missing(trans_cr) or not missing(trans_dr) then
end_of_day_balance + sum(start_balance,trans_cr,-trans_dr);
if missing(trans_cr) then trans_cr=0;
if missing(trans_dr) then trans_dr=0;
run;
This code is untested. If you want tested code, please provide data as working SAS data step code (examples and instructions) and NOT as Excel files and NOT as copy and paste from Excel.
You said: "For simplicity, let's assume I have only one account". This is a poor assumption, and the above code will not work if you have more than one account. In general, if the real life problem has more than 1 account, please provide data for at least two accounts.
Modification for more than 1 account (again, untested)
data want;
set have;
by account_n;
if first.account_n then end_of_day_balance=0;
if not missing(start_balance) or not missing(trans_cr) or not missing(trans_dr) then
end_of_day_balance + sum(start_balance,trans_cr,-trans_dr);
if missing(trans_cr) then trans_cr=0;
if missing(trans_dr) then trans_dr=0;
run;
data want;
set have;
if not missing(start_balance) or not missing(trans_cr) or not missing(trans_dr) then
end_of_day_balance + sum(start_balance,trans_cr,-trans_dr);
if missing(trans_cr) then trans_cr=0;
if missing(trans_dr) then trans_dr=0;
run;
This code is untested. If you want tested code, please provide data as working SAS data step code (examples and instructions) and NOT as Excel files and NOT as copy and paste from Excel.
You said: "For simplicity, let's assume I have only one account". This is a poor assumption, and the above code will not work if you have more than one account. In general, if the real life problem has more than 1 account, please provide data for at least two accounts.
Modification for more than 1 account (again, untested)
data want;
set have;
by account_n;
if first.account_n then end_of_day_balance=0;
if not missing(start_balance) or not missing(trans_cr) or not missing(trans_dr) then
end_of_day_balance + sum(start_balance,trans_cr,-trans_dr);
if missing(trans_cr) then trans_cr=0;
if missing(trans_dr) then trans_dr=0;
run;
Thanks for the advice! I will test the code!
Thanks! I will try the code!
Untested, in the absence of sample data in the form of a working data step:
data want;
set have;
by account_n;
trans_cr=coalesce(trans_cr,0);
trans_dr=coalesce(trans_dr,0);
retain endday_balance;
if first.account_n then endday_balance=start_balance;
else endday_balance + trans_cr +(-1*trans_dr);
run;
The above assumes data are sorted by account_n/date, and that the first obs for each account has a valid start_balance value, but no values for trans_cr and trans_dr. Also no other obs has a valid start_balance value.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.