Hi all –
I’m working with data where I know investors' deposits/withdrawals and the total interest they earned over a 90 day period. I want to infer their initial balance - but it's complicated as the interest rate paid depends on the balance in the account the previous day and varies over time.
Specifically –
What I know:
However, the interest rate paid depends on both date and, importantly, the balance in the account
3a. As shown in my code, in days 1-15, the first 30 of deposits earns 5.35% while deposits over 30 earn 3.52%
3b. As shown in my code, in days 16-90, the first 2 of deposits earns 6%, then next 23 earn 4% and anything over 25 earns 3.52%
What I want:
Given the above, I want to infer the "unobserved_initial_balance". Given the structure of interest rates (as a function of the balance each day), I believe the only way to solve is iteratively. I can solve it iteratively with excel solver, but my dataset is large so that's not an option.
I wrote code to demonstrate the problem. My code assumes one knows the initial balance in the account (the variable “unobserved_initial_balance”) so that I can demonstrate how to compute the interest earned each day (as a function of the previous day’s balance and time-varying interest rates). Specifically, my macro computes interest each day as the difference between beginning of day balance "bod_balance" (the balance from the previous end of day + deposit/withdrawals the previous day) and end of day balance ("eod_balance") where the end of day balance is the beginning of day balance plus interest earned that day. Total interest earned is the sum of the daily values (i.e., sum_inferred_interest). Thus, I’d want to minimize the difference between sum_inferrred_interest and total_interest.
A few other thoughts:
I am unfamiliar with proc optmodel and this seems like a relatively complicated case, so any help would be greatly appreciated.
Thanks,
Rick
options nocenter;
proc datasets kill;
data a1; input id $ count date :yymmdd10. deposit; /*this is investor id, a count variable (that goes from 1-90 for each investor), and dates of investors' deposits (positive value) or withdrawals (negative values*/
format date yymmdd10.;
datalines;
A 1 2022-04-15 0.00
A 2 2022-04-16 0.00
A 3 2022-04-17 0.00
A 4 2022-04-18 0.00
A 5 2022-04-19 -3.00
A 6 2022-04-20 0.00
A 7 2022-04-21 0.00
A 8 2022-04-22 0.00
A 9 2022-04-23 0.00
A 10 2022-04-24 0.00
A 11 2022-04-25 0.00
A 12 2022-04-26 0.00
A 13 2022-04-27 0.00
A 14 2022-04-28 0.00
A 15 2022-04-29 0.00
A 16 2022-04-30 0.00
A 17 2022-05-01 0.00
A 18 2022-05-02 0.00
A 19 2022-05-03 0.00
A 20 2022-05-04 0.00
A 21 2022-05-05 0.00
A 22 2022-05-06 0.00
A 23 2022-05-07 0.00
A 24 2022-05-08 0.00
A 25 2022-05-09 0.00
A 26 2022-05-10 0.00
A 27 2022-05-11 0.00
A 28 2022-05-12 0.00
A 29 2022-05-13 0.00
A 30 2022-05-14 0.00
A 31 2022-05-15 0.00
A 32 2022-05-16 0.00
A 33 2022-05-17 0.00
A 34 2022-05-18 0.00
A 35 2022-05-19 0.00
A 36 2022-05-20 0.00
A 37 2022-05-21 0.00
A 38 2022-05-22 0.00
A 39 2022-05-23 0.00
A 40 2022-05-24 4.00
A 41 2022-05-25 0.00
A 42 2022-05-26 0.00
A 43 2022-05-27 0.00
A 44 2022-05-28 0.00
A 45 2022-05-29 0.00
A 46 2022-05-30 0.00
A 47 2022-05-31 0.00
A 48 2022-06-01 0.00
A 49 2022-06-02 0.00
A 50 2022-06-03 0.00
A 51 2022-06-04 0.00
A 52 2022-06-05 0.00
A 53 2022-06-06 0.00
A 54 2022-06-07 0.00
A 55 2022-06-08 0.00
A 56 2022-06-09 0.00
A 57 2022-06-10 0.00
A 58 2022-06-11 0.00
A 59 2022-06-12 0.00
A 60 2022-06-13 1.00
A 61 2022-06-14 0.00
A 62 2022-06-15 0.00
A 63 2022-06-16 0.00
A 64 2022-06-17 0.00
A 65 2022-06-18 0.00
A 66 2022-06-19 0.00
A 67 2022-06-20 0.00
A 68 2022-06-21 0.00
A 69 2022-06-22 0.00
A 70 2022-06-23 0.00
A 71 2022-06-24 0.00
A 72 2022-06-25 0.00
A 73 2022-06-26 0.00
A 74 2022-06-27 0.00
A 75 2022-06-28 0.00
A 76 2022-06-29 0.00
A 77 2022-06-30 0.00
A 78 2022-07-01 0.00
A 79 2022-07-02 0.00
A 80 2022-07-03 0.00
A 81 2022-07-04 0.00
A 82 2022-07-05 0.00
A 83 2022-07-06 0.00
A 84 2022-07-07 0.00
A 85 2022-07-08 0.00
A 86 2022-07-09 0.00
A 87 2022-07-10 0.00
A 88 2022-07-11 0.00
A 89 2022-07-12 0.00
A 90 2022-07-13 0.00
B 1 2022-04-15 0.00
B 2 2022-04-16 0.00
B 3 2022-04-17 0.00
B 4 2022-04-18 0.00
B 5 2022-04-19 -0.20
B 6 2022-04-20 0.00
B 7 2022-04-21 0.00
B 8 2022-04-22 0.00
B 9 2022-04-23 0.00
B 10 2022-04-24 0.00
B 11 2022-04-25 0.00
B 12 2022-04-26 0.00
B 13 2022-04-27 0.00
B 14 2022-04-28 0.00
B 15 2022-04-29 0.00
B 16 2022-04-30 0.00
B 17 2022-05-01 0.00
B 18 2022-05-02 0.00
B 19 2022-05-03 0.00
B 20 2022-05-04 0.00
B 21 2022-05-05 0.00
B 22 2022-05-06 0.00
B 23 2022-05-07 0.00
B 24 2022-05-08 0.00
B 25 2022-05-09 0.00
B 26 2022-05-10 0.00
B 27 2022-05-11 0.00
B 28 2022-05-12 0.00
B 29 2022-05-13 0.00
B 30 2022-05-14 0.00
B 31 2022-05-15 0.00
B 32 2022-05-16 0.00
B 33 2022-05-17 0.00
B 34 2022-05-18 0.00
B 35 2022-05-19 0.00
B 36 2022-05-20 0.00
B 37 2022-05-21 0.00
B 38 2022-05-22 0.00
B 39 2022-05-23 0.00
B 40 2022-05-24 0.30
B 41 2022-05-25 0.00
B 42 2022-05-26 0.00
B 43 2022-05-27 0.00
B 44 2022-05-28 0.00
B 45 2022-05-29 0.00
B 46 2022-05-30 0.00
B 47 2022-05-31 0.00
B 48 2022-06-01 0.00
B 49 2022-06-02 0.00
B 50 2022-06-03 0.00
B 51 2022-06-04 0.00
B 52 2022-06-05 0.00
B 53 2022-06-06 0.00
B 54 2022-06-07 0.00
B 55 2022-06-08 0.00
B 56 2022-06-09 0.00
B 57 2022-06-10 0.00
B 58 2022-06-11 0.00
B 59 2022-06-12 0.00
B 60 2022-06-13 0.20
B 61 2022-06-14 0.00
B 62 2022-06-15 0.00
B 63 2022-06-16 0.00
B 64 2022-06-17 0.00
B 65 2022-06-18 0.00
B 66 2022-06-19 0.00
B 67 2022-06-20 0.00
B 68 2022-06-21 0.00
B 69 2022-06-22 0.00
B 70 2022-06-23 0.00
B 71 2022-06-24 0.00
B 72 2022-06-25 0.00
B 73 2022-06-26 0.00
B 74 2022-06-27 0.00
B 75 2022-06-28 0.00
B 76 2022-06-29 0.00
B 77 2022-06-30 0.00
B 78 2022-07-01 0.00
B 79 2022-07-02 0.00
B 80 2022-07-03 0.00
B 81 2022-07-04 0.00
B 82 2022-07-05 0.00
B 83 2022-07-06 0.00
B 84 2022-07-07 0.00
B 85 2022-07-08 0.00
B 86 2022-07-09 0.00
B 87 2022-07-10 0.00
B 88 2022-07-11 0.00
B 89 2022-07-12 0.00
B 90 2022-07-13 0.00
;
proc sort; by id;
run;
data a2; input id $ total_interest; /*this is investor id and the total interest they earned over the 90 days*/
datalines;
A 0.29734
B 0.01488
;
proc sort; by id;
data x3; input id $ unobserved_initial_balance; *this is the unobserved initial (beginning of day 1) balance in the investor's account and what I want to infer;
datalines;
A 29
B 1
;
proc sort; by id;
run;
data a4; merge a1 a2 x3; by id;
proc sort; by id;
proc datasets; delete running_tot;
/*
this macro estimates the interest earned each day over the 90 days
bod=beginning of day
eod=end of day
*/
%macro days;
%do v=1 %to 90; *1 to 90;
%let lag_v = %eval(&v. - 1);
%if &v.=1 %then %do; *this computes first day interest;
data d&v.; set a4;
if count=&v.;
bod_balance=unobserved_initial_balance;
if bod_balance<=30 then eod_balance=bod_balance*(1.0535)**(1/365);
if bod_balance gt 30 then eod_balance=30*(1.0535)**(1/365)+(bod_balance-30)*(1.0352)**(1/365);
%end;
%if &v. gt 1 %then %do;
data da_&v.; set a4; if count=&v.; *this gathers information for day t where t>1;
data lag_d&v.; set d&lag_v.; *this gets the eod balance from the previous day which is adjusted for deposits and withdrawals (I assume they occur at the end of the previous day) to compute bod balance for day t;
bod_balance=eod_balance+deposit;
keep bod_balance id;
data d&v.; merge da_&v. lag_d&v.; by id;
%if &v. le 15 %then %do; *interest parameters first 15 days - depends on whether balance the previous day is more or less than 30;
if bod_balance<=30 then EOD_balance=bod_balance*(1.0535)**(1/365);
if bod_balance gt 30 then EOD_balance=30*(1.0535)**(1/365)+(bod_balance-30)*(1.0352)**(1/365);
%end;
%if &v. gt 15 %then %do; *interest parameters last 75 days - depends on whether the balance the previous day is less than 2, between 2-25, or greater than 25;
if bod_balance<=2 then EOD_balance=bod_balance*(1.06)**(1/365);
if bod_balance gt 2 and bod_balance le 25 then EOD_balance=2*(1.06)**(1/365)+(bod_balance-2)*(1.04)**(1/365);
if bod_balance gt 25 then EOD_balance=2*(1.06)**(1/365)+23*(1.04)**(1/365)+(bod_balance-25)*(1.0352)**(1/365);
%end;
%end;
proc append base=running_tot data=d&v.;
%end;
%mend days;
%days;
data look; set running_tot;
inferred_daily_interest=eod_balance-bod_balance;
proc sort; by id date;
proc means noprint; by id; var inferred_daily_interest; *this is the inferred total interest earned over the 90 days;
output out=outlook sum=sum_inferred_interest;
data look2; set outlook;
proc print; run;
Here's one way to use PROC OPTMODEL to solve your problem. Because the beginning-of-day balance is the same as the end-of-day balance for the previous day, I used a decision variable called EodBalance[t] to track both these quantities, with t = 0 corresponding to the initial balance. Also, I used a WHERE= data set option in the READ DATA statements to read only one id at a time. To solve a separate problem for each id you can instead use runOptmodel groupBy or an explicit loop as in your previous question.
proc optmodel;
set DAYS;
num deposit {DAYS};
read data a1(where=(id='A')) into DAYS=[count] deposit;
DAYS = {0} union DAYS;
num total_interest;
read data a2(where=(id='A')) into total_interest;
var EodBalance {DAYS} >= 0;
impvar Interest {t in DAYS diff {0}} =
if t <= 15 then (
if EodBalance[t-1] <= 30 then EodBalance[t-1]*(1.0535**(1/365)-1)
else 30*(1.0535**(1/365)-1) + (EodBalance[t-1]-30)*(1.0352**(1/365)-1)
)
else (
if EodBalance[t-1] <= 2 then EodBalance[t-1]*(1.06**(1/365)-1)
else if EodBalance[t-1] <= 25 then 2*(1.06**(1/365)-1) + (EodBalance[t-1]-2)*(1.04**(1/365)-1)
else 2*(1.06**(1/365)-1) + 23*(1.04**(1/365)-1) + (EodBalance[t-1]-25)*(1.0352**(1/365)-1)
)
;
var TotalInterest >= 0;
con TotalInterestCon:
TotalInterest = sum {t in DAYS diff {0}} Interest[t];
min SumSquaredError = (TotalInterest - total_interest)^2;
con BalanceCon {t in DAYS diff {0}}:
EodBalance[t] = EodBalance[t-1] + Interest[t] + deposit[t];
solve with nlp / opttol=1e-10;
print TotalInterest total_interest;
create data OutData from [count] EodBalance Interest;
quit;
Here's one way to use PROC OPTMODEL to solve your problem. Because the beginning-of-day balance is the same as the end-of-day balance for the previous day, I used a decision variable called EodBalance[t] to track both these quantities, with t = 0 corresponding to the initial balance. Also, I used a WHERE= data set option in the READ DATA statements to read only one id at a time. To solve a separate problem for each id you can instead use runOptmodel groupBy or an explicit loop as in your previous question.
proc optmodel;
set DAYS;
num deposit {DAYS};
read data a1(where=(id='A')) into DAYS=[count] deposit;
DAYS = {0} union DAYS;
num total_interest;
read data a2(where=(id='A')) into total_interest;
var EodBalance {DAYS} >= 0;
impvar Interest {t in DAYS diff {0}} =
if t <= 15 then (
if EodBalance[t-1] <= 30 then EodBalance[t-1]*(1.0535**(1/365)-1)
else 30*(1.0535**(1/365)-1) + (EodBalance[t-1]-30)*(1.0352**(1/365)-1)
)
else (
if EodBalance[t-1] <= 2 then EodBalance[t-1]*(1.06**(1/365)-1)
else if EodBalance[t-1] <= 25 then 2*(1.06**(1/365)-1) + (EodBalance[t-1]-2)*(1.04**(1/365)-1)
else 2*(1.06**(1/365)-1) + 23*(1.04**(1/365)-1) + (EodBalance[t-1]-25)*(1.0352**(1/365)-1)
)
;
var TotalInterest >= 0;
con TotalInterestCon:
TotalInterest = sum {t in DAYS diff {0}} Interest[t];
min SumSquaredError = (TotalInterest - total_interest)^2;
con BalanceCon {t in DAYS diff {0}}:
EodBalance[t] = EodBalance[t-1] + Interest[t] + deposit[t];
solve with nlp / opttol=1e-10;
print TotalInterest total_interest;
create data OutData from [count] EodBalance Interest;
quit;
Rob - this is amazing. Thank you so much!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.
Find more tutorials on the SAS Users YouTube channel.