Hi all – I’m working with data where I know investors' deposits/withdrawals and the total interest they earned over a 90 day period. I want to infer their initial balance - but it's complicated as the interest rate paid depends on the balance in the account the previous day and varies over time. Specifically – What I know: The dates and amounts of all deposits and withdrawals (data a1) The total interest earned by depositors over a 90 day period (data a2) The interest rate paid However, the interest rate paid depends on both date and, importantly, the balance in the account 3a. As shown in my code, in days 1-15, the first 30 of deposits earns 5.35% while deposits over 30 earn 3.52% 3b. As shown in my code, in days 16-90, the first 2 of deposits earns 6%, then next 23 earn 4% and anything over 25 earns 3.52% What I want: Given the above, I want to infer the "unobserved_initial_balance". Given the structure of interest rates (as a function of the balance each day), I believe the only way to solve is iteratively. I can solve it iteratively with excel solver, but my dataset is large so that's not an option. I wrote code to demonstrate the problem. My code assumes one knows the initial balance in the account (the variable “unobserved_initial_balance”) so that I can demonstrate how to compute the interest earned each day (as a function of the previous day’s balance and time-varying interest rates). Specifically, my macro computes interest each day as the difference between beginning of day balance "bod_balance" (the balance from the previous end of day + deposit/withdrawals the previous day) and end of day balance ("eod_balance") where the end of day balance is the beginning of day balance plus interest earned that day. Total interest earned is the sum of the daily values (i.e., sum_inferred_interest). Thus, I’d want to minimize the difference between sum_inferrred_interest and total_interest. A few other thoughts: I resorted to the macro loop after trying to write the code in a single data step using retain, because I couldn't figure out how to allow for the conditional values (e.g., if bod_balance gt 30 then...). Not sure if the macro code would work well with proc optmodel. Although my code runs this for both individuals simultaneously, I'd be fine just solving for each individual alone and looping over individual. I am unfamiliar with proc optmodel and this seems like a relatively complicated case, so any help would be greatly appreciated. Thanks, Rick options nocenter;
proc datasets kill;
data a1; input id $ count date :yymmdd10. deposit; /*this is investor id, a count variable (that goes from 1-90 for each investor), and dates of investors' deposits (positive value) or withdrawals (negative values*/
format date yymmdd10.;
datalines;
A 1 2022-04-15 0.00
A 2 2022-04-16 0.00
A 3 2022-04-17 0.00
A 4 2022-04-18 0.00
A 5 2022-04-19 -3.00
A 6 2022-04-20 0.00
A 7 2022-04-21 0.00
A 8 2022-04-22 0.00
A 9 2022-04-23 0.00
A 10 2022-04-24 0.00
A 11 2022-04-25 0.00
A 12 2022-04-26 0.00
A 13 2022-04-27 0.00
A 14 2022-04-28 0.00
A 15 2022-04-29 0.00
A 16 2022-04-30 0.00
A 17 2022-05-01 0.00
A 18 2022-05-02 0.00
A 19 2022-05-03 0.00
A 20 2022-05-04 0.00
A 21 2022-05-05 0.00
A 22 2022-05-06 0.00
A 23 2022-05-07 0.00
A 24 2022-05-08 0.00
A 25 2022-05-09 0.00
A 26 2022-05-10 0.00
A 27 2022-05-11 0.00
A 28 2022-05-12 0.00
A 29 2022-05-13 0.00
A 30 2022-05-14 0.00
A 31 2022-05-15 0.00
A 32 2022-05-16 0.00
A 33 2022-05-17 0.00
A 34 2022-05-18 0.00
A 35 2022-05-19 0.00
A 36 2022-05-20 0.00
A 37 2022-05-21 0.00
A 38 2022-05-22 0.00
A 39 2022-05-23 0.00
A 40 2022-05-24 4.00
A 41 2022-05-25 0.00
A 42 2022-05-26 0.00
A 43 2022-05-27 0.00
A 44 2022-05-28 0.00
A 45 2022-05-29 0.00
A 46 2022-05-30 0.00
A 47 2022-05-31 0.00
A 48 2022-06-01 0.00
A 49 2022-06-02 0.00
A 50 2022-06-03 0.00
A 51 2022-06-04 0.00
A 52 2022-06-05 0.00
A 53 2022-06-06 0.00
A 54 2022-06-07 0.00
A 55 2022-06-08 0.00
A 56 2022-06-09 0.00
A 57 2022-06-10 0.00
A 58 2022-06-11 0.00
A 59 2022-06-12 0.00
A 60 2022-06-13 1.00
A 61 2022-06-14 0.00
A 62 2022-06-15 0.00
A 63 2022-06-16 0.00
A 64 2022-06-17 0.00
A 65 2022-06-18 0.00
A 66 2022-06-19 0.00
A 67 2022-06-20 0.00
A 68 2022-06-21 0.00
A 69 2022-06-22 0.00
A 70 2022-06-23 0.00
A 71 2022-06-24 0.00
A 72 2022-06-25 0.00
A 73 2022-06-26 0.00
A 74 2022-06-27 0.00
A 75 2022-06-28 0.00
A 76 2022-06-29 0.00
A 77 2022-06-30 0.00
A 78 2022-07-01 0.00
A 79 2022-07-02 0.00
A 80 2022-07-03 0.00
A 81 2022-07-04 0.00
A 82 2022-07-05 0.00
A 83 2022-07-06 0.00
A 84 2022-07-07 0.00
A 85 2022-07-08 0.00
A 86 2022-07-09 0.00
A 87 2022-07-10 0.00
A 88 2022-07-11 0.00
A 89 2022-07-12 0.00
A 90 2022-07-13 0.00
B 1 2022-04-15 0.00
B 2 2022-04-16 0.00
B 3 2022-04-17 0.00
B 4 2022-04-18 0.00
B 5 2022-04-19 -0.20
B 6 2022-04-20 0.00
B 7 2022-04-21 0.00
B 8 2022-04-22 0.00
B 9 2022-04-23 0.00
B 10 2022-04-24 0.00
B 11 2022-04-25 0.00
B 12 2022-04-26 0.00
B 13 2022-04-27 0.00
B 14 2022-04-28 0.00
B 15 2022-04-29 0.00
B 16 2022-04-30 0.00
B 17 2022-05-01 0.00
B 18 2022-05-02 0.00
B 19 2022-05-03 0.00
B 20 2022-05-04 0.00
B 21 2022-05-05 0.00
B 22 2022-05-06 0.00
B 23 2022-05-07 0.00
B 24 2022-05-08 0.00
B 25 2022-05-09 0.00
B 26 2022-05-10 0.00
B 27 2022-05-11 0.00
B 28 2022-05-12 0.00
B 29 2022-05-13 0.00
B 30 2022-05-14 0.00
B 31 2022-05-15 0.00
B 32 2022-05-16 0.00
B 33 2022-05-17 0.00
B 34 2022-05-18 0.00
B 35 2022-05-19 0.00
B 36 2022-05-20 0.00
B 37 2022-05-21 0.00
B 38 2022-05-22 0.00
B 39 2022-05-23 0.00
B 40 2022-05-24 0.30
B 41 2022-05-25 0.00
B 42 2022-05-26 0.00
B 43 2022-05-27 0.00
B 44 2022-05-28 0.00
B 45 2022-05-29 0.00
B 46 2022-05-30 0.00
B 47 2022-05-31 0.00
B 48 2022-06-01 0.00
B 49 2022-06-02 0.00
B 50 2022-06-03 0.00
B 51 2022-06-04 0.00
B 52 2022-06-05 0.00
B 53 2022-06-06 0.00
B 54 2022-06-07 0.00
B 55 2022-06-08 0.00
B 56 2022-06-09 0.00
B 57 2022-06-10 0.00
B 58 2022-06-11 0.00
B 59 2022-06-12 0.00
B 60 2022-06-13 0.20
B 61 2022-06-14 0.00
B 62 2022-06-15 0.00
B 63 2022-06-16 0.00
B 64 2022-06-17 0.00
B 65 2022-06-18 0.00
B 66 2022-06-19 0.00
B 67 2022-06-20 0.00
B 68 2022-06-21 0.00
B 69 2022-06-22 0.00
B 70 2022-06-23 0.00
B 71 2022-06-24 0.00
B 72 2022-06-25 0.00
B 73 2022-06-26 0.00
B 74 2022-06-27 0.00
B 75 2022-06-28 0.00
B 76 2022-06-29 0.00
B 77 2022-06-30 0.00
B 78 2022-07-01 0.00
B 79 2022-07-02 0.00
B 80 2022-07-03 0.00
B 81 2022-07-04 0.00
B 82 2022-07-05 0.00
B 83 2022-07-06 0.00
B 84 2022-07-07 0.00
B 85 2022-07-08 0.00
B 86 2022-07-09 0.00
B 87 2022-07-10 0.00
B 88 2022-07-11 0.00
B 89 2022-07-12 0.00
B 90 2022-07-13 0.00
;
proc sort; by id;
run;
data a2; input id $ total_interest; /*this is investor id and the total interest they earned over the 90 days*/
datalines;
A 0.29734
B 0.01488
;
proc sort; by id;
data x3; input id $ unobserved_initial_balance; *this is the unobserved initial (beginning of day 1) balance in the investor's account and what I want to infer;
datalines;
A 29
B 1
;
proc sort; by id;
run;
data a4; merge a1 a2 x3; by id;
proc sort; by id;
proc datasets; delete running_tot;
/*
this macro estimates the interest earned each day over the 90 days
bod=beginning of day
eod=end of day
*/
%macro days;
%do v=1 %to 90; *1 to 90;
%let lag_v = %eval(&v. - 1);
%if &v.=1 %then %do; *this computes first day interest;
data d&v.; set a4;
if count=&v.;
bod_balance=unobserved_initial_balance;
if bod_balance<=30 then eod_balance=bod_balance*(1.0535)**(1/365);
if bod_balance gt 30 then eod_balance=30*(1.0535)**(1/365)+(bod_balance-30)*(1.0352)**(1/365);
%end;
%if &v. gt 1 %then %do;
data da_&v.; set a4; if count=&v.; *this gathers information for day t where t>1;
data lag_d&v.; set d&lag_v.; *this gets the eod balance from the previous day which is adjusted for deposits and withdrawals (I assume they occur at the end of the previous day) to compute bod balance for day t;
bod_balance=eod_balance+deposit;
keep bod_balance id;
data d&v.; merge da_&v. lag_d&v.; by id;
%if &v. le 15 %then %do; *interest parameters first 15 days - depends on whether balance the previous day is more or less than 30;
if bod_balance<=30 then EOD_balance=bod_balance*(1.0535)**(1/365);
if bod_balance gt 30 then EOD_balance=30*(1.0535)**(1/365)+(bod_balance-30)*(1.0352)**(1/365);
%end;
%if &v. gt 15 %then %do; *interest parameters last 75 days - depends on whether the balance the previous day is less than 2, between 2-25, or greater than 25;
if bod_balance<=2 then EOD_balance=bod_balance*(1.06)**(1/365);
if bod_balance gt 2 and bod_balance le 25 then EOD_balance=2*(1.06)**(1/365)+(bod_balance-2)*(1.04)**(1/365);
if bod_balance gt 25 then EOD_balance=2*(1.06)**(1/365)+23*(1.04)**(1/365)+(bod_balance-25)*(1.0352)**(1/365);
%end;
%end;
proc append base=running_tot data=d&v.;
%end;
%mend days;
%days;
data look; set running_tot;
inferred_daily_interest=eod_balance-bod_balance;
proc sort; by id date;
proc means noprint; by id; var inferred_daily_interest; *this is the inferred total interest earned over the 90 days;
output out=outlook sum=sum_inferred_interest;
data look2; set outlook;
proc print; run;
... View more