BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
rj_missionbeach
Fluorite | Level 6

 

Hi all –

 

I’m working with data where I know investors' deposits/withdrawals and the total interest they earned over a 90 day period. I want to infer their initial balance - but it's complicated as the interest rate paid depends on the balance in the account the previous day and varies over time.

 

Specifically –

 

What I know:

  1. The dates and amounts of all deposits and withdrawals (data a1)
  2. The total interest earned by depositors over a 90 day period (data a2)
  3. The interest rate paid

However, the interest rate paid depends on both date and, importantly, the balance in the account

3a. As shown in my code, in days 1-15, the first 30 of deposits earns 5.35% while deposits over 30 earn 3.52%

3b. As shown in my code, in days 16-90, the first 2 of deposits earns 6%, then next 23 earn 4% and anything over 25 earns 3.52%

 

What I want:

Given the above, I want to infer the "unobserved_initial_balance". Given the structure of interest rates (as a function of the balance each day), I believe the only way to solve is iteratively. I can solve it iteratively with excel solver, but my dataset is large so that's not an option.

 

I wrote  code to demonstrate the problem. My code assumes one knows the initial balance in the account (the variable “unobserved_initial_balance”) so that I can demonstrate how to compute the interest earned each day (as a function of the previous day’s balance and time-varying interest rates). Specifically, my macro computes interest each day as the difference between beginning of day balance "bod_balance" (the balance from the previous end of day + deposit/withdrawals the previous day) and end of day balance ("eod_balance") where the end of day balance is the beginning of day balance plus interest earned that day. Total interest earned is the sum of the daily values (i.e., sum_inferred_interest). Thus, I’d want to minimize the difference between sum_inferrred_interest and total_interest.

 

A few other thoughts:

  1. I resorted to the macro loop after trying to write the code in a single data step using retain, because I couldn't figure out how to allow for the conditional values (e.g., if bod_balance gt 30 then...). Not sure if the macro code would work well with proc optmodel.
  2. Although my code runs this for both individuals simultaneously, I'd be fine just solving for each individual alone and looping over individual.

I am unfamiliar with proc optmodel and this seems like a relatively complicated case, so any help would be greatly appreciated.

 

Thanks,

Rick

 

options nocenter;
proc datasets kill;

data a1; input id $ count date :yymmdd10. deposit; /*this is investor id, a count variable (that goes from 1-90 for each investor), and dates of investors' deposits (positive value) or withdrawals (negative values*/
format date yymmdd10.;
datalines;
A	1	2022-04-15	0.00
A	2	2022-04-16	0.00
A	3	2022-04-17	0.00
A	4	2022-04-18	0.00
A	5	2022-04-19	-3.00
A	6	2022-04-20	0.00
A	7	2022-04-21	0.00
A	8	2022-04-22	0.00
A	9	2022-04-23	0.00
A	10	2022-04-24	0.00
A	11	2022-04-25	0.00
A	12	2022-04-26	0.00
A	13	2022-04-27	0.00
A	14	2022-04-28	0.00
A	15	2022-04-29	0.00
A	16	2022-04-30	0.00
A	17	2022-05-01	0.00
A	18	2022-05-02	0.00
A	19	2022-05-03	0.00
A	20	2022-05-04	0.00
A	21	2022-05-05	0.00
A	22	2022-05-06	0.00
A	23	2022-05-07	0.00
A	24	2022-05-08	0.00
A	25	2022-05-09	0.00
A	26	2022-05-10	0.00
A	27	2022-05-11	0.00
A	28	2022-05-12	0.00
A	29	2022-05-13	0.00
A	30	2022-05-14	0.00
A	31	2022-05-15	0.00
A	32	2022-05-16	0.00
A	33	2022-05-17	0.00
A	34	2022-05-18	0.00
A	35	2022-05-19	0.00
A	36	2022-05-20	0.00
A	37	2022-05-21	0.00
A	38	2022-05-22	0.00
A	39	2022-05-23	0.00
A	40	2022-05-24	4.00
A	41	2022-05-25	0.00
A	42	2022-05-26	0.00
A	43	2022-05-27	0.00
A	44	2022-05-28	0.00
A	45	2022-05-29	0.00
A	46	2022-05-30	0.00
A	47	2022-05-31	0.00
A	48	2022-06-01	0.00
A	49	2022-06-02	0.00
A	50	2022-06-03	0.00
A	51	2022-06-04	0.00
A	52	2022-06-05	0.00
A	53	2022-06-06	0.00
A	54	2022-06-07	0.00
A	55	2022-06-08	0.00
A	56	2022-06-09	0.00
A	57	2022-06-10	0.00
A	58	2022-06-11	0.00
A	59	2022-06-12	0.00
A	60	2022-06-13	1.00
A	61	2022-06-14	0.00
A	62	2022-06-15	0.00
A	63	2022-06-16	0.00
A	64	2022-06-17	0.00
A	65	2022-06-18	0.00
A	66	2022-06-19	0.00
A	67	2022-06-20	0.00
A	68	2022-06-21	0.00
A	69	2022-06-22	0.00
A	70	2022-06-23	0.00
A	71	2022-06-24	0.00
A	72	2022-06-25	0.00
A	73	2022-06-26	0.00
A	74	2022-06-27	0.00
A	75	2022-06-28	0.00
A	76	2022-06-29	0.00
A	77	2022-06-30	0.00
A	78	2022-07-01	0.00
A	79	2022-07-02	0.00
A	80	2022-07-03	0.00
A	81	2022-07-04	0.00
A	82	2022-07-05	0.00
A	83	2022-07-06	0.00
A	84	2022-07-07	0.00
A	85	2022-07-08	0.00
A	86	2022-07-09	0.00
A	87	2022-07-10	0.00
A	88	2022-07-11	0.00
A	89	2022-07-12	0.00
A	90	2022-07-13	0.00
B	1	2022-04-15	0.00
B	2	2022-04-16	0.00
B	3	2022-04-17	0.00
B	4	2022-04-18	0.00
B	5	2022-04-19	-0.20
B	6	2022-04-20	0.00
B	7	2022-04-21	0.00
B	8	2022-04-22	0.00
B	9	2022-04-23	0.00
B	10	2022-04-24	0.00
B	11	2022-04-25	0.00
B	12	2022-04-26	0.00
B	13	2022-04-27	0.00
B	14	2022-04-28	0.00
B	15	2022-04-29	0.00
B	16	2022-04-30	0.00
B	17	2022-05-01	0.00
B	18	2022-05-02	0.00
B	19	2022-05-03	0.00
B	20	2022-05-04	0.00
B	21	2022-05-05	0.00
B	22	2022-05-06	0.00
B	23	2022-05-07	0.00
B	24	2022-05-08	0.00
B	25	2022-05-09	0.00
B	26	2022-05-10	0.00
B	27	2022-05-11	0.00
B	28	2022-05-12	0.00
B	29	2022-05-13	0.00
B	30	2022-05-14	0.00
B	31	2022-05-15	0.00
B	32	2022-05-16	0.00
B	33	2022-05-17	0.00
B	34	2022-05-18	0.00
B	35	2022-05-19	0.00
B	36	2022-05-20	0.00
B	37	2022-05-21	0.00
B	38	2022-05-22	0.00
B	39	2022-05-23	0.00
B	40	2022-05-24	0.30
B	41	2022-05-25	0.00
B	42	2022-05-26	0.00
B	43	2022-05-27	0.00
B	44	2022-05-28	0.00
B	45	2022-05-29	0.00
B	46	2022-05-30	0.00
B	47	2022-05-31	0.00
B	48	2022-06-01	0.00
B	49	2022-06-02	0.00
B	50	2022-06-03	0.00
B	51	2022-06-04	0.00
B	52	2022-06-05	0.00
B	53	2022-06-06	0.00
B	54	2022-06-07	0.00
B	55	2022-06-08	0.00
B	56	2022-06-09	0.00
B	57	2022-06-10	0.00
B	58	2022-06-11	0.00
B	59	2022-06-12	0.00
B	60	2022-06-13	0.20
B	61	2022-06-14	0.00
B	62	2022-06-15	0.00
B	63	2022-06-16	0.00
B	64	2022-06-17	0.00
B	65	2022-06-18	0.00
B	66	2022-06-19	0.00
B	67	2022-06-20	0.00
B	68	2022-06-21	0.00
B	69	2022-06-22	0.00
B	70	2022-06-23	0.00
B	71	2022-06-24	0.00
B	72	2022-06-25	0.00
B	73	2022-06-26	0.00
B	74	2022-06-27	0.00
B	75	2022-06-28	0.00
B	76	2022-06-29	0.00
B	77	2022-06-30	0.00
B	78	2022-07-01	0.00
B	79	2022-07-02	0.00
B	80	2022-07-03	0.00
B	81	2022-07-04	0.00
B	82	2022-07-05	0.00
B	83	2022-07-06	0.00
B	84	2022-07-07	0.00
B	85	2022-07-08	0.00
B	86	2022-07-09	0.00
B	87	2022-07-10	0.00
B	88	2022-07-11	0.00
B	89	2022-07-12	0.00
B	90	2022-07-13	0.00
;
proc sort; by id;
run;

data a2; input id $ total_interest; /*this is investor id and the total interest they earned over the 90 days*/
datalines;
A	0.29734
B	0.01488
;
proc sort; by id;

data x3; input id $ unobserved_initial_balance; *this is the unobserved initial (beginning of day 1) balance in the investor's account and what I want to infer;
datalines;
A	29
B	1
;
proc sort; by id;
run;

data a4; merge a1 a2 x3; by id; 
proc sort; by id;

proc datasets; delete running_tot;

/*
this macro estimates the interest earned each day over the 90 days
bod=beginning of day
eod=end of day
*/

%macro days;
	%do v=1 %to 90; *1 to 90;
		%let lag_v = %eval(&v. - 1);

		%if &v.=1 %then %do; *this computes first day interest;
			data d&v.; set a4;
			if count=&v.;
			bod_balance=unobserved_initial_balance;
				if bod_balance<=30 then eod_balance=bod_balance*(1.0535)**(1/365);
				if bod_balance gt 30 then eod_balance=30*(1.0535)**(1/365)+(bod_balance-30)*(1.0352)**(1/365);
		%end;

		%if &v. gt 1 %then %do;

			data da_&v.; set a4; if count=&v.; *this gathers information for day t where t>1;
			
			data lag_d&v.; set d&lag_v.;       *this gets the eod balance from the previous day which is adjusted for deposits and withdrawals (I assume they occur at the end of the previous day) to compute bod balance for day t;
			bod_balance=eod_balance+deposit;
			keep bod_balance id;

			data d&v.; merge da_&v. lag_d&v.; by id;
				%if &v. le 15 %then %do;  *interest parameters first 15 days - depends on whether balance the previous day is more or less than 30;
					if bod_balance<=30 then EOD_balance=bod_balance*(1.0535)**(1/365);
					if bod_balance gt 30 then EOD_balance=30*(1.0535)**(1/365)+(bod_balance-30)*(1.0352)**(1/365);
				%end;

				%if &v. gt 15 %then %do; *interest parameters last 75 days - depends on whether the balance the previous day is less than 2, between 2-25, or greater than 25;
					if bod_balance<=2 then EOD_balance=bod_balance*(1.06)**(1/365);
					if bod_balance gt 2 and bod_balance le 25 then EOD_balance=2*(1.06)**(1/365)+(bod_balance-2)*(1.04)**(1/365);
					if bod_balance gt 25 then EOD_balance=2*(1.06)**(1/365)+23*(1.04)**(1/365)+(bod_balance-25)*(1.0352)**(1/365);
				%end;
		%end;

		proc append base=running_tot data=d&v.;
	%end;
%mend days;
%days;

data look; set running_tot;
inferred_daily_interest=eod_balance-bod_balance;
proc sort; by id date;
proc means noprint; by id; var inferred_daily_interest; *this is the inferred total interest earned over the 90 days;
output out=outlook sum=sum_inferred_interest;

data look2; set outlook; 
proc print; run;


1 ACCEPTED SOLUTION

Accepted Solutions
RobPratt
SAS Super FREQ

Here's one way to use PROC OPTMODEL to solve your problem.  Because the beginning-of-day balance is the same as the end-of-day balance for the previous day, I used a decision variable called EodBalance[t] to track both these quantities, with t = 0 corresponding to the initial balance.  Also, I used a WHERE= data set option in the READ DATA statements to read only one id at a time.  To solve a separate problem for each id you can instead use runOptmodel groupBy or an explicit loop as in your previous question.

 

proc optmodel;
   set DAYS;
   num deposit {DAYS};
   read data a1(where=(id='A')) into DAYS=[count] deposit;
   DAYS = {0} union DAYS;

   num total_interest;
   read data a2(where=(id='A')) into total_interest;

   var EodBalance {DAYS} >= 0;
   impvar Interest {t in DAYS diff {0}} = 
      if t <= 15 then (
         if EodBalance[t-1] <= 30 then EodBalance[t-1]*(1.0535**(1/365)-1)
         else 30*(1.0535**(1/365)-1) + (EodBalance[t-1]-30)*(1.0352**(1/365)-1)
      )
      else (
         if EodBalance[t-1] <= 2 then EodBalance[t-1]*(1.06**(1/365)-1)
         else if EodBalance[t-1] <= 25 then 2*(1.06**(1/365)-1) + (EodBalance[t-1]-2)*(1.04**(1/365)-1)
         else 2*(1.06**(1/365)-1) + 23*(1.04**(1/365)-1) + (EodBalance[t-1]-25)*(1.0352**(1/365)-1)
      )
   ;
   var TotalInterest >= 0;
   con TotalInterestCon:
      TotalInterest = sum {t in DAYS diff {0}} Interest[t];
   min SumSquaredError = (TotalInterest - total_interest)^2;
   con BalanceCon {t in DAYS diff {0}}:
      EodBalance[t] = EodBalance[t-1] + Interest[t] + deposit[t];

   solve with nlp / opttol=1e-10;
   print TotalInterest total_interest;
   create data OutData from [count] EodBalance Interest;
quit;

View solution in original post

2 REPLIES 2
RobPratt
SAS Super FREQ

Here's one way to use PROC OPTMODEL to solve your problem.  Because the beginning-of-day balance is the same as the end-of-day balance for the previous day, I used a decision variable called EodBalance[t] to track both these quantities, with t = 0 corresponding to the initial balance.  Also, I used a WHERE= data set option in the READ DATA statements to read only one id at a time.  To solve a separate problem for each id you can instead use runOptmodel groupBy or an explicit loop as in your previous question.

 

proc optmodel;
   set DAYS;
   num deposit {DAYS};
   read data a1(where=(id='A')) into DAYS=[count] deposit;
   DAYS = {0} union DAYS;

   num total_interest;
   read data a2(where=(id='A')) into total_interest;

   var EodBalance {DAYS} >= 0;
   impvar Interest {t in DAYS diff {0}} = 
      if t <= 15 then (
         if EodBalance[t-1] <= 30 then EodBalance[t-1]*(1.0535**(1/365)-1)
         else 30*(1.0535**(1/365)-1) + (EodBalance[t-1]-30)*(1.0352**(1/365)-1)
      )
      else (
         if EodBalance[t-1] <= 2 then EodBalance[t-1]*(1.06**(1/365)-1)
         else if EodBalance[t-1] <= 25 then 2*(1.06**(1/365)-1) + (EodBalance[t-1]-2)*(1.04**(1/365)-1)
         else 2*(1.06**(1/365)-1) + 23*(1.04**(1/365)-1) + (EodBalance[t-1]-25)*(1.0352**(1/365)-1)
      )
   ;
   var TotalInterest >= 0;
   con TotalInterestCon:
      TotalInterest = sum {t in DAYS diff {0}} Interest[t];
   min SumSquaredError = (TotalInterest - total_interest)^2;
   con BalanceCon {t in DAYS diff {0}}:
      EodBalance[t] = EodBalance[t-1] + Interest[t] + deposit[t];

   solve with nlp / opttol=1e-10;
   print TotalInterest total_interest;
   create data OutData from [count] EodBalance Interest;
quit;

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 2 replies
  • 734 views
  • 0 likes
  • 2 in conversation