I'm interested in applying the transreg procedure in order to regress the APR on interest rate and terms.
The apr can be calculated like this:
As the APR can easily be calculated by the formula
I generate a training data set via IML.
PROC IML;
/*Holds 100x3 randomly generated values*/
RANDOM=J(100,3,0);
CALL RANDGEN(RANDOM, "UNIFORM");
RANDOM=RANDOM-0.5+1 ;
/*Center random matrix around number of terms, yearly interest rate, and principal*/
CENTER={240 0.036 100000};
RANDOM2=RANDOM#CENTER;
RANDOM2[,1]=CEIL(RANDOM2[,1]); /*allow only integers for number of terms*/
/*create matrix that holds i12 and (1+i12)*/
RANDOM3=RANDOM2[,2] #(1/12) || RANDOM2[,2]#(1/12) + 1;
/*create matrix that stores the monthly payment*/
RANDOM4=J(100,1,0);
RANDOM10=RANDOM4;
/*alt1: calculate monthly payment with do loop*/
DO CT2=1 TO NROW(RANDOM);
RANDOM4[CT2]=RANDOM2[CT2,3] *
((RANDOM3[CT2,2]##RANDOM2[CT2,1]-1)/(RANDOM3[CT2,1]*RANDOM3[CT2,2]##RANDOM2[CT2,1]))##-1;
END;
/*alt2: calculate monthly payment via vecorized approach*/
RANDOM10=RANDOM2[,3] # ((RANDOM3[,2]##RANDOM2[,1]-1) #
(RANDOM3[,1]#RANDOM3[,2]##RANDOM2[,1])##-1)##-1;
/*control both alt give same result*/
/*PRINT RANDOM10 RANDOM4;*/
/*convert monthly payment to percentage of principal*/
CPERC=RANDOM4 # (RANDOM2[,3]##-1);
TITEL={"N" "I" "PRINCIPAL" "Q12" "I12" "PAYMENT" "PAYMENT_PERC" };
AMORT= RANDOM2 || RANDOM3 || RANDOM4 || CPERC ;
/*PRINT AMORT [COLNAME=TITEL];*/
create TRAINING from AMORT[colname=TITEL];
append from AMORT;
close TRAINING;
quit;
How can I use proc transreg to approximate the APR via a regression formula?
Thank you, I would like to understand it.
Without success I tried to make a contourplot having on the box axis i and n and as response the payment in percentage.
What have you tried? I assume the explanatory variables are N, I, and Principal. Which variable is the response? What is the purpose of the other variables like Q12 and I12?
Two suggestion: (1) Look into PROC NLIN, and (2) The data might be easier to fit if you take the LOG of both sides of the APR equation.
Thank you @Rick_SAS for your fast response.
I want to model the response payment in percentage by the explanatory variables N :=number of total repayment terms in month and I12:=monthly interest rate and Q12:=(I12 +1). That way I'm using all the variables that appear in the mathematical formula for calculating the periodic payment expressed as a percentage of the principal or in absolute terms when multiplied by the principal.
As the formula is a quotient and its denominator and its numerator both have the term "Q12 to the power of N" I cannot even find a starting point.
I'll look for Proc NLIN. Hopefully, this is still helpful after clarifying this.
Arne
OK, log to both sides of the above stated formula results in:
P=1;
q=(1+i)
A = (i*q^n)/(q^n – 1)
Ln(A)=(ln(i) + n*ln(q)) - (ln(q^n – 1)
I suppose the log of sums cannot be simplified.
If I hadn’t this term I would see clearly that a polynomial solution of the original variables and their logs could perfectly fit the data.
I keep going despite of this drawback.
I create the log_variables of interest:
DATA TEMP;
SET Training;
LN_Q=LOG(Q);
LN_CUOTAPERC=LOG(CUOTAPERC);
LN_IP=LOG(IP);
LN_N=LOG(N);
RUN;
I create a new data set with the interaction terms.
proc logistic data=TEMP outdesign=EFF outdesignonly ;
effect poly2 = polynomial(N LN_IP LN_Q / degree=3);
model LN_CUOTAPERC = poly2 CUOTAPERC IP Q CUOTA CAP;
ID CUOTAPERC;
run;
I create a macro variable that holds all the variables.
DATA _NULL_;
SET SASHELP.VCOLUMN;
WHERE memname="EFF" AND libname="WORK";
FORMAT ST $1000.;
RETAIN ST;
ST=CATX(" ", ST, "'"||STRIP(NAME)||"'N");
CALL SYMPUTX("VARS",ST);
RUN;
PROC glmselect DATA=EFF ;
MODEL LN_CUOTAPERC = 'N'N 'LN_IP'N 'LN_Q'N 'N^2'N 'N*LN_IP'N 'N*LN_Q'N 'LN_IP^2'N 'LN_IP*LN_Q'N 'LN_Q^2'N 'N^3'N
'N^2*LN_IP'N 'N^2*LN_Q'N 'N*LN_IP^2'N 'N*LN_IP*LN_Q'N 'N*LN_Q^2'N
'LN_IP^3'N 'LN_IP^2*LN_Q'N 'LN_IP*LN_Q^2'N 'LN_Q^3'N / selection=stepwise ;
output out=TEST PREDICTED=P;
RUN;
DATA TEST1;
SET TEST;
COEF_REG=EXP(P);
COEF_CTR=EXP(LN_CUOTAPERC);
CUOTA_ORIG=CAP*COEF_CTR;
CUOTA_REG=CAP*COEF_REG;
KEEP CUOTA CUOTA_: COEF_:;
RUN;
The fit looks quite good.
But I'm sure that there are many more accurate ways.
The Proc NLIN in my understanding doesn't help me out because I'd have to make first guesses about the parameters. And as I'd mentioned before, I'd like to simulate that I don't know the underlying formula.
@Ksharp @Rick_SAS Do you understand what I´m trying to achieve?
According to the overview excerpt from the NLIN manual:
The NLIN procedure produces least squares or weighted least squares estimates of
the parameters of a nonlinear model. Nonlinear models are more difficult to specify
and estimate than linear models. Instead of simply listing regressor variables, you
must write the regression expression, declare parameter names, and supply initial
parameter values. Some models are difficult to fit, and there is no guarantee that the
procedure can fit the model successfully.
For each nonlinear model to be analyzed, you must specify the model (using a single
dependent variable) and the names and starting values of the parameters to be
estimated.
...I need to specify a model.
But what if I didn't know that the response variable was generated by the Annual Percentage Rate (APR)?
My best guess is to generate many interaction terms of the independent variables (including their inverse) and run a regression proc that allows for model selection.
Why do I do this? I have many formulas in excel sheets that calculate monthly payments and return rates like ROE and all at some extent discount future flows to get a comparable current value.
I don´t want to transfer all these functions to SAS IML and instead I want to have a regression formula that predicts the response variable, here the monthly payment, on the scenario of many combinations of the input variables.
Thanks in advance
Sorry, but I still don't understand your problem, so someone else will have to help you. I don't understand what parameter(s) you trying to fit. Regression estimates PARAMETERS that relate the independent and dependent variables.
I question whether you want to use Q12=I12+1 as one of your explanatory variables. That will cause collinearity in your regressors.
Good luck.
With your suggestions you have taken me further on my learning curve.
But I get a Llttle bit upset that I cannot succeed in delivering my point and I feel misundertood.
For sure I know that "Regression estimates PARAMETERS that relate the independent and dependent variables."
That's the reason why I want to predict the payment rate by the indpendent variables interest rate and re-payment terms.
And I want to get a regression formula because I assume for the sake of the learning exercise that I don't know the APR formula to calculate it.
The next time I throw in interest rate and numer of re-payment terms I want to predict the payment rate.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.