Hello everyone,
I'm a beginner in SAS and I have a question.
After running my 2 regressions, I would like to know whether the difference between my 2 intercepts is statistically significant or not.
Could you please help me write the appropriate code?
All of my variables are numeric.
proc reg data=modelization outest=est; /*1*/
   M1: model rp=rm / selection=rsquare b best=1;
   MFS1: model rp=rm rm_zdy rm_ztbl rm_ztms rm_zdfy / selection=rsquare b best=1;
quit;
proc print data=est;
run;
If I understand your question correctly, you want to compare the coefficients of different models. A method for doing this is described here:
I don't think that link covers the same situation as here, where two models are computed from the same data.
I don't think there is a canned test to compare intercepts from two models on the same data, other than creating your own -- Intercept1 and Intercept2 are estimates that have a normal distribution (under certain assumptions), but they are NOT independent, since they are based on the same data.
So my thought is that the only real way to compare these is via simulation, in the form of a randomization test or a permutation test, and even then that does not account for the fact that the intercepts should differ between models. Why? Since the second model has additional terms, I would expect the intercepts NOT to be the same, unless the added terms are orthogonal to the other terms in the model (as in a designed experiment), in which case the intercept should not change.
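A minimal sketch of the bootstrap flavor of this idea, assuming your modelization data set and the M1/MFS1 pair from your code (the seed and the 1000 replicates are arbitrary, and I dropped the SELECTION= option so the coefficients come from the full models):
/* draw 1000 bootstrap resamples of the original data */
proc surveyselect data=modelization out=boot seed=12345
                  method=urs samprate=1 outhits reps=1000;
run;
/* refit both models in every resample; OUTEST= keeps one row per model and replicate */
proc reg data=boot outest=est_boot noprint;
   by replicate;
   M1: model rp=rm;
   MFS1: model rp=rm rm_zdy rm_ztbl rm_ztms rm_zdfy;
run;
quit;
/* put the two intercepts side by side, one row per replicate */
proc transpose data=est_boot out=wide prefix=int_;
   by replicate;
   id _model_;
   var intercept;
run;
data wide;
   set wide;
   delta = int_m1 - int_mfs1; /* one bootstrap draw of the intercept difference */
run;
/* percentile bootstrap interval for the difference; if it excludes 0,
   the difference is significant at about the 5% level */
proc univariate data=wide noprint;
   var delta;
   output out=ci pctlpts=2.5 97.5 pctlpre=p;
run;
proc print data=ci;
run;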
Thanks Paige for your answer.
I tried to summarize my thoughts, but here is exactly what I would like to do, in more detail.
I have 10 models to estimate in order to decide which one correctly predicts returns (rp). Consequently, I decided to make 12 main comparisons (2 by 2) of my regression models, each time comparing only my 2 intercepts (not the other beta coefficients). The trouble is that I do not want to do this with only one returns series but ultimately with around 2,000 returns series. For now, I would like to keep it as simple as I can: if I can do this on 1 series, I can do it on 2,000.
So ultimately, for each model, I would first like to compute the intercepts' mean, percentiles, distribution, etc. Secondly, to know which model correctly predicts returns, I would like to compare all 10 of my models 2 by 2, testing each time whether the 2 intercepts are statistically the same or not.
Please find below my 12 comparisons:
proc reg data=modelization outest=est; /*1*/
   M1: model rp=rm / selection=rsquare b best=1;
   MFS1: model rp=rm rm_zdy rm_ztbl rm_ztms rm_zdfy / selection=rsquare b best=1;
quit;
proc print data=est;
run;
proc reg data=modelization outest=est; /*2*/
   M2: model rp=rm rmrsq / selection=rsquare b best=1;
   MFS2: model rp=rm rm_zdy rm_ztbl rm_ztms rm_zdfy rmrsq / selection=rsquare b best=1;
quit;
proc print data=est;
run;
proc reg data=modelization outest=est; /*3*/
   M1: model rp=rm / selection=rsquare b best=1;
   MPM10: model rp=rm rm_pred_mean / selection=rsquare b best=1;
quit;
proc print data=est;
run;
proc reg data=modelization outest=est; /*4*/
   M2: model rp=rm rmrsq / selection=rsquare b best=1;
   MPM11: model rp=rm rm_pred_mean rmrsq / selection=rsquare b best=1;
quit;
proc print data=est;
run;
proc reg data=modelization outest=est; /*5*/
   M1: model rp=rm / selection=rsquare b best=1;
   MCFG1: model rp=rm zdy ztbl ztms zdfy rm_zdy rm_ztbl rm_ztms rm_zdfy / selection=rsquare b best=1;
quit;
proc print data=est;
run;
proc reg data=modelization outest=est; /*6*/
   M2: model rp=rm rmrsq / selection=rsquare b best=1;
   MCFG2: model rp=rm zdy ztbl ztms zdfy rm_zdy rm_ztbl rm_ztms rm_zdfy rmrsq / selection=rsquare b best=1;
quit;
proc print data=est;
run;
proc reg data=modelization outest=est; /*7*/
   M1: model rp=rm / selection=rsquare b best=1;
   MPM20: model rp=pred_mean rm rm_pred_mean / selection=rsquare b best=1;
quit;
proc print data=est;
run;
proc reg data=modelization outest=est; /*8*/
   M2: model rp=rm rmrsq / selection=rsquare b best=1;
   MPM21: model rp=pred_mean rm rm_pred_mean rmrsq / selection=rsquare b best=1;
quit;
proc print data=est;
run;
proc reg data=modelization outest=est; /*9*/
   MFS1: model rp=rm rm_zdy rm_ztbl rm_ztms rm_zdfy / selection=rsquare b best=1;
   MPM10: model rp=rm rm_pred_mean / selection=rsquare b best=1;
quit;
proc print data=est;
run;
proc reg data=modelization outest=est; /*10*/
   MFS2: model rp=rm rm_zdy rm_ztbl rm_ztms rm_zdfy rmrsq / selection=rsquare b best=1;
   MPM11: model rp=rm rm_pred_mean rmrsq / selection=rsquare b best=1;
quit;
proc print data=est;
run;
proc reg data=modelization outest=est; /*11*/
   MCFG1: model rp=rm zdy ztbl ztms zdfy rm_zdy rm_ztbl rm_ztms rm_zdfy / selection=rsquare b best=1;
   MPM20: model rp=pred_mean rm rm_pred_mean / selection=rsquare b best=1;
quit;
proc print data=est;
run;
proc reg data=modelization outest=est; /*12*/
   MCFG2: model rp=rm zdy ztbl ztms zdfy rm_zdy rm_ztbl rm_ztms rm_zdfy rmrsq / selection=rsquare b best=1;
   MPM21: model rp=pred_mean rm rm_pred_mean rmrsq / selection=rsquare b best=1;
quit;
proc print data=est;
run;
I hope you can help me write the code for these goals.
Thank you,
It's really not clear to me why you are performing the modeling this way, and why you are focused on the intercepts, as the intercept by itself doesn't indicate whether the model fits better or worse than other models. So this part is a mystery to me. But as I said, I don't believe there is a test of just the intercepts of two models on the same data (although you could create one using randomization or jackknife/bootstrap methods).
There is a lot of literature on selecting the best model using variable selection methods (you are using SELECTION=RSQUARE, but there are other methods as well). There is also a lot of criticism of these methods: they often don't work well in practice, and they can produce regression models with extremely high variances on the parameters, so the regression coefficients may be way off or even have the wrong sign. Then there is also partial least squares regression (PROC PLS in SAS), which has advantages and disadvantages compared to PROC REG; I tend to prefer it, but it is an entirely different way of thinking.
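Just to show the shape of it, a minimal PROC PLS call might look like the following (I reused the MFS1 regressors purely as an illustration; CV=ONE requests leave-one-out cross validation to choose the number of factors):
proc pls data=modelization method=pls cv=one;
   /* same response and regressors as MFS1, fit by partial least squares */
   model rp = rm rm_zdy rm_ztbl rm_ztms rm_zdfy;
run;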
This is a huge topic, and I would advise you to talk it over with an experienced data analyst/statistician at your company or university.
Or if you want to discuss it here, let's stick to discussing the idea and the goal, without imposing specific SAS code and specific hypothesis tests at this time.
Thanks Paige for your answer.
In finance, intercepts are called "alphas". "Alpha" is used in investing to describe a strategy's ability to beat the market, or its "edge". Alpha is thus also often referred to as "excess return" or "abnormal rate of return" (source: Investopedia).
I understand your point, but my research director asked me to use either PROC REG or PROC MODEL.
Thanks for your advice. I will find out what a simulation in the form of a randomization or permutation test consists of.
@pmorel3 wrote:
In finance, intercepts are called "alphas". "Alpha" is used in investing to describe a strategy's ability to beat the market, or its "edge". Alpha is thus also often referred to as "excess return" or "abnormal rate of return" (source: Investopedia).
I understand your point, but my research director asked me to use either PROC REG or PROC MODEL.
Thanks for your advice. I will find out what a simulation in the form of a randomization or permutation test consists of.
I see your point about interpreting "alpha" or the intercept in this manner. I suppose this depends on the proper model being specified and fitting well. If the model is not properly specified, or doesn't fit well, then the alpha (and its interpretation) is suspect. In addition, there is the multicollinearity issue, which can cause the regression coefficients (including the intercept) to vary wildly depending on which variables are used in the model. So again, I have trouble assigning (in my mind) this interpretation to the intercept.
However, as far as I know, there is no way in PROC REG to statistically compare the intercepts from the two different models on the same data. I can't speak about PROC MODEL as I have never used it.
I've finally decided to do this, as a solution:
proc reg data=modelization outest=est; /*12*/
   MCFG2: model rp=rm zdy ztbl ztms zdfy rm_zdy rm_ztbl rm_ztms rm_zdfy rmrsq / selection=rsquare b best=1;
   MPM21: model rp=pred_mean rm rm_pred_mean rmrsq / selection=rsquare b best=1;
quit;
proc print data=est;
run;
data modelization;
   set modelization;
   /* per-observation intercept estimates, built by hand from the MCFG2 coefficients above */
   intercept23 = rp + 0.27098*rm + 0.001889*zdy - 0.12634*ztbl - 0.23400*ztms
                 - 0.38475*zdfy + 0.27095*rm_zdy + 6.88259*rm_ztbl + 3.89297*rm_ztms
                 - 0.72923*rm_zdfy + 0.73268*rmrsq;
   /* same thing, from the MPM21 coefficients */
   intercept24 = rp + 0.10546*rm + 1.68536*pred_mean + 6.90634*rm_pred_mean + 0.26742*rmrsq;
   delta12 = intercept23 - intercept24;
run;
proc univariate data=modelization;
var delta12;
run;
The trouble here is that I had to type in by hand each coefficient computed before, which is far too slow for 2,000 rp return series!
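One way to avoid the hand-typing might be to pull each model's intercept from the OUTEST= data set and use that model's residuals, since rp minus a model's fitted slope terms equals its intercept plus its residual. A sketch under that assumption (the data set names out23, out24, and deltas are made up for the example):
/* refit the two models, keeping the coefficients and the residuals */
proc reg data=modelization outest=est noprint;
   MCFG2: model rp=rm zdy ztbl ztms zdfy rm_zdy rm_ztbl rm_ztms rm_zdfy rmrsq;
   output out=out23 r=r23;
   MPM21: model rp=pred_mean rm rm_pred_mean rmrsq;
   output out=out24 r=r24;
run;
quit;
/* pull the two fitted intercepts into macro variables */
proc sql noprint;
   select intercept into :b23 from est where _model_='MCFG2';
   select intercept into :b24 from est where _model_='MPM21';
quit;
/* intercept + residual reproduces the hand-built intercept23/intercept24 */
data deltas;
   merge out23(keep=r23) out24(keep=r24);
   intercept23 = &b23 + r23;
   intercept24 = &b24 + r24;
   delta12 = intercept23 - intercept24;
run;
proc univariate data=deltas;
   var delta12;
run;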
Hello Norman,
thanks for your answer. I've already seen this method, but it describes only the comparison of coefficients within a single model.
My goal is to compare only the 2 intercepts (not the other beta coefficients) between 2 regression models.
@pmorel3 wrote:
Hello everyone,
I'm a beginner in SAS and I have a question.
After running my 2 regressions, I would like to know whether the difference between my 2 intercepts is statistically significant or not.
Could you please help me write the appropriate code?
All of my variables are numeric.
proc reg data=modelization outest=est; /*1*/
   M1: model rp=rm / selection=rsquare b best=1;
   MFS1: model rp=rm rm_zdy rm_ztbl rm_ztms rm_zdfy / selection=rsquare b best=1;
quit;
proc print data=est;
run;
For nested models, you can compare model fit by calculating the difference in the error sums of squares, standardizing that difference appropriately, and using an F test. This will tell you whether the larger model is, statistically speaking, explaining significantly more of the variation in the outcome variable. See this paper:
https://www.stat.ncsu.edu/people/bloomfield/courses/st370/Slides/MandR-ch12-sec02-06.pdf
If you follow the paper, I think you should consider whether you can use a one-factor model to do the comparison of the intercept estimates.
I haven't explored how this would work, but it may give you a starting point.
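In PROC REG, that nested-model F test can be reproduced with a TEST statement on the larger model, since testing that the added coefficients are jointly zero is the same extra-sum-of-squares F test. A sketch using the M1 vs. MFS1 pair purely as an illustration:
proc reg data=modelization;
   MFS1: model rp=rm rm_zdy rm_ztbl rm_ztms rm_zdfy;
   /* jointly test the terms that MFS1 adds on top of M1 */
   M1_vs_MFS1: test rm_zdy=0, rm_ztbl=0, rm_ztms=0, rm_zdfy=0;
run;
quit;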
Just thinking of something: if all your model predictors are binary, then the intercept estimate in your models will simply be the average of your outcome variable (Y) among the records where all the binary predictors have a 0 value. At the end of the day, when comparing the intercepts between two nested models, you need to compare the means between two probably overlapping groups. You, or others here, might think of a SAS procedure that can do this; I was unable to think of one that constructs separate means for overlapping groups and tests their difference.
Best of luck.