I have 8631 unique mutual funds. Because they have different risk exposures, I run a regression per fund and output the parameter estimates, but in the end I want a single estimate and a single t-value for all funds.
For the coefficients of these 8631 funds, I take their average to serve as a single coefficient (I'm not sure this is right). For the t-values, it would be wrong to simply average the t-values of the 8631 funds. I need help finding a single coefficient and a single t-value for these 8631 funds, even though I am running the regression by fund. Thank you. Below is what I have.
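/* Suppress printed output while the per-fund regressions run */
ods listing close;
ods noresults;
/* Save the parameter estimates from every BY group */
ods output parameterestimates=prince.coefew1;
proc reg data=prince.Allfund;
   by CRSP_FUNDNO;
   model MRETRF=mktrf smb hml umd;
run;
quit;
/* Keep only the market-factor (mktrf) estimate for each fund */
data prince.betaestew;
   set prince.Coefew1;
   if variable = 'mktrf';
   varerr=stderr**2;
   rename estimate=betaestew;
   keep CRSP_FUNDNO Variable Estimate StdErr varerr tvalue;
run;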
That doesn't seem correct to me. Why not remove the BY statement and run that regression model?
proc reg data=prince.Allfund;
model MRETRF=mktrf smb hml umd;
run;
If you want to account for the different funds you could include that as a variable though it may not produce what you want.
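One way to include the fund as a variable, sketched under assumptions: with 8631 funds a CLASS variable would generate thousands of dummy columns, so this sketch uses the ABSORB statement of PROC GLM instead. That is a swapped-in technique, not something proposed in this thread, and the sorted dataset name is hypothetical. It gives every fund its own intercept but forces the factor slopes to be common across funds, which may or may not be what is wanted.

```sas
/* ABSORB requires the data to be sorted by the absorbed variable */
/* (work.allfund_sorted is a hypothetical dataset name)           */
proc sort data=prince.Allfund out=work.allfund_sorted;
   by CRSP_FUNDNO;
run;

/* Fund-specific intercepts are absorbed; slopes are shared by all funds */
proc glm data=work.allfund_sorted;
   absorb CRSP_FUNDNO;
   model MRETRF = mktrf smb hml umd / solution;
run;
quit;
```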
I cannot remove the BY statement because each fund has a different risk exposure, so the correct way is to run the regression by fund number.
Then you'll get estimates for each fund. If each fund has its own risk exposure, why do you want an overall estimate? The average of the estimates will not be the overall risk.
I want the overall estimate because I'm studying the funds as a whole, but the regression needs to be run by fund before arriving at the overall result. Thanks.
Using a BY statement is not the way to get an overall regression. I'm not sure why you think a BY statement is needed here. Please explain in more detail.
Thank you so much. With the BY statement I intend to run the regression for each fund and save the respective estimates in a new dataset. Each fund has a different risk exposure, hence the need for the BY statement. I heard I could use a "loop" to help run the regressions. But my major concern is this: after keeping the estimates in a separate dataset, I find the average of the parameter estimates to serve for the whole sample. Doing the same with the t-values, averaging them to obtain a single number for the whole sample, would I think be inappropriate. So how do I get a single t-value for the whole sample after running the regression by each fund? Thanks.
I would not recommend this.
The average of the slopes is not a way to get a good "overall" slope. Same thing applies to t-values.
There's no reason you can't do both -- run individual regressions with the BY statement to get estimates for each fund, and then run the regression without the BY statement to get the overall slope and t-values.
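A minimal sketch of doing both, assuming the dataset and variable names from the original post. The output dataset names (work.allfund_sorted, work.fund_coef, work.pooled_coef) are hypothetical, and the sort step is included because BY-group processing expects the data to be ordered by CRSP_FUNDNO.

```sas
/* Sort once so the BY statement works (hypothetical output name) */
proc sort data=prince.Allfund out=work.allfund_sorted;
   by CRSP_FUNDNO;
run;

/* 1) One regression per fund: estimates for every CRSP_FUNDNO */
ods output parameterestimates=work.fund_coef;
proc reg data=work.allfund_sorted;
   by CRSP_FUNDNO;
   model MRETRF = mktrf smb hml umd;
run;
quit;

/* 2) One pooled regression: a single slope and t-value per factor */
ods output parameterestimates=work.pooled_coef;
proc reg data=work.allfund_sorted;
   model MRETRF = mktrf smb hml umd;
run;
quit;
```

Here work.fund_coef holds one row per fund and model term, while work.pooled_coef holds one row per model term with its Estimate, StdErr and tValue.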
This is the result of running without the BY statement. The t-values look weird to me.
Weird? In what way? State what is weird about it.
Lots of people have used SAS PROC REG for decades, and I am not aware of any previous claims of incorrect t-values being computed by PROC REG.
The high t-value for the mktrf factor (which I presume is overall market return minus risk-free return, probably determined as S&P 500 return minus T-bill return) when you pool all the mutual funds simply says that the "average" mutual fund is undeniably associated with mktrf.
And the parameter value (.95....) says that the class of portfolios known as mutual funds tracks the market very nearly on a 1:1 basis. What is surprising about either of these numbers? It effectively states that the risk premium for mutual funds is related to the risk premium for the overall market. Presumably your sample of mutual funds is mostly invested in offerings in the self-same market.
@Princeelvisa wrote:
This is the result of running without the BY statement. The t-values look weird to me.
Did you standardize your variables before regression?
Also, one possibility: cluster your data with respect to the mutual funds so that you reduce the 8631 funds to, say, 10 or 20 clusters, and then use the cluster as a factor in your analysis. I'm also assuming there's some time component to this data, so you may need to be working with time series regression models. Otherwise, if you have only one data point for each mutual fund, you definitely cannot use the BY statement.
Your model would end up as:
proc glm data=stocks;
   class cluster;
   model dependent = cluster mktrf smb hml umd stkmv stkmvew;
run;
quit;