Re: Mutual Fund Regression

Princeelvisa · Posted 02-16-2018 09:35 AM

I have 8631 unique mutual funds. Because each fund has a different risk exposure, I run the regression per fund and output the parameter estimates, but in the end I want a single estimate and a single t value for all funds.

For the coefficients of these 8631 funds, I take the average to serve as a single coefficient (I'm not sure this is right). For the t values, it would be wrong to simply average the t values of the 8631 funds. I need help finding a single coefficient and t value for these 8631 funds, even though I am running the regression by fund. Thank you. Attached is what I have.

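/* Suppress printed output; only the ODS output data set is needed */
ods listing close;
ods noresults;
ods output parameterestimates=prince.coefew1;

/* One regression per fund; the data must be sorted by CRSP_FUNDNO */
proc reg data=prince.Allfund;
    by CRSP_FUNDNO;
    model MRETRF = mktrf smb hml umd;
run;
quit;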

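/* Keep only the market beta (mktrf) row from each fund's regression */
data prince.betaestew;
    set prince.Coefew1;
    if upcase(variable) = 'MKTRF';
    varerr = stderr**2;   /* squared standard error of the estimate */
    rename estimate = betaestew;
    keep CRSP_FUNDNO Variable Estimate StdErr varerr tvalue;
run;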
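
For reference, the cross-sectional summary the question describes (the mean of the per-fund betas, with a t value for that mean) could be sketched as below. It assumes prince.betaestew was created by the step above; the output data set work.beta_summary and its variable names are made up for illustration. The t value here tests whether the mean beta across funds is zero, treating the funds as independent. It is not the t value a pooled regression would report.

```sas
/* Sketch: summarize the per-fund market betas across funds.            */
/* Assumes prince.betaestew holds one row per fund (variable betaestew). */
proc means data=prince.betaestew noprint;
    var betaestew;
    output out=work.beta_summary      /* hypothetical output data set */
        n      = n_funds
        mean   = mean_beta
        stderr = se_mean_beta         /* std dev / sqrt(n)            */
        t      = t_mean_beta          /* tests H0: mean beta = 0      */
        probt  = p_mean_beta;
run;

proc print data=work.beta_summary noobs;
    var n_funds mean_beta se_mean_beta t_mean_beta p_mean_beta;
run;
```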

Reeza · Posted 02-16-2018 11:05 AM

That doesn't seem correct to me. Why not remove the BY statement and run that regression model?

proc reg data=prince.Allfund;
    model MRETRF = mktrf smb hml umd;
run;
quit;

If you want to account for the different funds you could include that as a variable though it may not produce what you want.
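
One way to include the fund as a variable, sketched here as an assumption rather than a recommendation, is to absorb the fund identifier in PROC GLM. This gives every fund its own intercept while estimating a single set of slopes; it does not let the slopes differ by fund. The data set work.allfund_sorted is a made-up name, and ABSORB requires the data to be sorted by the absorbed variable.

```sas
/* Sketch: pooled regression with fund-specific intercepts. */
proc sort data=prince.Allfund out=work.allfund_sorted;  /* hypothetical name */
    by CRSP_FUNDNO;
run;

proc glm data=work.allfund_sorted;
    absorb CRSP_FUNDNO;                       /* one intercept per fund */
    model MRETRF = mktrf smb hml umd / solution;
run;
quit;
```

ABSORB avoids creating 8631 dummy columns, which a CLASS statement on CRSP_FUNDNO would do.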

Princeelvisa · Posted 02-16-2018 11:10 AM

I cannot remove the BY statement because each fund has a different risk exposure, so the correct way is to run the regression by fund number.

Reeza · Posted 02-16-2018 11:13 AM

@Princeelvisa wrote:

I cannot remove the BY statement because each fund has a different risk exposure, so the correct way is to run the regression by fund number.

Then you'll get estimates for each fund. If each fund has its own risk exposure, why do you want an overall estimate? The average of the estimates will not be the overall risk.

Princeelvisa · Posted 02-16-2018 11:19 AM

I want the overall estimate because I'm studying the funds as a whole, but the regression needs to be run by fund before arriving at the overall estimate. Thanks.

PaigeMiller · Posted 02-16-2018 11:42 AM

@Princeelvisa wrote:

I want the overall estimate because I'm studying the funds as a whole, but the regression needs to be run by fund before arriving at the overall estimate. Thanks.

Using a BY statement is not the way to get an overall regression. I'm not sure why you think a BY statement is needed here. Please explain in more detail.

--
Paige Miller

Princeelvisa · Posted 02-17-2018 09:39 AM

Thanks so much. With the BY statement I intend to run the regression for each fund and save the respective estimates in a new dataset. The BY statement runs the regression for each individual fund because each fund has a different risk exposure, hence the need for it. I have also heard that I could use a "loop" to run the regressions. But my major concern is this: after keeping the estimates in a separate dataset, I find the average of the parameter estimates to serve for the whole, but I think averaging the t values in the same way to obtain a single number would be inappropriate. How do I get a single t value for the whole after running the regression by fund? Thanks.

PaigeMiller · Posted 02-17-2018 09:54 AM

@Princeelvisa wrote:

Thanks so much. With the BY statement I intend to run the regression for each fund and save the respective estimates in a new dataset. The BY statement runs the regression for each individual fund because each fund has a different risk exposure, hence the need for it. I have also heard that I could use a "loop" to run the regressions. But my major concern is this: after keeping the estimates in a separate dataset, I find the average of the parameter estimates to serve for the whole, but I think averaging the t values in the same way to obtain a single number would be inappropriate. How do I get a single t value for the whole after running the regression by fund? Thanks.

I would not recommend this.

Averaging the slopes is not a good way to get an "overall" slope. The same applies to the t-values.

There's no reason you can't do both -- run individual regressions with the BY statement to get estimates for each fund, and then run the regression without the BY statement to get the overall slope and t-values.
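
A minimal sketch of doing both, assuming the data set and variable names from the original post; the WORK data set names are made up for illustration:

```sas
/* Sort once so the BY-group regressions can run. */
proc sort data=prince.Allfund out=work.allfund_sorted;  /* hypothetical name */
    by CRSP_FUNDNO;
run;

/* 1) Per-fund regressions: one set of estimates for each fund. */
ods exclude all;                       /* do not print 8631 result tables */
ods output ParameterEstimates=work.coef_by_fund;
proc reg data=work.allfund_sorted;
    by CRSP_FUNDNO;
    model MRETRF = mktrf smb hml umd;
run;
quit;
ods exclude none;

/* 2) Pooled regression: one overall slope and t-value per factor. */
ods output ParameterEstimates=work.coef_pooled;
proc reg data=work.allfund_sorted;
    model MRETRF = mktrf smb hml umd;
run;
quit;
```

work.coef_by_fund then holds the fund-level estimates and work.coef_pooled the overall ones.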

--
Paige Miller

Princeelvisa · Posted 02-17-2018 10:06 AM

This is the result of running the regression without the BY statement. The t values look weird to me.

PaigeMiller · Posted 02-17-2018 10:16 AM

Weird? In what way? State what is weird about it.

Lots of people have used SAS PROC REG for decades, and I am not aware of any previous claims that PROC REG computes incorrect t-values.

--
Paige Miller

mkeintz · Posted 02-17-2018 03:14 PM

The high value of t for the mktrf factor (which I presume is overall market return minus risk-free return, probably determined as S&P 500 return minus T-bill return) when you pool all the mutual funds simply says that the return of the "average" mutual fund is undeniably associated with mktrf.

And the parameter value (.95....) says that the class of portfolios known as mutual funds tracks the market very nearly on a 1:1 basis. What is surprising about either of these numbers? It effectively states that the risk premium for mutual funds is related to the risk premium for the overall market. Presumably your sample of mutual funds is mostly invested in offerings in the self-same market.
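
If the question is whether the pooled market beta differs from exactly 1, a TEST statement in PROC REG can check that directly. This is a sketch using the data set and variables from the original post; the label MktBetaIsOne is made up. Note that ordinary pooled OLS standard errors ignore any correlation among observations within a fund or within a month, so the test is only as good as that assumption.

```sas
/* Sketch: pooled regression plus a test of H0: market beta = 1. */
proc reg data=prince.Allfund;
    model MRETRF = mktrf smb hml umd;
    MktBetaIsOne: test mktrf = 1;    /* hypothetical label for the test */
run;
quit;
```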


Reeza · Posted 02-17-2018 04:25 PM

@Princeelvisa wrote:

This is the result of running the regression without the BY statement. The t values look weird to me.

Did you standardize your variables before regression?

Also, one possibility: cluster the mutual funds to reduce the dimensionality, so that the 8631 funds collapse to, say, 10 or 20 clusters, and then use the cluster as a factor in your analysis. I'm also assuming there's some time component to this data, so you may need to work with time series regression models. Otherwise, if you have only one data point for each mutual fund, you definitely cannot use the BY statement.

Your model would end up as:

proc glm data=stocks;
    class cluster;
    model dependent = cluster mktrf smb hml umd stkmv stkmvew;
run;
quit;
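
The cluster variable in the model above has to come from somewhere. One possible way to build it, offered only as a sketch, is to cluster the funds on their estimated factor loadings from the per-fund regressions. It assumes prince.coefew1 exists as created in the original post (one row per fund and variable); the choice of 10 clusters, the use of PROC FASTCLUS, and the WORK data set names are all assumptions made for illustration.

```sas
/* One row per fund, one column per estimated coefficient. */
proc transpose data=prince.coefew1 out=work.fund_betas(drop=_name_);
    by CRSP_FUNDNO;
    id Variable;          /* creates columns Intercept, mktrf, smb, hml, umd */
    var Estimate;
run;

/* k-means style clustering of funds on their factor loadings. */
proc fastclus data=work.fund_betas maxclusters=10 out=work.fund_clusters;
    var mktrf smb hml umd;
run;

/* Attach each fund's cluster to the return data (sorted by CRSP_FUNDNO). */
data work.allfund_clustered;
    merge prince.Allfund(in=in_returns)
          work.fund_clusters(keep=CRSP_FUNDNO cluster);
    by CRSP_FUNDNO;
    if in_returns;
run;
```

work.allfund_clustered could then be used in place of the stocks data set, with MRETRF as the dependent variable.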
