Princeelvisa
Obsidian | Level 7

I have 8631 unique mutual funds. Because their risk exposures differ, I run a separate regression for each fund and output the parameter estimates, but in the end I want a single estimate and a single t value for all funds combined.

For the coefficients, I take the average across the 8631 funds to serve as a single coefficient (I'm not sure this is right). For the t values, simply averaging the 8631 fund-level t values would be wrong. I need help finding a single coefficient and a single t value for these 8631 funds, even though I am running the regression by fund. Thank you. Here is what I have so far:

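/* Suppress printed output; only the ODS output dataset is needed */
ods listing close;
ods noresults;
/* Capture the parameter estimates of every per-fund regression */
ods output parameterestimates=prince.coefew1;
proc reg data=prince.Allfund;
by CRSP_FUNDNO;
model MRETRF=mktrf smb hml umd;
run;
quit;

/* Keep only the mktrf coefficient for each fund */
data prince.betaestew;
set prince.Coefew1;
if variable = 'mktrf';
varerr=stderr**2; /* squared standard error of the estimate */
rename estimate=betaestew;
keep CRSP_FUNDNO Variable Estimate StdErr varerr tvalue;
run;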
Reeza
Super User

That doesn't seem correct to me. Why not remove the BY statement and run that regression model?

 

proc reg data=prince.Allfund;

model MRETRF=mktrf smb hml umd;
run; 

If you want to account for the different funds you could include that as a variable though it may not produce what you want.
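One way to read "include that as a variable", as a rough sketch only: with 8631 funds a CLASS variable would generate thousands of dummy columns, so the example below uses the ABSORB statement of PROC GLM instead, which adjusts for a separate intercept per fund without estimating each one explicitly. It assumes prince.Allfund is already sorted by CRSP_FUNDNO (the BY-group run needs the same ordering). Note that this still forces the slopes on mktrf, smb, hml and umd to be common to all funds, so it may not capture fund-specific risk exposures.

```sas
/* Sketch: fund-specific intercepts via ABSORB, common slopes across funds.
   Assumes prince.Allfund is sorted by CRSP_FUNDNO. */
proc glm data=prince.Allfund;
  absorb CRSP_FUNDNO;
  model MRETRF = mktrf smb hml umd / solution;
run;
quit;
```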

Princeelvisa
Obsidian | Level 7

I cannot remove the BY statement because each fund has a different risk exposure, so the correct approach is to run the regression by fund number.

Reeza
Super User

@Princeelvisa wrote:

I cannot remove the BY statement because each fund has a different risk exposure, so the correct approach is to run the regression by fund number.


Then you'll get estimates for each fund. If each fund has its own risk exposure, why do you want an overall estimate? The average of the estimates will not be the overall risk.

 

 

Princeelvisa
Obsidian | Level 7

I want the overall estimate because I'm studying the funds as a whole, but the regression needs to be run by fund before arriving at the overall result. Thanks.

PaigeMiller
Diamond | Level 26

@Princeelvisa wrote:

I want the overall estimate because I'm studying the funds as a whole, but the regression needs to be run by fund before arriving at the overall result. Thanks.


Using a BY statement is not the way to get an overall regression. I'm not sure why you think a BY statement is needed here. Please explain in more detail.

--
Paige Miller
Princeelvisa
Obsidian | Level 7

Thanks so much. I use the BY statement to run the regression for each fund and save the respective estimates in a new dataset; I need it because each fund has a different risk exposure. I have heard that a "loop" can also be used to run the regressions. My major concern is this: after saving the estimates in a separate dataset, I take the average of the parameter estimates to serve for the whole, but I think averaging the t values in the same way to obtain a single number would be inappropriate. How do I get a single t value for the whole after running the regression by fund? Thanks.

PaigeMiller
Diamond | Level 26

@Princeelvisa wrote:

Thanks so much. I use the BY statement to run the regression for each fund and save the respective estimates in a new dataset; I need it because each fund has a different risk exposure. I have heard that a "loop" can also be used to run the regressions. My major concern is this: after saving the estimates in a separate dataset, I take the average of the parameter estimates to serve for the whole, but I think averaging the t values in the same way to obtain a single number would be inappropriate. How do I get a single t value for the whole after running the regression by fund? Thanks.


I would not recommend this.

 

The average of the slopes is not a way to get a good "overall" slope. Same thing applies to t-values.

 

There's no reason you can't do both -- run individual regressions with the BY statement to get estimates for each fund, and then run the regression without the BY statement to get the overall slope and t-values.
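A minimal sketch of doing both, reusing the dataset and variable names from the original post. The output name work.coef_pooled is made up for illustration, and the ODS EXCLUDE statements are just one way to keep 8631 sets of tables out of the results window.

```sas
/* Step 1: per-fund regressions, one set of estimates per CRSP_FUNDNO.
   Printed output is suppressed; ODS OUTPUT still captures the table. */
ods exclude all;
ods output parameterestimates=prince.coefew1;
proc reg data=prince.Allfund;
  by CRSP_FUNDNO;
  model MRETRF = mktrf smb hml umd;
run;
quit;
ods exclude none;

/* Step 2: pooled regression over all funds, giving one slope and one
   t value per factor. work.coef_pooled is a hypothetical name. */
ods output parameterestimates=work.coef_pooled;
proc reg data=prince.Allfund;
  model MRETRF = mktrf smb hml umd;
run;
quit;
```

The per-fund estimates in prince.coefew1 describe each fund's own exposure, while work.coef_pooled holds the overall slopes and t values.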

--
Paige Miller
Princeelvisa
Obsidian | Level 7

[Image: Capture.PNG] This is the result of running without the BY statement. The t values look weird to me.

PaigeMiller
Diamond | Level 26

Weird? In what way? State what is weird about it.

 

Lots of people have used SAS PROC REG for decades, and I am not aware of any previous claims that PROC REG computes incorrect t-values.

--
Paige Miller
mkeintz
PROC Star

The high value of t for the mktrf factor (which I presume is overall market return minus risk-free return, probably determined as S&P 500 return minus T-bill return) when you pool all the mutual funds simply says that the "average" mutual fund is undeniably associated with mktrf.

And the parameter value (.95....) says that the class of portfolios known as mutual funds tracks the market very nearly on a 1:1 basis. What is surprising about either of these numbers? It effectively states that the risk premium for mutual funds is related to the risk premium for the overall market. Presumably your sample of mutual funds is mostly invested in offerings in the self-same market.

 

 

Reeza
Super User

@Princeelvisa wrote:

[Image: Capture.PNG] This is the result of running without the BY statement. The t values look weird to me.


Did you standardize your variables before regression?

 

Also, one possibility: cluster the mutual funds to reduce the dimensionality, so that the 8631 funds collapse to, say, 10 or 20 clusters, and then use the cluster as a factor in your analysis. I'm also assuming there's some time component to this data, so you may need to work with time series regression models. Otherwise, if you have only one point for each mutual fund, you definitely cannot use the BY statement.

 

Your model would end up as:

 

/* stocks, dependent and cluster are placeholder names */
proc glm data=stocks;
class cluster;
model dependent = cluster mktrf smb hml umd stkmv stkmvew;
run;
quit;
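How the clusters get built is left open above. As one possible reading, and only a sketch: group the funds by their per-fund factor loadings from the BY-group regression (prince.coefew1 in the original post) with PROC FASTCLUS, then attach the cluster number to the fund-level data. The dataset names work.betas_wide, work.fund_clusters and work.stocks, the choice of 10 clusters, and the use of factor loadings as the clustering inputs are all assumptions made for illustration.

```sas
/* One row per fund, with one column per regression term
   (Intercept, mktrf, smb, hml, umd) taken from the Variable column. */
proc transpose data=prince.coefew1 out=work.betas_wide;
  by CRSP_FUNDNO;
  id Variable;
  var Estimate;
run;

/* k-means style clustering on the four factor loadings;
   OUT= adds a CLUSTER variable to each fund. */
proc fastclus data=work.betas_wide maxclusters=10
              out=work.fund_clusters noprint;
  var mktrf smb hml umd;
run;

/* Attach each fund's cluster to its return observations.
   Both inputs are assumed to be sorted by CRSP_FUNDNO. */
data work.stocks;
  merge prince.Allfund
        work.fund_clusters(keep=CRSP_FUNDNO cluster);
  by CRSP_FUNDNO;
run;
```

With work.stocks built this way, cluster can go on the CLASS statement as in the model above, with MRETRF as the dependent variable.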
