Solved: Re: Help with the cross-sectional regression and GMM

thdang · Posted 05-30-2013 05:47 PM

Hello,

I'm trying to run the following multivariate regression:

R_t = β₀ + B*F_t1 + C*F_t2 + D*F_t3 + β^L*L_t + e_t

and then

E(R_t) = B*λ_F1 + C*λ_F2 + D*λ_F3 + β^L*λ_L

and I have the restriction:

λ_F1 = E(F_t1)

λ_F2 = E(F_t2)

λ_F3 = E(F_t3)

and because of that I have

β₀ = β^L*[λ_L - E(L_t)]
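
The step from the restrictions to the intercept can be spelled out as follows. This is only a sketch: it assumes E(e_t) = 0 and uses the same symbols as the equations above, with k = 1, 2, 3 indexing the three factors.

```latex
\begin{align*}
E(R_t) &= \beta_0 + B\,E(F_{t1}) + C\,E(F_{t2}) + D\,E(F_{t3}) + \beta^L E(L_t)
  && \text{(expectation of the regression, } E(e_t)=0\text{)} \\
E(R_t) &= B\,\lambda_{F1} + C\,\lambda_{F2} + D\,\lambda_{F3} + \beta^L \lambda_L
  && \text{(pricing equation)} \\
\beta_0 + \beta^L E(L_t) &= \beta^L \lambda_L
  && \text{(equate the two, using } \lambda_{Fk} = E(F_{tk})\text{)} \\
\beta_0 &= \beta^L\,[\lambda_L - E(L_t)]
\end{align*}
```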

I'm trying to estimate this with the generalized method of moments (GMM), but I could not get it to work.

I have something like this one:

proc model data = datain;

parms labda B_0 B_1 B_2 B_3 B_4 B_5 B_6 B_7 B_8 B_9 B_10 ...

beta_0 beta_1 beta_2 beta_3 beta_4 beta_5 beta_6 beta_7 beta_8 beta_9 beta_10 mean_factor;

eq.e1 = p_mretrf-beta_0*(labda-mean_factor)-B_0*F_1... - beta_0*MDI ( if id = 0) ........ ;

eq.e2=mdi*(p_mretrf-beta_0*(labda-mean_factor)-B_0*F_1... - beta_0*MDI) ( if id = 0) .......... ;

fit e1 e2 /gmm;

run;

And I want to add the same thing for id = 1,...,10.

For an ordinary regression I would include "by id;". The problem here is that I want a single labda shared by all ids, while each id has its own betas and Bs (and Cs and Ds); that is why I have beta_0 to beta_10, I guess.

Can someone please help me with this?

Many thanks in advance!

kessler · Posted 06-03-2013 09:42 AM

If I understand your question properly, you would like to do a multivariate regression on cross-sectional data where some parameters are shared across cross sections and others are not. Here's an example of how that can be done for a simple univariate linear model where the b1 and b2 parameters are specific to each cross section and the c parameter is shared by both cross sections:

data d;
   call streaminit(1);
   do i = 1 to 2;
      do x = -10 to 10;
         y = 3*i*x + 1 + rand('normal');
         output;
      end;
   end;
run;

proc model data=d plot=none;
parms b1 b2 c;

resid1 = y - ( b1*x + c);
resid2 = y - (2*b2*x + c);

eq.one = resid1*(i=1)
+ resid2*(i=2);

fit one;
quit;

Please let me know if this doesn't answer the crux of your question, or if you'd like a clarification of how this technique for working with cross-sectional data can (or cannot) be generalized to your multivariate GMM problem.

Marc Kessler

thdang · Posted 06-03-2013 10:04 AM

Hi Marc,

Lots of thanks for your help. That is exactly my problem.

I have to do a multivariate regression on cross-sectional data.

- The first step is running time-series regressions, so there are different betas_i for different ids.

- The second step is running a cross-sectional regression at each time t to get λ_t, and then taking the average of the estimates, so λ = mean(λ_t).

But I don't know how to do this. I would appreciate it a lot if you could help me by clarifying this technique for multivariate GMM.
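
For reference, the two steps described above can be sketched outside PROC MODEL with two passes of PROC REG. This is a minimal illustration on simulated data with a single factor, not the GMM formulation asked about; the data set names (panel, betas, panel2, lambdas) and variables (ret, f, beta, lambda_t) are hypothetical.

```sas
/* Simulated panel (hypothetical): 5 portfolios (id), 60 dates (t), one factor f. */
data panel;
   call streaminit(1);
   do t = 1 to 60;
      f = rand('normal');
      do id = 1 to 5;
         ret = 0.5*id*f + 0.1*id + rand('normal');
         output;
      end;
   end;
run;

proc sort data=panel; by id t; run;

/* Step 1: one time-series regression per id; keep the slope on f as beta. */
proc reg data=panel outest=betas(keep=id f rename=(f=beta)) noprint;
   by id;
   model ret = f;
run; quit;

/* Attach each id's estimated beta to its observations. */
data panel2;
   merge panel betas;
   by id;
run;

proc sort data=panel2; by t id; run;

/* Step 2: one cross-sectional regression per date; the slope on beta is lambda_t. */
proc reg data=panel2 outest=lambdas(keep=t beta rename=(beta=lambda_t)) noprint;
   by t;
   model ret = beta;
run; quit;

/* lambda = mean(lambda_t) across dates. */
proc means data=lambdas mean stderr;
   var lambda_t;
run;
```

The two-pass version treats the step 1 betas as known in step 2, which is one reason a joint GMM estimation may be preferred.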

kessler · Posted 06-03-2013 12:04 PM

Here's an expansion of the last example, which estimates parameters in a multivariate model with cross-sectional data. It imposes a constraint on one of the cross-sectional equation's parameters based on the parameters estimated in the time equation(s):

data d;
   call streaminit(1);
   do cs = 1 to 3;
      do t = 1 to 10;
         y = 3*t + cs*rand('normal') + 1 + rand('normal');
         output;
      end;
   end;
run;

proc model data=d plot=none;
array b[10] b1-b10;
parms b1-b10 c bavg;

   tdim = 0;
   do i = 1 to 10;
      tdim = tdim + (y - b[i])*(t=i);
   end;
   cdim = 0;
   do i = 1 to 3;
      cdim = cdim + (y - c*cs - bavg)*(cs=i);
   end;

eq.tresid = tdim;
eq.csresid = cdim;

restrict bavg = (b1 + b2 + b3 + b4 + b5 + b6 + b7 + b8 + b9 + b10)/10;

fit tresid csresid;
quit;

thdang · Posted 06-03-2013 04:53 PM

Many thanks for your help.

I have modified the code a bit for my sample, but I think I have done something wrong: the program just keeps running and has not given any results yet.

And can I impose more moment conditions in the same way? I have added my data set above.

proc model data=sasdata.mds_betas_portfolios_factors plot=none;

array bm[11] bm0-bm10;

array bs[11] bs0-bs10;

array bh[11] bh0-bh10;

array bf[11] bf0-bf10;

array labda[324] labda1-labda324;

parms bm0-bm10 bs0-bs10 bh0-bh10 bf0-bf10 labda1-labda324 lavg;

* time-series regression "by preranking_factor_beta" ;

tdim = 0;

do i = 0 to 10;

tdim = tdim + (p_mretrf - bf[i]*(lavg+0.026987772) - bm[i]*mkt_rf - bs[i]*smb - bh[i]*hml - bf[i]*mdi)*(t=preranking_factor_beta);

end;

* cross-sectional regression "by caldt";

cdim = 0;

do i = caldt.first to caldt.last; * this one is not working. I think that I have to make extra variable that run from 1-324 for the number of date? ;

do j = 0 to 10;

cdim = cdim + (p_mretrf - bm*mkt_rf - bs*smb - bh*hml - bf*labda)*(caldt=i);

end;

eq.tresid = tdim;

eq.csresid = cdim;

restrict lavg = mean(labda1-labda324); * I'm not sure that this one will work like this??;

fit tresid csresid / gmm;

quit;

....

in the end...

I get the note e.g.:

NOTE: The parameter bm4 is shared by all 2 of the equations to be estimated.

kessler · Posted 06-03-2013 06:12 PM

A couple of things I noticed with your implementation are:

- In the tdim loop the last factor is "(t=preranking_factor_beta)"; it should probably be "(i=preranking_factor_beta)".
- You are correct that the "caldt.last" syntax will not work, and adding an extra variable is a good work-around for representing separate model equations at each time step.
- For the inner "j" loop you probably want to add the additional factor "(j=preranking_factor_beta)" to prevent extra terms from being included in this equation.
- One shorthand for expressing the restriction is "restrict lavg = sum(of labda1-labda324)/324;".
- It might be helpful to mock up a smaller data set with known parameters and fewer variables and observations (both in time and cross section) to get a clearer understanding of how you would represent the structure of this model using PROC MODEL.
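
The extra time-index variable mentioned above could be built with a sort and a DATA step along these lines. This is a sketch only: work.sorted, work.indexed, and tidx are hypothetical names, and it assumes caldt identifies each of the 324 dates.

```sas
/* Sort by date so that BY-group processing is available. */
proc sort data=sasdata.mds_betas_portfolios_factors out=work.sorted;
   by caldt;
run;

/* tidx counts distinct dates: 1 for the first caldt, 2 for the next, and so on. */
data work.indexed;
   set work.sorted;
   by caldt;
   if first.caldt then tidx + 1;   /* sum statement: tidx is retained and starts at 0 */
run;
```

With such a variable, the cross-sectional loop could run over "do i = 1 to 324;" and use "(tidx=i)" as the selecting factor in place of "(caldt=i)".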

thdang · Posted 06-03-2013 06:45 PM

Thanks for your answer!

After these adjustments and a lot of other trial and error, it still gave me the error:

ERROR: The number of parameters (371) is greater than the number of unique instruments (18) times the number of equations (2).

and still the note: NOTE: The parameter bm0 is shared by all 2 of the equations to be estimated

I also tried it with the code below, where I include labda in the formula directly (without estimating 324 labda_t values and then taking their average, as in the first attempt). But I still have more parameters than unique instruments.

How can the ERROR and the NOTE be resolved?

proc model data=sasdata.mds_betas_portfolios_factors_rd;

array bm[11] bm0-bm10;

array bs[11] bs0-bs10;

array bh[11] bh0-bh10;

array bf[11] bf0-bf10;

parms bm0-bm10 bs0-bs10 bh0-bh10 bf0-bf10 labda;

* time-series regression "by preranking_factor_beta" ;

mom1 = 0;

do i = 0 to 10;

mom1 = mom1 + (p_mretrf - bf[i]*(labda+0.026987772) - bm[i]*mkt_rf - bs[i]*smb - bh[i]*hml - bf[i]*mdi)*(i= preranking_factor_beta);

end;

eq.resid1 = mom1;

instruments mkt_rf smb hml mdi;

fit resid1/gmm;

quit;

kessler · Posted 06-03-2013 08:19 PM

Array indexing actually starts at 1; I'm sorry for leading you astray on that.

Also, there's a problem with expressing GMM models using the structure I recommended, since all the moment conditions have been combined into one moment condition. If you can do a non-GMM (for instance, OLS) estimation, then just fixing the array indexing should suffice for your simplified problem. If GMM is necessary, then you'll probably want to express the problem using as many moment conditions as there are cross sections in the data:

%macro geneqs(n);
   %do i = 0 %to &n;
      eq.resid&i = mom&i;
   %end;
%mend geneqs;

proc model data=sasdata.mds_betas_portfolios_factors_mdi plot=none;
   array bm[11] bm0-bm10;
   array bs[11] bs0-bs10;
   array bh[11] bh0-bh10;
   array bf[11] bf0-bf10;
   array mom[11] mom0-mom10;
   parms bm0-bm10 bs0-bs10 bh0-bh10 bf0-bf10 labda;

   do j = 0 to 10;
      i = j+1;
      mom[i] = (p_mretrf - bf[i]*(labda+0.026987772) - bm[i]*mkt_rf
                         - bs[i]*smb - bh[i]*hml - bf[i]*mdi)*(j=preranking_factor_beta);
   end;

   %geneqs(10);

fit resid0-resid10 / gmm;
quit;

thdang · Posted 06-04-2013 06:41 AM

Thank you very much for your help and your time.

Now I have two PROC MODEL steps, one for OLS and one for GMM. But they give me different estimates, which I did not expect; I thought they would give exactly the same estimates and differ only in the standard errors. And can I also add a weight that is the inverse of the variance of the errors?

For the OLS:

proc model data=sasdata.mds_betas_portfolios_factors1;

array bm[11] bm0-bm10;

array bs[11] bs0-bs10;

array bh[11] bh0-bh10;

array bf[11] bf0-bf10;

parms bm0-bm10 bs0-bs10 bh0-bh10 bf0-bf10 labda;

* time-series regression "by preranking_factor_beta" ;

mom1 = 0;

do i = 1 to 11;

mom1 = mom1 + (p_mretrf - bf[i]*(labda+0.026987772) - bm[i]*mkt_rf - bs[i]*smb - bh[i]*hml - bf[i]*mdi)*(i= preranking_factor_beta+1);

end;

eq.resid1 = mom1;

fit resid1;

quit;

And for the GMM:

proc model data=sasdata.mds_betas_portfolios_factors1 plot=none;

array bm[11] bm0-bm10;

array bs[11] bs0-bs10;

array bh[11] bh0-bh10;

array bf[11] bf0-bf10;

array mom[11] mom0-mom10;

parms bm0-bm10 bs0-bs10 bh0-bh10 bf0-bf10 labda;

do j = 0 to 10;

i = j+1;

mom[i] = (p_mretrf - bf[i]*(labda+0.026987772) - bm[i]*mkt_rf

- bs[i]*smb - bh[i]*hml - bf[i]*mdi)*(j=preranking_factor_beta);

end;

%geneqs(10);

fit resid0-resid10 / gmm;

quit;

kessler · Posted 06-04-2013 09:04 AM

PROC MODEL provides several different methods for determining parameter estimates and their standard errors. Their properties are described here: http://support.sas.com/documentation/cdl/en/etsug/65545/PDF/default/etsug.pdf.

You could use either the WEIGHT statement or the ITOLS method to weight the observations as you requested.
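
As an illustration of the WEIGHT statement route, here is a minimal sketch on toy data patterned after the first example in this thread. The data set dw and the variables sd and w are hypothetical, and the error variance is assumed known so that w can be set to its inverse.

```sas
/* Toy data: the error standard deviation sd differs by cross section i. */
data dw;
   call streaminit(1);
   do i = 1 to 2;
      sd = i;
      do x = -10 to 10;
         y = 3*i*x + 1 + sd*rand('normal');
         w = 1/sd**2;      /* weight = inverse error variance (assumed known here) */
         output;
      end;
   end;
run;

proc model data=dw plot=none;
   parms b1 b2 c;

   /* b1 and b2 are specific to each cross section; c is shared. */
   eq.one = (y - (  b1*x + c))*(i=1)
          + (y - (2*b2*x + c))*(i=2);

   weight w;               /* observations are weighted by w during estimation */
   fit one;
quit;
```

In a real application the variances are not known, so w would have to come from a preliminary fit or another estimate.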

It is typically not necessary to express the model program differently to use the various methods, and for comparison purposes you may find it easier to use the GMM version of your model program for both the OLS and GMM methods.

thdang · Posted 06-04-2013 11:42 AM

Thanks for your advice!

I have used the GMM version of my model program for both the OLS and GMM methods. With the OLS method I get the same estimates as with the OLS version, but different standard errors (the standard errors of the GMM version look a lot better). With the GMM method the estimates are still different from those of OLS, which I still don't understand.

kessler · Posted 06-04-2013 12:27 PM

The OLS method finds parameters which minimize the sum of the squares of the moment conditions (equations) in your model program. The GMM method minimizes a more complicated function involving your moment conditions, instrumental variables, and the autocorrelation of residuals. For some simple models the same set of parameters minimizes both of these problems; however, this is not the case for your problem.
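
In symbols, the two criteria can be sketched roughly as below. The notation is an assumption made for illustration: ε_{k,t}(θ) is the residual of equation k at observation t, ε_t(θ) stacks those residuals, z_t is the vector of instruments, and V is an estimate of the covariance of the moments, which may include autocovariance terms.

```latex
\begin{align*}
\hat\theta_{\text{OLS}} &= \arg\min_{\theta} \sum_{t=1}^{n} \sum_{k} \varepsilon_{k,t}(\theta)^2 \\
m_n(\theta) &= \frac{1}{n} \sum_{t=1}^{n} \varepsilon_t(\theta) \otimes z_t \\
\hat\theta_{\text{GMM}} &= \arg\min_{\theta} \; n\, m_n(\theta)^{\top} V^{-1} m_n(\theta)
\end{align*}
```

If the GMM moment conditions coincide with the first-order conditions of the least-squares problem and there are exactly as many of them as parameters, both criteria are minimized by the same parameters; otherwise the two estimates generally differ.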

thdang · Posted 06-05-2013 01:50 AM

Many thanks for your help!

thdang · Posted 06-05-2013 06:28 AM

I'm sorry to bother you again. Can you help me with adding the weighting matrix to this, that is, the inverse covariance matrix of the errors? I also ask because the solution diverges when I add more moment conditions.

proc model data=sasdata.mds_betas_portfolios_factors1 plot=none;

array bm[11] bm0-bm10;

array bs[11] bs0-bs10;

array bh[11] bh0-bh10;

array bf[11] bf0-bf10;

array mom[66] mom0-mom65;

parms bm0-bm10 bs0-bs10 bh0-bh10 bf0-bf10 labda;

do j = 0 to 10;

i = j+1;

mom[i] = (p_mretrf - bf[i]*(labda+0.026987772) - bm[i]*mkt_rf

- bs[i]*smb - bh[i]*hml - bf[i]*mdi)*(j=preranking_factor_beta);

mom[i+11]= mkt_rf*(p_mretrf - bf[i]*(labda+0.026987772) - bm[i]*mkt_rf

- bs[i]*smb - bh[i]*hml - bf[i]*mdi)*(j=preranking_factor_beta);

mom[i+22]= smb*(p_mretrf - bf[i]*(labda+0.026987772) - bm[i]*mkt_rf

- bs[i]*smb - bh[i]*hml - bf[i]*mdi)*(j=preranking_factor_beta);

mom[i+33]= hml*(p_mretrf - bf[i]*(labda+0.026987772) - bm[i]*mkt_rf

- bs[i]*smb - bh[i]*hml - bf[i]*mdi)*(j=preranking_factor_beta);

mom[i+44]= mdi*(p_mretrf - bf[i]*(labda+0.026987772) - bm[i]*mkt_rf

- bs[i]*smb - bh[i]*hml - bf[i]*mdi)*(j=preranking_factor_beta);

mom[i+55]= (mdi-0.026987772)*(j=preranking_factor_beta);

end;

%geneqs(65);

fit resid0-resid65/gmm ;

quit;

kessler · Posted 06-05-2013 10:47 AM

The GMM method in PROC MODEL uses a weighting matrix, V, in the estimation of the parameters. There's an explanation of the options for specifying the V matrix in the "Details: Estimation by the MODEL Procedure" section of the MODEL documentation.

There can be many causes for your divergence problem. I'd recommend trying to isolate the moment conditions or observations that cause the divergence to get a better understanding of the problem.
