Help with the cross-sectional regression and GMM

🔒 This topic is **solved** and **locked**.

Posted 05-30-2013 05:47 PM
(3226 views)

Hello,

I'm trying to run the following multivariate regression:

R_t = β_0 + B*F_{t1} + C*F_{t2} + D*F_{t3} + β^L*L_t + e_t

and then

E(R_t) = B*λ_{F1} + C*λ_{F2} + D*λ_{F3} + β^L*λ_L

and I have the restrictions:

λ_{F1} = E(F_{t1})

λ_{F2} = E(F_{t2})

λ_{F3} = E(F_{t3})

and because of that I have

β_0 = β^L*[λ_L - E(L_t)]
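For readers following the algebra, the intercept restriction can be recovered in two lines. This is only a sketch, and it assumes the error has mean zero, E(e_t) = 0, which the post does not state explicitly.

```latex
\begin{align*}
E(R_t) &= \beta_0 + B\,E(F_{t1}) + C\,E(F_{t2}) + D\,E(F_{t3}) + \beta^{L} E(L_t)
  && \text{(expectation of the time-series regression)} \\
E(R_t) &= B\,E(F_{t1}) + C\,E(F_{t2}) + D\,E(F_{t3}) + \beta^{L} \lambda_{L}
  && \text{(pricing equation with } \lambda_{Fk} = E(F_{tk})\text{)} \\
\Rightarrow\quad \beta_0 &= \beta^{L}\,\bigl[\lambda_{L} - E(L_t)\bigr]
  && \text{(subtract the second line from the first)}
\end{align*}
```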

I'm trying to estimate this with the generalized method of moments (GMM), but it did not work.

I have something like this:

    proc model data = datain;
      parms labda B_0 B_1 B_2 B_3 B_4 B_5 B_6 B_7 B_8 B_9 B_10 ...
            beta_0 beta_1 beta_2 beta_3 beta_4 beta_5 beta_6 beta_7 beta_8 beta_9 beta_10 mean_factor;
      eq.e1 = p_mretrf - beta_0*(labda-mean_factor) - B_0*F_1 ... - beta_0*MDI (if id = 0) ........ ;
      eq.e2 = mdi*(p_mretrf - beta_0*(labda-mean_factor) - B_0*F_1 ... - beta_0*MDI) (if id = 0) .......... ;
      fit e1 e2 / gmm;
    run;

And I want to add the same thing for **id = 1,...,10**.

For a normal regression I would include "**by id;**". But the problem here is that I want just one **labda** for all of them, while different **id** values have different **betas** and **Bs (and Cs and Ds)**; that is why I have beta_0 to beta_10, I guess.

Can someone please help me with this?

Many thanks in advance!

Accepted solution

Indexing for arrays actually starts at 1; I'm sorry for leading you astray on that.

Also, there's a problem with expressing GMM models using the structure I recommended, since all the moment conditions have been combined into one moment condition. If you can do a non-GMM (for instance OLS) estimation, then just fixing the array indexing should suffice for your simplified problem. If GMM is necessary, then you'll probably want to express the problem using as many moment conditions as there are cross sections in the data:

    %macro geneqs(n);
      %do i = 0 %to &n;
        eq.resid&i = mom&i;
      %end;
    %mend geneqs;

    proc model data=sasdata.mds_betas_portfolios_factors_mdi plot=none;
      array bm[11] bm0-bm10;
      array bs[11] bs0-bs10;
      array bh[11] bh0-bh10;
      array bf[11] bf0-bf10;
      array mom[11] mom0-mom10;
      parms bm0-bm10 bs0-bs10 bh0-bh10 bf0-bf10 labda;
      do j = 0 to 10;
        i = j+1;
        mom[i] = (p_mretrf - bf[i]*(labda+0.026987772) - bm[i]*mkt_rf
                  - bs[i]*smb - bh[i]*hml - bf[i]*mdi)*(j=preranking_factor_beta);
      end;
      %geneqs(10);
      fit resid0-resid10 / gmm;
    quit;
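As a rough identification check, the error message quoted later in the thread compares the number of parameters with the number of unique instruments times the number of equations. The count below applies that comparison to the model above. Here q, the number of instruments per equation, is notation introduced for illustration; its actual value depends on the INSTRUMENTS statement used.

```latex
\begin{align*}
\text{parameters} &= \underbrace{11}_{\texttt{bm}} + \underbrace{11}_{\texttt{bs}}
  + \underbrace{11}_{\texttt{bh}} + \underbrace{11}_{\texttt{bf}}
  + \underbrace{1}_{\texttt{labda}} = 45 \\
\text{eleven equations:}\quad & q \times 11 \ \ge\ 45 \quad\Longleftrightarrow\quad q \ge 5 \\
\text{one combined equation:}\quad & q \times 1 \ \ge\ 45 \quad\Longleftrightarrow\quad q \ge 45
\end{align*}
```

This is why splitting the single combined moment into one equation per cross section makes the count easier to satisfy.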

16 REPLIES

If I understand your question properly, you would like to do a multivariate regression on cross-sectional data where some parameters are shared across cross sections and others are not. Here's an example of how that can be done for a simple univariate linear model where the b1 and b2 parameters are specific to each cross section and the c parameter is shared among both cross sections:

    data d;
      call streaminit(1);
      do i = 1 to 2;
        do x = -10 to 10;
          y = 3*i*x + 1 + rand('normal');
          output;
        end;
      end;
    run;

    proc model data=d plot=none;
      parms b1 b2 c;
      resid1 = y - (  b1*x + c);
      resid2 = y - (2*b2*x + c);
      eq.one = resid1*(i=1)
             + resid2*(i=2);
      fit one;
    quit;

Please let me know if this doesn't answer the crux of your question, or if you'd like a clarification of how this technique for working with cross-sectional data can (or cannot) be generalized to your multivariate GMM problem.
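Written as a formula, the combined residual in the example uses indicator terms to switch between the two cross sections. The indicator notation is introduced here for illustration; the parameter names are those of the code above.

```latex
\[
e(b_1, b_2, c) \;=\; \mathbf{1}[i = 1]\,\bigl(y - (b_1 x + c)\bigr)
              \;+\; \mathbf{1}[i = 2]\,\bigl(y - (2 b_2 x + c)\bigr)
\]
```

Since the data are simulated as y = 3·i·x + 1 + noise, the values the fit should recover are approximately b1 = 3, b2 = 3 and c = 1.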

Marc Kessler


Hi Marc,

Many thanks for your help. That is exactly my problem.

I have to do a multivariate regression on cross-sectional data.

- The first step is running time-series regressions, so there are different betas_i for different id.

- The second step is running a cross-sectional regression at each time t to get λ_t, and then taking the average of the estimates, so λ = mean(λ_t).
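The two steps described above can be written out as follows. The portfolio index i, the hats marking first-step estimates, and the pricing error α_{i,t} are notation introduced here for illustration; the factor names follow the original post.

```latex
\begin{align*}
\text{Step 1 (time series, one regression per } i\text{):}\quad
  & R_{i,t} = \beta_{i,0} + B_i F_{t1} + C_i F_{t2} + D_i F_{t3} + \beta^{L}_i L_t + e_{i,t} \\
\text{Step 2 (cross section, one regression per } t\text{):}\quad
  & R_{i,t} = \hat B_i \lambda_{F1,t} + \hat C_i \lambda_{F2,t} + \hat D_i \lambda_{F3,t}
      + \hat\beta^{L}_i \lambda_{L,t} + \alpha_{i,t} \\
\text{Average over } T \text{ periods:}\quad
  & \hat\lambda = \frac{1}{T} \sum_{t=1}^{T} \hat\lambda_t
\end{align*}
```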

But I don't know how to do this. I would appreciate it a lot if you could clarify how this technique applies to multivariate GMM.


Here's an expansion of the last example that estimates parameters in a multivariate model with cross-sectional data. It imposes a constraint on one of the cross-sectional equation's parameters based on the parameters estimated in the time equation(s):

    data d;
      call streaminit(1);
      do cs = 1 to 3;
        do t = 1 to 10;
          y = 3*t + cs*rand('normal') + 1 + rand('normal');
          output;
        end;
      end;
    run;

    proc model data=d plot=none;
      array b[10] b1-b10;
      parms b1-b10 c bavg;
      tdim = 0;
      do i = 1 to 10;
        tdim = tdim + (y - b[i])*(t=i);
      end;
      cdim = 0;
      do i = 1 to 3;
        cdim = cdim + (y - c*cs - bavg)*(cs=i);
      end;
      eq.tresid = tdim;
      eq.csresid = cdim;
      restrict bavg = (b1 + b2 + b3 + b4 + b5 + b6 + b7 + b8 + b9 + b10)/10;
      fit tresid csresid;
    quit;


Many thanks for your help.

I have modified the code a bit for my sample, but I think I have done something wrong: the program just keeps running and has not given any results yet.

And can I impose more moment conditions in the same way? I have added my data set above.

    proc model data=sasdata.mds_betas_portfolios_factors plot=none;
      array bm[11] bm0-bm10;
      array bs[11] bs0-bs10;
      array bh[11] bh0-bh10;
      array bf[11] bf0-bf10;
      array labda[324] labda1-labda324;
      parms bm0-bm10 bs0-bs10 bh0-bh10 bf0-bf10 labda1-labda324 lavg;

      * time-series regression "by preranking_factor_beta" ;
      tdim = 0;
      do i = 0 to 10;
        tdim = tdim + (p_mretrf - bf[i]*(lavg+0.026987772) - bm[i]*mkt_rf - bs[i]*smb - bh[i]*hml - bf[i]*mdi)*(t=preranking_factor_beta);
      end;

      * cross-sectional regression "by caldt";
      cdim = 0;
      do i = caldt.first to caldt.last; * this one is not working. I think that I have to make an extra variable that runs from 1-324 for the number of dates? ;
        do j = 0 to 10;
          cdim = cdim + (p_mretrf - bm[i])*(caldt=i);
        end;
      end;

      eq.tresid = tdim;
      eq.csresid = cdim;
      restrict lavg = mean(labda1-labda324); * I'm not sure that this one will work like this??;
      fit tresid csresid / gmm;
    quit;

....

In the end I get notes such as:

**NOTE: The parameter bm4 is shared by all 2 of the equations to be estimated.**


A couple of things I noticed with your implementation:

- in the tdim loop the last factor is "(t=preranking_factor_beta)"; it should probably be "(i=preranking_factor_beta)"
- you are correct that the "caldt.last" syntax will not work, and adding an extra variable is a good workaround for representing separate model equations at each time step
- for the inner "j" loop you probably want to add the additional factor "(j=preranking_factor_beta)" to prevent extra terms from being included in this equation
- one shorthand for expressing the restriction is "restrict lavg = sum(of labda1-labda324)/324;"
- it might be helpful to mock up a smaller data set with known parameters, and fewer variables and observations (both in time and cross-section), to get a clearer understanding of how you would represent the structure of this model using PROC MODEL
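One way to build the extra time-index variable mentioned in the second point is a sorted DATA step with BY-group processing. This is a minimal sketch, not a tested part of the original answer: the output data set names `work.factors_sorted` and `work.factors_indexed` and the variable name `tidx` are hypothetical, while `caldt` and the input data set name come from the post above.

```sas
/* Sort by date so that BY-group processing can detect each new date. */
proc sort data=sasdata.mds_betas_portfolios_factors
          out=work.factors_sorted;        /* hypothetical output name */
  by caldt;
run;

data work.factors_indexed;                /* hypothetical output name */
  set work.factors_sorted;
  by caldt;
  /* Sum statement: tidx is retained across rows and starts at 0,      */
  /* so it increases by 1 at the first observation of each new date.   */
  if first.caldt then tidx + 1;
run;
```

With 324 distinct dates, `tidx` would run from 1 to 324 and could replace the `caldt.first to caldt.last` loop bounds and the `(caldt=i)` indicator.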


Thanks for your answer!

After the adjustments and a lot of other trial and error, it still gave me the error:

ERROR: The number of parameters (371) is greater than the number of unique instruments (18) times the number of equations (2).

and still the note: NOTE: The parameter bm0 is shared by all 2 of the equations to be estimated

I also tried the code below, where I include labda in the formula directly (without estimating 324 labda_t and then taking their average, as in the first attempt). But I still have more parameters than unique instruments.

How can the ERROR and the NOTE be resolved?

    proc model data=sasdata.mds_betas_portfolios_factors_rd;
      array bm[11] bm0-bm10;
      array bs[11] bs0-bs10;
      array bh[11] bh0-bh10;
      array bf[11] bf0-bf10;
      parms bm0-bm10 bs0-bs10 bh0-bh10 bf0-bf10 labda;

      * time-series regression "by preranking_factor_beta" ;
      mom1 = 0;
      do i = 0 to 10;
        mom1 = mom1 + (p_mretrf - bf[i]*(labda+0.026987772) - bm[i]*mkt_rf - bs[i]*smb - bh[i]*hml - bf[i]*mdi)*(i=preranking_factor_beta);
      end;

      eq.resid1 = mom1;
      instruments mkt_rf smb hml mdi;
      fit resid1 / gmm;
    quit;


Thank you very much for your help and your time.

Now I have two PROC MODEL steps, one for OLS and one for GMM, but they give me different estimates, which I did not expect. I thought they would give exactly the same estimates and differ only in the standard errors. And can I also add the **weight**, which is the inverse of the variance of the errors?

For the OLS:

    proc model data=sasdata.mds_betas_portfolios_factors1;
      array bm[11] bm0-bm10;
      array bs[11] bs0-bs10;
      array bh[11] bh0-bh10;
      array bf[11] bf0-bf10;
      parms bm0-bm10 bs0-bs10 bh0-bh10 bf0-bf10 labda;

      * time-series regression "by preranking_factor_beta" ;
      mom1 = 0;
      do i = 1 to 11;
        mom1 = mom1 + (p_mretrf - bf[i]*(labda+0.026987772) - bm[i]*mkt_rf - bs[i]*smb - bh[i]*hml - bf[i]*mdi)*(i=preranking_factor_beta+1);
      end;

      eq.resid1 = mom1;
      fit resid1;
    quit;

And for the GMM:

    proc model data=sasdata.mds_betas_portfolios_factors1 plot=none;
      array bm[11] bm0-bm10;
      array bs[11] bs0-bs10;
      array bh[11] bh0-bh10;
      array bf[11] bf0-bf10;
      array mom[11] mom0-mom10;
      parms bm0-bm10 bs0-bs10 bh0-bh10 bf0-bf10 labda;
      do j = 0 to 10;
        i = j+1;
        mom[i] = (p_mretrf - bf[i]*(labda+0.026987772) - bm[i]*mkt_rf
                  - bs[i]*smb - bh[i]*hml - bf[i]*mdi)*(j=preranking_factor_beta);
      end;
      %geneqs(10);
      fit resid0-resid10 / gmm;
    quit;


PROC MODEL provides several different methods for determining parameter estimates and their standard errors. Their properties are described here: http://support.sas.com/documentation/cdl/en/etsug/65545/PDF/default/etsug.pdf.

You could use either the WEIGHT statement or the ITOLS method to weight the observations as you requested.

It is typically not necessary to express the model program differently to use the various methods, and for comparison purposes you may find it easier to use the GMM version of your model program for both the OLS and GMM methods.


Thanks for your advice!

I have used the GMM version of my model program for both the OLS and GMM methods. With the OLS method I get the same estimates as with the OLS version, but **different** standard errors (the standard errors of the GMM version look a lot better). With the GMM method the estimates are still different from those of OLS, which I still don't understand.


Many thanks for your help!


I'm sorry to bother you again. Can you help me add the weighting matrix to this, that is, the inverse covariance matrix of the errors? I ask also because the solution diverges when I add more moment conditions.

    proc model data=sasdata.mds_betas_portfolios_factors1 plot=none;
      array bm[11] bm0-bm10;
      array bs[11] bs0-bs10;
      array bh[11] bh0-bh10;
      array bf[11] bf0-bf10;
      array mom[66] mom0-mom65;
      parms bm0-bm10 bs0-bs10 bh0-bh10 bf0-bf10 labda;
      do j = 0 to 10;
        i = j+1;
        mom[i]    = (p_mretrf - bf[i]*(labda+0.026987772) - bm[i]*mkt_rf
                     - bs[i]*smb - bh[i]*hml - bf[i]*mdi)*(j=preranking_factor_beta);
        mom[i+11] = mkt_rf*(p_mretrf - bf[i]*(labda+0.026987772) - bm[i]*mkt_rf
                     - bs[i]*smb - bh[i]*hml - bf[i]*mdi)*(j=preranking_factor_beta);
        mom[i+22] = smb*(p_mretrf - bf[i]*(labda+0.026987772) - bm[i]*mkt_rf
                     - bs[i]*smb - bh[i]*hml - bf[i]*mdi)*(j=preranking_factor_beta);
        mom[i+33] = hml*(p_mretrf - bf[i]*(labda+0.026987772) - bm[i]*mkt_rf
                     - bs[i]*smb - bh[i]*hml - bf[i]*mdi)*(j=preranking_factor_beta);
        mom[i+44] = mdi*(p_mretrf - bf[i]*(labda+0.026987772) - bm[i]*mkt_rf
                     - bs[i]*smb - bh[i]*hml - bf[i]*mdi)*(j=preranking_factor_beta);
        mom[i+55] = (mdi-0.026987772)*(j=preranking_factor_beta);
      end;
      %geneqs(65);
      fit resid0-resid65 / gmm;
    quit;


The GMM method in PROC MODEL uses a weighting matrix, V, in the estimation of the parameters. There's an explanation of the options for specifying the V matrix in the "Details: Estimation by the MODEL Procedure" section of the MODEL documentation.
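In general terms, GMM chooses the parameters that minimize a quadratic form in the sample moments, weighted by the inverse of V. The notation below is generic textbook notation rather than a quotation from the SAS documentation; PROC MODEL's exact scaling and its options for estimating V are described in the documentation section cited above.

```latex
\begin{align*}
\bar m(\theta) &= \frac{1}{n} \sum_{t=1}^{n} m_t(\theta)
  && \text{(sample average of the moment functions)} \\
\hat\theta &= \arg\min_{\theta}\; \bar m(\theta)^{\top}\, V^{-1}\, \bar m(\theta)
  && \text{(} V \text{: estimated covariance matrix of the moments)}
\end{align*}
```

Under this formulation the inverse-covariance weighting is already part of the GMM estimator, with V referring to the covariance of the moment functions rather than of the raw regression errors.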

Regarding your divergence problem, there can be many causes. I'd recommend trying to isolate the moment conditions or observations causing the divergence to get a better understanding of the problem.
