I want to estimate three regressions
y1=a+b*X+u
y2=c+d*X+v
(y2-y1)=e+f*X+w
using PROC MODEL.
data have;
do t=1 to 50;
x=rannor(1);
y1=x+rannor(1);
y2=x+rannor(1);
y3=y2-y1;
output;
end;
run;
So I defined y3 as y2-y1 and tried to put this variable into the PROC MODEL as follows.
proc model;
y1=a+b*x;
y2=c+d*x;
y3=e+f*x;
fit y:/gmm kernel=(bart,2,0) vardef=n;
run;
This causes a singularity problem, since y3 is an exact linear combination of y1 and y2.
19 proc model;
20 y1=a+b*x;
21 y2=c+d*x;
22 y3=e+f*x;
23 fit y:/gmm kernel=(bart,2,0) vardef=n;
24 run;
NOTE: At 2SLS Iteration 1 convergence assumed because OBJECTIVE= 4.856915E-31 is almost zero (<1E-12).
NOTE: The row y1,x is a linear combination of other rows in V
NOTE: The row y3,1 is a linear combination of other rows in V
NOTE: At GMM Iteration 0 convergence assumed because OBJECTIVE= 1.720897E-30 is almost zero (<1E-12).
WARNING: The covariance across equations (the S matrix) is singular. A generalized inverse was computed by setting to zero the part of the S matrix for the following 1 equations whose residuals are linearly dependent with residuals from earlier equations: y3
Here is the outcome.
Nonlinear GMM Parameter Estimates
Parameter   Estimate   Approx Std Err   t Value   Approx Pr > |t|
a           0.036222   0.1328                                       <------ Biased
b           1.055753   0                                            <------ Biased
c           -0.05973   0.1328           -0.45     0.6548
d           1.147963   0.1224           9.38      <.0001
e           -0.09595   9.6907                                       <------ Biased
Here is the outcome without the y3 regression. There is no problem here, since the y1 and y2 residuals are not linearly dependent.
Nonlinear GMM Parameter Estimates
Parameter   Estimate   Approx Std Err   t Value   Approx Pr > |t|
a           0.036222   0.1325           0.27      0.7857
b           1.055753   0.1329           7.94      <.0001
c           -0.05973   0.1328           -0.45     0.6548
d           1.147963   0.1224           9.38      <.0001
Do I always have to write a separate PROC MODEL step just for y3=y2-y1? Is it impossible to estimate all three regressions in one step with equation-by-equation GMM? Thanks for your help.
I don't think the issue here is how you are expressing the model in PROC MODEL; it is the data you are using to estimate the model. The three equations you want to estimate are linearly independent because each contains a different error term (u, v, and w). However, the HAVE data set computes y3 as y2-y1, so the y3 error term is an exact linear combination of the y1 and y2 error terms.
One way to avoid that PROC MODEL warning is to add an independent error term to y3 in the DATA step so that it better represents the model for y1, y2, and (y2-y1):
y3=y2-y1+rannor(1);
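To spell out why the S matrix comes out singular with the original HAVE data, here is a short derivation. It is only a sketch, and it assumes the system is exactly identified so that the GMM point estimates coincide with equation-by-equation least squares; the hats and the vector of weights are notation introduced here, not part of the original post.

```latex
% Since y_{3t} = y_{2t} - y_{1t} and all three equations share the regressor x_t,
% least squares is linear in the dependent variable, so
\hat{e} = \hat{c} - \hat{a}, \qquad \hat{f} = \hat{d} - \hat{b}, \qquad
\hat{w}_t = \hat{v}_t - \hat{u}_t .
% Hence the residual vector satisfies an exact linear restriction for every t:
\begin{pmatrix} 1 & -1 & 1 \end{pmatrix}
\begin{pmatrix} \hat{u}_t \\ \hat{v}_t \\ \hat{w}_t \end{pmatrix}
= \hat{u}_t - \hat{v}_t + (\hat{v}_t - \hat{u}_t) = 0 .
% The cross-equation covariance S = \frac{1}{n}\sum_t r_t r_t^{\top},
% with r_t = (\hat{u}_t, \hat{v}_t, \hat{w}_t)^{\top}, therefore satisfies
S \begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix} = 0 ,
% so S has rank at most 2 and cannot be inverted.
```

The reported estimates are consistent with this: c - a = -0.05973 - 0.036222 = -0.095952, which matches the estimate of e (-0.09595) up to rounding.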
Marc
Thanks for your explanation, but the three regressions at the top of my post are not the data-generating process; they are the regressions to be estimated. For example, in PROC REG:
proc reg;
model y1=x;
model y2=x;
model y3=x;
run;
I used PROC MODEL instead because I want Newey–West standard errors.
I understand those are the equations to be estimated; however, those equations can only be estimated simultaneously if the data used to estimate each equation are linearly independent of the other two equations.
If you would like to estimate each equation independently (or any two out of the three independently of the third) using the HAVE data set that you provided you would need to specify two (or three) FIT statements in PROC MODEL:
proc model;
y1=a+b*x;
y2=c+d*x;
y3=e+f*x;
fit y1 / gmm kernel=(bart,2,0) vardef=n;
fit y2 / gmm kernel=(bart,2,0) vardef=n;
fit y3 / gmm kernel=(bart,2,0) vardef=n;
run;
Thanks. Using a separate FIT statement for each equation does give equation-by-equation GMM. The one remaining problem is that ODS OUTPUT cannot collect the results all together. For example,
ods output parameterestimates=want;
captures only the results from the FIT statement right above it, so N regressions mean N resulting data sets.
If you want all your results in a single data set you'll have to concatenate them yourself. You could do something like this:
proc model data=have;
y1=a+b*x;
y2=c+d*x;
y3=e+f*x;
fit y1 / gmm kernel=(bart,2,0) vardef=n;
ods output parameterestimates=est1;
fit y2 / gmm kernel=(bart,2,0) vardef=n;
ods output parameterestimates=est2;
fit y3 / gmm kernel=(bart,2,0) vardef=n;
ods output parameterestimates=est3;
quit;
data ests;
set est1 est2 est3;
run;
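Once the three data sets are stacked, it can be hard to tell which FIT statement a given row came from. One option is the INDSNAME= option of the SET statement, sketched below. It assumes est1, est2, and est3 already exist in WORK as created by the step above; the names ests_labeled, src, and source are hypothetical and chosen only for illustration.

```sas
/* Sketch: stack the three ODS data sets and record each row's source. */
data ests_labeled;
  length source $41;                /* room for a libref.member name        */
  set est1 est2 est3 indsname=src;  /* src is temporary, e.g. WORK.EST1     */
  source = src;                     /* copy it into a variable that is kept */
run;
```

The variable named in INDSNAME= is not written to the output data set, which is why it is copied into source.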