maxime
Calcite | Level 5

Hi everyone,

I have been trying for days to run a two-stage regression in SAS 9.3, but I don’t think it’s the right tool for what I want to do. I want to estimate three interdependent models jointly, in a single regression. My professor told me to use 2SLS, so I tried it, but in the second stage only one dependent variable is estimated. From what I have read online and from my own attempts, PROC SYSLIN does not seem suited to my case.

I’m using time-series data. Here are the three models, each estimated separately with PROC REG:

proc reg data=Data.data;

model log(DepVar1) = log(Exogenous1) log(DepVar2) Dummy1 Dummy2 Dummy3 Dummy4 InstrumentVar;

run;

proc reg data=Data.data;

   model log(DepVar2) = log(Exogenous1) log(DepVar1/DepVar3) Dummy1 InstrumentVar1 InstrumentVar2;

run;

proc reg data=Data.data;

   model log(DepVar3) = lag(log(DepVar3)) log(Exogenous1);

run;
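One practical point about the code in this thread: the MODEL statements of PROC REG and PROC SYSLIN take variable names, not expressions such as log(DepVar1) or lag(log(DepVar3)), so these statements are best read as shorthand. In practice the transformed variables would usually be created first in a DATA step. A minimal sketch, assuming the data set and variable names from the post; every new name (work.prepared, logDepVar1, logRatio13, lag_logDepVar3, and so on) is hypothetical:

```sas
/* Hypothetical sketch: build the transformed variables that the MODEL
   statements refer to. All new variable names are illustrative. */
data work.prepared;
   set Data.data;                 /* time series, assumed sorted by time */
   logDepVar1    = log(DepVar1);
   logDepVar2    = log(DepVar2);
   logDepVar3    = log(DepVar3);
   logExogenous1 = log(Exogenous1);
   logRatio13    = log(DepVar1 / DepVar3);  /* equals logDepVar1 - logDepVar3 */
   /* LAG is called on every observation (not inside an IF), so it
      returns the previous observation's value; the first row is missing */
   lag_logDepVar3 = lag(logDepVar3);
run;

/* Second model from the post, using the precomputed ratio term */
proc reg data=work.prepared;
   model logDepVar2 = logExogenous1 logRatio13 Dummy1 InstrumentVar1 InstrumentVar2;
run;
quit;

/* Third model from the post, using the precomputed lag */
proc reg data=work.prepared;
   model logDepVar3 = lag_logDepVar3 logExogenous1;
run;
quit;
```

The two PROC REG steps are kept separate so that the missing first value of the lag only affects the third model.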

And here is the PROC SYSLIN code, which gives interesting results but does not estimate DepVar2 and DepVar3 in the second stage:

proc syslin data=Data.data 2sls first;

endogenous logcargoYield;

instruments log(Exogenous1) log(Exogenous2)  Dummy1 Dummy2 Dummy3 Dummy4 InstrumentVar log(DepVar1/DepVar3) log(DepVar3)  lag(log(DepVar3));

model log(DepVar1) = log(Exogenous1) log(DepVar2) Dummy1 Dummy2 Dummy3 Dummy4 InstrumentVar;

model log(DepVar2) = log(Exogenous1) log(DepVar1/DepVar3) Dummy1 InstrumentVar1 InstrumentVar2;

model log(DepVar3) = lag(log(DepVar3)) log(Exogenous1);

run;

Does anyone have an idea of which procedure I could use?

Thanks for any advice!

gunce_sas
SAS Employee

Hello,

You can solve your problem in a couple of ways. One is to use PROC SYSLIN with the 2SLS option, but with the system specified correctly: all three dependent variables declared as endogenous, and only exogenous variables used as instruments. I believe the following code addresses your problem better.

proc syslin data=Data.data 2sls first;

endogenous log(DepVar1) log(DepVar2) log(DepVar3);

instruments log(Exogenous1) log(Exogenous2) Dummy1 Dummy2 Dummy3 Dummy4 InstrumentVar1 InstrumentVar2

            lag(log(DepVar3));

model log(DepVar1) = log(Exogenous1) log(DepVar2) Dummy1 Dummy2 Dummy3 Dummy4 InstrumentVar1;

model log(DepVar2) = log(Exogenous1) log(DepVar1) log(DepVar3) Dummy1 InstrumentVar1 InstrumentVar2;

restrict log(DepVar1) = -log(DepVar3);

model log(DepVar3) = lag(log(DepVar3)) log(Exogenous1);

run;

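Note that I used the identity log(DepVar1/DepVar3) = log(DepVar1) - log(DepVar3): the RESTRICT statement forces the coefficients on log(DepVar1) and log(DepVar3) in the second equation to be equal in magnitude and opposite in sign.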
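Spelled out, writing the second equation with separate coefficients on the two logs and then restricting them reproduces the original ratio term. The symbols γ, β₁ and β₃ are labels introduced here for illustration:

```latex
\gamma \log\frac{\mathrm{DepVar1}}{\mathrm{DepVar3}}
  = \gamma \log \mathrm{DepVar1} - \gamma \log \mathrm{DepVar3}
\quad\Longleftrightarrow\quad
\beta_1 \log \mathrm{DepVar1} + \beta_3 \log \mathrm{DepVar3}
\ \text{ with } \ \beta_1 = -\beta_3 = \gamma
```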

 

This first method does not take the cross-equation correlation of the errors into account. To account for it, use the same code but change the 2SLS option to 3SLS.

As another method, you can use PROC QLIM:

proc qlim data=Data.data;

model log(DepVar1) = log(Exogenous1) log(DepVar2) Dummy1 Dummy2 Dummy3 Dummy4 InstrumentVar1;

model log(DepVar2) = log(Exogenous1) log(DepVar1) log(DepVar3) Dummy1 InstrumentVar1 InstrumentVar2;

model log(DepVar3) = lag(log(DepVar3)) log(Exogenous1);

restrict log(DepVar1) = -log(DepVar3);

run;

PROC QLIM estimates these models jointly, in one step, and more efficiently, using maximum likelihood (MLE). But here is the catch: since there is simultaneity in the equations, the coefficients of the endogenous variables will most likely be inconsistent. If you had only a SUR (seemingly unrelated regressions) problem, without simultaneity, PROC QLIM would be the best you could do (assuming the errors are multivariate normal).

I hope this helps.

maxime
Calcite | Level 5

Thank you so much!

I am trying what you advised; I'll let you know how it goes.

maxime
Calcite | Level 5

Thank you Gunce, the 3SLS works pretty well!

maxime
Calcite | Level 5

Hi,

Actually, I am not sure I am using 3SLS correctly: the third stage, the estimation step, only gives coefficient estimates for one of the dependent variables. I thought I could use the coefficients from the second step, where all the dependent variables are estimated, but for the dependent variable estimated at step 3 the coefficients differ between the two steps. By the definition of 3SLS, the purpose of the second step is to estimate the residuals, so there is no reason to use those coefficients, right? How can I get the correct coefficients for the two dependent variables that are not estimated at step 3?

I tried QLIM but it didn't work well for me.

Thank you very much!

ets_kps
SAS Employee

Hi Maxime,

I'll try to help you out. First, I would stick to either PROC SYSLIN or PROC MODEL to solve your problem. Here is a nice example (the Klein model).
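For comparison, the system from the first post might be written in PROC MODEL roughly as follows. This is only a sketch under stated assumptions: it assumes the log and lagged variables already exist in a data set with the hypothetical names below (work.prepared, logDepVar1, lag_logDepVar3, and so on), the parameter names a0 to c2 are made up, and the instrument list follows Gunce's SYSLIN code without Exogenous2. One convenience of PROC MODEL is that the ratio term can be written directly, so no RESTRICT statement is needed.

```sas
/* Hypothetical sketch: assumes the log and lag variables were created
   beforehand under the illustrative names used below. */
proc model data=work.prepared;
   endogenous logDepVar1 logDepVar2 logDepVar3;
   parms a0 a1 a2 a3 a4 a5 a6 a7
         b0 b1 b2 b3 b4 b5
         c0 c1 c2;
   /* Equation 1 */
   logDepVar1 = a0 + a1*logExogenous1 + a2*logDepVar2 + a3*Dummy1
              + a4*Dummy2 + a5*Dummy3 + a6*Dummy4 + a7*InstrumentVar1;
   /* Equation 2: log(DepVar1/DepVar3) written as a difference, so the
      opposite-sign restriction holds by construction */
   logDepVar2 = b0 + b1*logExogenous1 + b2*(logDepVar1 - logDepVar3)
              + b3*Dummy1 + b4*InstrumentVar1 + b5*InstrumentVar2;
   /* Equation 3 */
   logDepVar3 = c0 + c1*lag_logDepVar3 + c2*logExogenous1;
   /* Exogenous and predetermined variables only */
   instruments logExogenous1 Dummy1 Dummy2 Dummy3 Dummy4
               InstrumentVar1 InstrumentVar2 lag_logDepVar3;
   /* Estimate all three equations jointly by 3SLS */
   fit logDepVar1 logDepVar2 logDepVar3 / 3sls;
run;
quit;
```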

Could you post your code? Perhaps the missing coefficients are due to a printing option. Make sure you have FIRST on the PROC statement. Also, please refer to the documentation on the subject.

Reading your post, this seems to be more of a conceptual issue. Think of the difference between 2SLS and 3SLS as similar to the difference between OLS and FGLS. There is good reason that 3SLS gives slightly different estimates than 2SLS: they are different estimators (though the estimates are likely very similar). There is no reason to use the second-stage coefficients if you are interested in 3SLS.
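To make the OLS/FGLS analogy concrete, here are the standard textbook forms of the two estimators (textbook notation, not taken from the SAS documentation). Z is the matrix of all instruments, P_Z its projection matrix, X_i and y_i the regressors and dependent variable of equation i, X the block-diagonal matrix of the X_i, y the stacked y_i, and Σ̂ the cross-equation error covariance estimated from the 2SLS residuals:

```latex
\hat{\beta}_i^{\,\mathrm{2SLS}} = \left(X_i^\top P_Z X_i\right)^{-1} X_i^\top P_Z\, y_i,
\qquad P_Z = Z\,(Z^\top Z)^{-1} Z^\top

\hat{\beta}^{\,\mathrm{3SLS}} = \left[X^\top \left(\hat{\Sigma}^{-1} \otimes P_Z\right) X\right]^{-1}
X^\top \left(\hat{\Sigma}^{-1} \otimes P_Z\right) y
```

The 2SLS residuals are used only to build Σ̂; the 3SLS coefficients for every equation come from the second formula. If Σ̂ were diagonal, that formula would collapse to equation-by-equation 2SLS, just as FGLS with a diagonal covariance collapses to OLS.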

I'd be happy to provide more help if you can replicate your problem on this thread.

-Ken

maxime
Calcite | Level 5

Hi Ken,

Thank you very much for your help. I looked at the Klein example and ran it, and it makes me think that 3SLS is simply not able to do what I want: obtain estimates for all three models instead of just one (CONSUME in the case of the Klein model, log(DepVar1) in my case). With 3SLS, I displayed the first stage using FIRST, but I am not sure it is useful, as this stage only produces predicted values for the endogenous regressors.

I am sorry if I am not very clear; I am doing my best! To sum up, I am looking for a procedure that can estimate the three models at the same time and print the estimation results for all of them. 3SLS only shows the estimation results for one of them.

Here is my code, modified following Gunce's advice and with a few changes reflecting the progress of my research:

proc syslin data=Data.data 3sls first;

endogenous log(DepVar1) log(DepVar2) log(DepVar3);

instruments log(Exogenous1) Dummy1 Dummy2 Dummy3 log(Exogenous2) Dummy4 log(Exogenous3) InstrumentVar InstrumentVar2 Trend lag(log(DepVar3)) lag(log(DepVar1)/log(DepVar3));

model log(DepVar1) = log(Exogenous1) log(DepVar2) Dummy4 Dummy1 Dummy2 Dummy3 InstrumentVar2 Trend;

model log(DepVar2) = log(Exogenous2) log(DepVar1) log(DepVar3) Dummy4 log(Exogenous3) InstrumentVar InstrumentVar2 Trend;

restrict log(DepVar1) = - log(DepVar3);

model log(DepVar3) = lag(log(DepVar3)) log(Exogenous1) lag(log(DepVar1)/log(DepVar3));

run;


Thank you again for taking the time to help me!


Maxime

ets_kps
SAS Employee

I am sorry, but I am struggling to replicate your problem. If I run the example I sent you, I get estimates for all three models: 1) consumption, 2) investment, and 3) labor.

If you are not seeing the same results, I suspect you have an ODS or printing issue. Make sure you submit ODS GRAPHICS ON; and that you don't have any unusual printing options set. Good luck!

[Attached screenshot: 3sls.PNG]
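If the displayed output really stops after the first equation, one way to narrow this down might be to check which output tables SYSLIN actually creates, and to save the estimates to a data set rather than relying on the listing. A sketch under stated assumptions: the variable names are hypothetical precomputed log and lag variables, ODS TRACE ON writes the name of each output table to the log, and the OUTEST= option writes the parameter estimates to a data set.

```sas
/* Hypothetical diagnostic sketch; variable names are illustrative and
   assume the log and lag variables were created in a DATA step. */
ods trace on;   /* log the name of every output table produced */

proc syslin data=work.prepared 3sls outest=work.est;
   endogenous logDepVar1 logDepVar2 logDepVar3;
   instruments logExogenous1 Dummy1 Dummy2 Dummy3 Dummy4
               InstrumentVar1 InstrumentVar2 lag_logDepVar3;
   model logDepVar1 = logExogenous1 logDepVar2 Dummy1 Dummy2 Dummy3 Dummy4 InstrumentVar1;
   model logDepVar2 = logExogenous1 logDepVar1 logDepVar3 Dummy1 InstrumentVar1 InstrumentVar2;
   restrict logDepVar1 = -logDepVar3;   /* applies to the preceding MODEL */
   model logDepVar3 = lag_logDepVar3 logExogenous1;
run;

ods trace off;

/* Print the saved estimates; the _TYPE_ variable should identify the
   estimation method of each row */
proc print data=work.est;
run;
```

If the log lists parameter-estimate tables for all three equations but only one is displayed, the problem is in the output destination rather than in the model.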

maxime
Calcite | Level 5

Thank you Ken!

I have tried turning ODS GRAPHICS ON and resetting all ODS options, and it did not change anything (I tried both my models and the example, to make sure the problem does not come from my modelling). I still only get the first model. It is really weird; I could not find anything online about this printing problem, including on this page.

However, it seems that I can get the results for all three models by changing their order. For instance, to get the final 3SLS results for log(DepVar2), I can put its MODEL statement first:

proc syslin data=Data.data 3sls first;

endogenous log(DepVar1) log(DepVar2) log(DepVar3);

instruments log(Exogenous1) Dummy1 Dummy2 Dummy3 log(Exogenous2) Dummy4 log(Exogenous3) InstrumentVar InstrumentVar2 Trend lag(log(DepVar3)) lag(log(DepVar1)/log(DepVar3));

model log(DepVar2) = log(Exogenous2) log(DepVar1) log(DepVar3) Dummy4 log(Exogenous3) InstrumentVar InstrumentVar2 Trend;

restrict log(DepVar1) = - log(DepVar3);

model log(DepVar1) = log(Exogenous1) log(DepVar2) Dummy4 Dummy1 Dummy2 Dummy3 InstrumentVar2 Trend;

model log(DepVar3) = lag(log(DepVar3)) log(Exogenous1) lag(log(DepVar1)/log(DepVar3));

run;


It is not a "clean" solution, but at least it gives me the correct estimates.

I am still very interested in finding out where this problem comes from, and I am going to keep looking. I don't think it would be useful to send you my dataset, as I get the same problem with the example.

Thank you very much for your help Ken!

Maxime
