Hello, I'm analyzing some data and need to test whether a 2SLS or OLS model is most appropriate.
I understand that I need to run the Hausman specification test and the syntax provided in the SAS documentation looks straightforward but I'm not sure how to execute the code.
The data contain a dependent variable Y, a mediator M, and two instruments X and Z, and their interaction.
The model I'm testing is Y = M + X + Z + X*Z.
The code from the SAS help guide under 'Hausman specification test' is shown below. My question is what I need to supply as input. What do y1 and y2 represent? Does p represent predicted values? I assume interc refers to an interaction term? And what is d2?
proc model data=one out=fiml2;
endogenous y1 y2;
y1 = py2 * y2 + px1 * x1 + interc;
y2 = py1 * y1 + pz1 * z1 + d2;
fit y1 y2 / ols 2sls hausman;
instruments x1 z1;
run;
Any help and even an example would be much appreciated.
Thanks!
Hi,
You are using the example from here:
You can generate the example data set (one) using this link:
SAS/ETS User's Guide Example Programs
y1 and y2 are endogenous variables; x1 and z1 are instrumental variables; py1, py2, px1, pz1, d2, and interc are all parameters (interc and d2 are the intercepts of the two equations).
Hi Gergely,
Thanks for pointing me to the sample data, which is helpful.
One other question - how does one modify the syntax for a third variable? For example, what if there were two instrumental variables PLUS their interaction in the model. How would the two lines of the syntax be changed? Does 'interc' capture that? Another way to ask this might be, what if there were 3 instruments, not 2?
Using the varnames from the syntax above, the model I'm trying to run is:
y1 = y2 + x1 + z1 + x1*z1.
Is there a way to modify the syntax to accomplish that?
thanks,
Dave
When you specify a model in PROC MODEL, you should explicitly use parameters:
y1 = y2par * y2 + x1par * x1 + z1par * z1 + xzpar * x1*z1 + intercept;
In the previous examples, interc was the intercept parameter, not the interaction.
I suggest you read the documentation of PROC MODEL, especially the sections on 2SLS and instrumental variables.
If you want to include the interaction as an instrumental variable, you can create it in an assignment statement, then use it in the instruments statement. (xz=x1*z1; instruments xz;)
Do you have only 1 equation?
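As a rough sketch of the suggestion above, the interaction can be built in an assignment statement and then listed in the INSTRUMENTS statement. This assumes the data set one and the variable names from the documentation example quoted earlier in the thread; the name xz is only an illustrative choice, and the two equations are left unchanged from that example.

```sas
proc model data=one;
   endogenous y1 y2;
   /* Assumed name: xz holds the interaction of the two instruments */
   xz = x1*z1;
   /* Equations as in the documentation example; interc and d2 are intercepts */
   y1 = py2*y2 + px1*x1 + interc;
   y2 = py1*y1 + pz1*z1 + d2;
   /* The interaction is used only as an additional instrument here */
   instruments x1 z1 xz;
   fit y1 y2 / ols 2sls hausman;
quit;
```

With a single equation, the same pattern applies: keep one equation, drop the second one, and name only that dependent variable in the FIT statement.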
Dave,
It looks like Gergely did a good job of showing you how to specify models in PROC MODEL and how to specify an instrument that represents the interaction between the variables X and Z.
To run the Hausman test for your model, you could use something like the following example. In this example M is instrumented using X, Z, and X*Z. In the output you should get a Hausman specification test statistic of 7.65 with a p-value of 0.0218. Therefore, at a 5% significance level you would reject the null hypothesis that the OLS estimator is consistent for this model. I hope this helps.
Marc
data d;
call streaminit (1);
do i = 1 to 2000;
r = rand('normal');
x = r + rand('normal');
z = r + rand('normal');
m = r + rand('normal');
y = 3*m + 1 + rand('normal');
output;
end;
run;
proc model data=d;
y = pm*m + interc;
xz = x*z;
instruments x z xz;
fit y / ols 2sls hausman;
quit;
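For anyone wondering where the statistic and p-value come from, the test compares the OLS and 2SLS estimates. A sketch of the statistic, following the general form described in the SAS/ETS documentation (the subscripted notation here is an assumption made for readability, not SAS output):

$$
m = q^{\top}\,(V_{2SLS} - V_{OLS})^{-}\,q, \qquad q = b_{2SLS} - b_{OLS}, \qquad m \sim \chi^2_k \ \text{under the null}
$$

Here $b$ and $V$ are the parameter estimates and their estimated covariance matrices, $(\cdot)^{-}$ is a generalized inverse, and $k$ is the rank of $V_{2SLS} - V_{OLS}$. The values quoted above are consistent with $k = 2$ (the parameters pm and interc): for a chi-square with 2 degrees of freedom, $P(\chi^2_2 > 7.65) = e^{-7.65/2} \approx 0.0218$.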
Hi Marc,
Thanks for the help.
Sorry to be a bit slow - I'm not that familiar with 2SLS.
Here's what I ran for 2SLS in PROC SYSLIN; I'm now trying to run the Hausman test in PROC MODEL. Even with the guidance in this thread, I still can't get it to work:
Proc SYSLIN data= one 2SLS FIRST;
Endogenous m ;
Instruments x z x*z;
Model y= m x*z/ OVERID DW PLOT;
Run;
thanks,
Dave
Dave,
To do a 2SLS estimation in PROC SYSLIN equivalent to the one I provided earlier using PROC MODEL, you could do the following. The additional DATA step is necessary because I don't believe PROC SYSLIN supports the 'x*z' syntax used in the code you provided.
Marc
data one;
set d;
xz = x*z;
run;
proc syslin data=one 2sls first;
endogenous m;
instruments x z xz;
model y = m;
run;
Thanks all - got the models to work.