I am attempting to perform an IV analysis to help account for unmeasured confounding in an observational study. Here is the summary of my dataset:
Outcome: wage (continuous value - log transformed)
Primary exposure/independent variable: education - binary 1/0
IV: near - binary 1/0 (meets all criteria for a reasonable IV)
var1-var8 - covariates/measured confounders all of which are 1/0 indicator variables, not associated with the IV.
The goal is to estimate the association of education and wage accounting for measured confounders and also unmeasured confounding via an IV analysis.
I want to use a 2SLS approach. I can do a 2 step proc reg approach, with the first model regressing education on near. I can output the predicted values and then use those values in the 2nd step, in which I regress wage on pred_education.
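A minimal sketch of that 2 step proc reg approach, assuming the dataset is named have (a placeholder, not the real name) and that the intermediate dataset is called stage1 (also hypothetical). The covariates var1-var8 are left out here, as in the description above; they could be added to either model statement.

```sas
/* Step 1: regress the exposure on the instrument and save the fitted values. */
/* "have" and "stage1" are placeholder dataset names.                        */
proc reg data=have;
   model education = near;
   output out=stage1 p=pred_education;
run;
quit;

/* Step 2: regress the outcome on the fitted exposure from step 1. */
proc reg data=stage1;
   model wage = pred_education;
run;
quit;
```

Note that the standard errors printed in step 2 treat pred_education as if it were observed data, so they are not valid 2SLS standard errors.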
I am running into some peculiar results when I try this in proc syslin. I want to do it in proc syslin to account for possibly correlated error terms across the 2 models and to carry the SE of the estimates for pred_education forward into step 2.
In using proc syslin I use the following syntax with double adjustments for the covariates var1-var8:
Approach 1
proc syslin data=have 2sls; /* "have" is a placeholder dataset name */
endogenous education;
instruments near;
stepone: model education = near var1-var8;
steptwo: model wage = education var1-var8;
run;
The model runs fine and I get estimates for each equation. In the first model, the parameter estimate for near=0.067, SE=0.219, p=0.0021. In the second model, the parameter estimate for education = 0.62, SE=0.2324, p=0.0077. I also get estimates for var1-var8, but I am leaving those out of this post. These results differ from what I get with a 2 step proc reg, although that is not surprising.
What is surprising is what happened when I tried to adjust for measured confounders in only 1 step with the following code:
Approach 2
proc syslin data=have 2sls; /* "have" is a placeholder dataset name */
endogenous education;
instruments near;
stepone: model education = near;
steptwo: model wage = education var1-var8;
run;
In the stepone model, var1-var8 have been removed, so we only adjust for them in steptwo. When I run this code I get the following estimates:
stepone: parameter estimate for near = 0.1030, SE=0.0226, p<0.0001 [no estimates for var1-var8 as they were not included]
steptwo: parameter estimate for education = 0.62, SE=0.2324, p=0.0077, the same as the first approach with covariate adjustment in both steps. Moreover, the parameter estimates for var1-var8 are the same between approach 1 and 2.
My issue is that if I don't include var1-var8 in stepone, I would expect the predictions for education to change, which should affect the 2nd model in steptwo. My question to the community is: why are the results of steptwo in Approach 1 and Approach 2 the same even though stepone differs between the two?
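One way to probe this is to drop the stepone model entirely and fit only steptwo, as in the sketch below (have is again a placeholder dataset name). If the education estimate is unchanged, that would suggest the first-stage projection is built from the INSTRUMENTS statement rather than from the stepone MODEL statement, which would make stepone just another equation being estimated, not the source of the predictions used in steptwo.

```sas
/* Diagnostic sketch: fit steptwo alone, with no stepone model statement. */
/* "have" is a placeholder dataset name.                                  */
proc syslin data=have 2sls;
   endogenous education;
   instruments near;
   steptwo: model wage = education var1-var8;
run;
```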
Any insight would be greatly appreciated.
This is an econometrics question, so I have moved the topic to
"SAS Forecasting and Econometrics" board.
Koen