I am attempting to perform an IV analysis to help account for unmeasured confounding in an observational study. Here is the summary of my dataset:
Outcome: wage (continuous value - log transformed)
Primary exposure/independent variable: education - binary 1/0
IV: near - binary 1/0 (meets all criteria for a reasonable IV)
var1-var8 - covariates/measured confounders all of which are 1/0 indicator variables, not associated with the IV.
The goal is to estimate the association of education and wage accounting for measured confounders and also unmeasured confounding via an IV analysis.
I want to use a 2SLS approach. I can do a 2-step proc reg approach, with the first model regressing education on near. I can output the predicted values and then use those values in the 2nd step, in which I regress wage on pred_education.
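For reference, here is a minimal sketch of that 2-step proc reg approach. The dataset name mydata and the output dataset stage1 are placeholder names assumed for illustration, and the covariates are left out to match the description above:

```sas
/* Sketch only: mydata and stage1 are placeholder dataset names. */

/* Step 1: regress the exposure on the instrument and save the fitted values. */
proc reg data=mydata;
   model education = near;
   output out=stage1 p=pred_education;  /* p= stores the predicted values */
run;
quit;

/* Step 2: regress the outcome on the fitted exposure from step 1. */
/* var1-var8 could be added to both MODEL statements to adjust for covariates. */
proc reg data=stage1;
   model wage = pred_education;
run;
quit;
```

The standard errors printed by the second proc reg treat pred_education as if it were observed data, so they do not reflect the uncertainty from step 1.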
I am running into some peculiar results when I try this in proc syslin. I want to do it in proc syslin to account for possible correlated error terms across the 2 models and to carry the SE of the estimates for pred_education forward into step 2.
In proc syslin I use the following syntax, adjusting for the covariates var1-var8 in both steps:
Approach 1
proc syslin data=mydata 2sls; /* mydata stands in for my dataset name */
   endogenous education;
   instruments near;
   stepone: model education = near var1-var8;
   steptwo: model wage = education var1-var8;
run;
The model runs fine and I get estimates for each model. In the first model, the parameter estimate for near = 0.067, SE=0.0219, p=0.0021. In the second model, the parameter estimate for education = 0.62, SE=0.2324, p=0.0077. I also get estimates for var1-var8, but I am leaving those out of this post. These results differ from what I get with the 2-step proc reg approach, although that is not surprising.
What is surprising is what happened when I tried to adjust for measured confounders in only 1 step with the following code:
Approach 2
proc syslin data=mydata 2sls; /* mydata stands in for my dataset name */
   endogenous education;
   instruments near;
   stepone: model education = near;
   steptwo: model wage = education var1-var8;
run;
In the stepone model, var1-var8 have been removed, so we only adjust for them in steptwo. When I run this code I get the following estimates:
stepone: parameter estimate for near = 0.1030, SE=0.0226, p<0.0001 [no estimates for var1-var8 as they were not included]
steptwo: parameter estimate for education = 0.62, SE=0.2324, p=0.0077, the same as the first approach with covariate adjustment in both steps. Moreover, the parameter estimates for var1-var8 are the same between approach 1 and 2.
My issue is that if I don't include var1-var8 in stepone, I would expect the predictions for education to change, which should affect the 2nd model in steptwo. My question to the community is: why are the results of steptwo the same in Approach 1 and Approach 2 even though stepone differs between the two?
Any insight would be greatly appreciated.
This is an econometrics question, so I have moved the topic to the "SAS Forecasting and Econometrics" board.
Koen