I am attempting an instrumental variable (IV) analysis to help account for unmeasured confounding in an observational study. Here is a summary of my dataset:

Outcome: wage (continuous, log-transformed)
Primary exposure (endogenous variable): education, binary 1/0
IV: near, binary 1/0 (meets all criteria for a reasonable IV)
var1-var8: covariates/measured confounders, all of which are 1/0 indicator variables, not associated with the IV

The goal is to estimate the association between education and wage, accounting for measured confounders and, via the IV analysis, for unmeasured confounding. I want to use a two-stage least squares (2SLS) approach. I can do this in two steps with proc reg: the first model regresses education on near, I output the predicted values, and in the second step I regress wage on pred_education. However, I am running into some peculiar results when I try this in proc syslin. I want to use proc syslin to account for possibly correlated error terms across the two models and to carry the SE of the estimates for pred_education forward into step 2.

In proc syslin I use the following syntax, adjusting for the covariates var1-var8 in both models:

Approach 1

    proc syslin data=mydata 2sls;   /* mydata = placeholder dataset name */
       endogenous education;
       instruments near;
       stepone: model education = near var1-var8;
       steptwo: model wage = education var1-var8;
    run;

The code runs fine and I get estimates for each model. In the first model, the parameter estimate for near is 0.067 (SE=0.219, p=0.0021). In the second model, the parameter estimate for education is 0.62 (SE=0.2324, p=0.0077). I also get estimates for var1-var8, but I am leaving those out of this post. These results differ from what I get with the two-step proc reg approach, although that is not surprising.
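For reference, the two-step proc reg approach mentioned above could be sketched roughly as follows. This is only an illustration under assumptions: `mydata` and `stage1` are placeholder dataset names, and I have assumed the covariates var1-var8 enter both steps, which the description of step one does not state explicitly.

```sas
/* Step 1: regress education on the instrument (and, by assumption, the covariates); */
/* save the fitted values as pred_education.                                         */
proc reg data=mydata;                    /* mydata: placeholder dataset name */
   model education = near var1-var8;
   output out=stage1 p=pred_education;   /* stage1: hypothetical output dataset */
run;
quit;

/* Step 2: regress log wage on the predicted education and the covariates. */
proc reg data=stage1;
   model wage = pred_education var1-var8;
run;
quit;
```

The standard errors reported by the second proc reg step treat pred_education as if it were an observed regressor, so they do not reflect the uncertainty from step one. That is the limitation that motivates moving to proc syslin.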
What is surprising is what happened when I tried to adjust for the measured confounders in only one step, with the following code:

Approach 2

    proc syslin data=mydata 2sls;   /* mydata = placeholder dataset name */
       endogenous education;
       instruments near;
       stepone: model education = near;
       steptwo: model wage = education var1-var8;
    run;

In the stepone model, var1-var8 have been removed, so we adjust for them only in steptwo. When I run this code I get the following estimates:

stepone: parameter estimate for near = 0.1030, SE=0.0226, p<0.0001 [no estimates for var1-var8, as they were not included]
steptwo: parameter estimate for education = 0.62, SE=0.2324, p=0.0077, the same as in the first approach with covariate adjustment in both steps

Moreover, the parameter estimates for var1-var8 are the same in Approach 1 and Approach 2. My issue is that if I don't include var1-var8 in stepone, I would expect the predictions for education to change, which should in turn affect the second model in steptwo.

My question to the community: why are the steptwo results in Approach 1 and Approach 2 the same even though stepone differs between the two? Any insight would be greatly appreciated.