anujmehta
Calcite | Level 5

I am attempting to perform an IV analysis to help account for unmeasured confounding in an observational study. Here is the summary of my dataset:

Outcome: wage (continuous value - log transformed)

Primary exposure (endogenous independent variable): education - binary 1/0

IV: near - binary 1/0 (meets all criteria for a reasonable IV)

var1-var8 - covariates/measured confounders all of which are 1/0 indicator variables, not associated with the IV.  

The goal is to estimate the association of education and wage accounting for measured confounders and also unmeasured confounding via an IV analysis. 

I want to use a 2SLS approach. I can do a two-step proc reg approach, with the first model regressing education on near. I can output the predicted values and then use them in the second step, in which I regress wage on pred_education.
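
For concreteness, the two-step proc reg version would look roughly like the sketch below. The dataset names mydata and stage1 are placeholders, and including var1-var8 in both steps is an assumption (the description above only mentions near and pred_education). The standard errors printed by the second proc reg are not corrected for the fact that pred_education is itself estimated.

```sas
/* Step 1: first-stage regression; save predicted education.          */
/* mydata and stage1 are placeholder dataset names (assumptions).     */
proc reg data=mydata noprint;
   model education = near var1-var8;
   output out=stage1 p=pred_education;
run;
quit;

/* Step 2: regress log wage on predicted education and the covariates. */
/* Standard errors here ignore the estimation error in pred_education. */
proc reg data=stage1;
   model wage = pred_education var1-var8;
run;
quit;
```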

I am running into some peculiar results when I try this in proc syslin. I want to use proc syslin to account for possibly correlated error terms across the two models and to carry the SE of the estimates for pred_education forward into step 2.

In using proc syslin I use the following syntax with double adjustments for the covariates var1-var8:

Approach 1

proc syslin data=mydata 2sls;   /* mydata is a placeholder dataset name */
   endogenous education;
   instruments near;
   stepone: model education = near var1-var8;
   steptwo: model wage = education var1-var8;
run;

 

The model runs fine and I get estimates for each model. In the first model, the parameter estimate for near = 0.067, SE=0.219, p=0.0021. In the second model, the parameter estimate for education = 0.62, SE=0.2324, p=0.0077. I also get estimates for var1-var8, but I am leaving those out of this post. These results differ from what I get with the two-step proc reg approach, although that is not surprising.

What is surprising is what happened when I tried to adjust for measured confounders in only 1 step with the following code:

 

Approach 2

proc syslin data=mydata 2sls;   /* mydata is a placeholder dataset name */
   endogenous education;
   instruments near;
   stepone: model education = near;
   steptwo: model wage = education var1-var8;
run;

 

In the stepone model, var1-var8 have been removed, so we only adjust for them in steptwo. When I run this code I get the following estimates:

stepone: parameter estimate for near = 0.1030, SE=0.0226, p<0.0001 [no estimates for var1-var8 as they were not included]

steptwo: parameter estimate for education = 0.62, SE=0.2324, p=0.0077, the same as the first approach with covariate adjustment in both steps. Moreover, the parameter estimates for var1-var8 are the same between approach 1 and 2. 

 

My issue is that if I don't include var1-var8 in stepone, I would expect the predictions for education to change, which should in turn affect the second model in steptwo. My question to the community: why are the steptwo results in Approach 1 and Approach 2 the same even though stepone differs between the two?

 

Any insight would be greatly appreciated. 

 

 

1 REPLY
sbxkoenk
SAS Super FREQ

This is an econometrics question, so I have moved the topic to

"SAS Forecasting and Econometrics" board.

Koen

