Re: How does PROC QLIM estimate a probit with endogenous variables

zongxi · Posted 02-16-2021 04:03 PM

Hi,

I am new to PROC QLIM and recently used it to estimate a system of structural equations.

In the model, I have three endogenous variables and two of them are discrete.

QLIM reported the following model fit summary,

"Optimization method": "Quasi-Newton"

"Seed for Monte Carlo Integration": 1514161564

"Number of Draws": 20

I was wondering how QLIM estimates the model and what assumptions are behind it.

I assume it uses maximum likelihood estimation (with a multivariate normal distribution in my case), but I am wondering what it assumes about the correlations among the three random variables.

Can anyone help provide the MLE equation and estimation steps for this model?

thanks in advance

========================================================

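proc qlim data=review;
   model y1 = y2 y3 z1 / discrete;   /* binary outcome; y2 and y3 are endogenous regressors */
   model y2 = z2 / discrete;         /* binary endogenous variable */
   model y3 = z3;                    /* continuous endogenous variable */
run;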

SteveDenham · Posted 02-17-2021 07:53 AM

It is always dangerous to use a technique without at least some background on the details of what the procedure is doing. This is a good time to read the Details part of the documentation for PROC QLIM, located here:

https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.4&docsetId=etsug&docsetTarget=etsug_...

This is long, and many parts have nothing to do with your questions - but all of them are answered at some point in here.

SteveDenham

zongxi · Posted 02-18-2021 07:00 PM

Thank you, Dr. Denham, for your suggestions. My question is specifically about how to derive the MLE for a probit regression with binary and continuous endogenous variables. I read the SAS documentation you linked and Econometric Analysis of Cross Section and Panel Data (Wooldridge 2011). I derived the MLE myself; the derivation is long and tedious, and I am not sure whether it is correct.

Suppose I want to estimate the following system of equations, where 1[.] is the indicator function. I omit the intercepts and coefficients for simplicity.

y1=1[x+y2+y3+y4+e1>0] (1)
y2=1[z2+e2>0] (2)
y3=z3+e3 (3)
y4=z4+e4 (4)

A couple of assumptions: e1 and e2 are standard normal, and e1-e4 share a variance/covariance structure whose nonzero off-diagonal terms are what make y2, y3, and y4 endogenous.

The goal here is to write the joint likelihood conditional on the exogenous and instrumental variables. Specifically,
f(y1, y2, y3, y4|x, z) = f(y1|y2,y3,y4,x,z)*f(y2,y3,y4|x,z).

The second term on the right-hand side factors further: f(y2,y3,y4|x,z) = f(y2|y3,y4,x,z)*f(y3,y4|x,z).

This is straightforward to derive from the properties of joint and conditional distributions of normal variables (Wooldridge 2011).
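One way to write the two factors explicitly, as a sketch that assumes (e2, e3, e4) are jointly normal with mean zero and Var(e2) = 1. The symbols \Sigma_{34} (covariance matrix of e3 and e4), \sigma_{2,34} (row vector of covariances of e2 with e3 and e4), m_2, and s_2 are notation introduced here for illustration; \phi_2(\cdot;\Sigma) is the zero-mean bivariate normal density and \Phi is the standard normal CDF.

```latex
\begin{aligned}
f(y_3, y_4 \mid x, z) &= \phi_2\!\left(y_3 - z_3,\; y_4 - z_4;\; \Sigma_{34}\right), \\
P(y_2 = 1 \mid y_3, y_4, x, z) &= \Phi\!\left(\frac{z_2 + m_2}{s_2}\right), \\
m_2 &= \sigma_{2,34}\,\Sigma_{34}^{-1}
       \begin{pmatrix} y_3 - z_3 \\ y_4 - z_4 \end{pmatrix}, \qquad
s_2^2 = 1 - \sigma_{2,34}\,\Sigma_{34}^{-1}\,\sigma_{2,34}^{\top}.
\end{aligned}
```

The case y2 = 0 is one minus the probability in the second line.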

The first term, f(y1|y2,y3,y4,x,z), is trickier. It requires deriving the 4 combinations of y1 and y2 separately:

1. f(y1=1|y2=1,y3,y4,x,z)

2. f(y1=1|y2=0,y3,y4,x,z)

3. f(y1=0|y2=1,y3,y4,x,z)

4. f(y1=0|y2=0,y3,y4,x,z)

Take #1 for example,

p(y1=1|y2=1,y3,y4,x,z)
= E[p(e1>-x-y2-y3-y4|e2,e3,e4,x,z)|y2=1,y3,y4,x,z]

p(e1>-x-y2-y3-y4|e2,e3,e4,x,z) is a function of the random variables e2, e3, and e4.

Let g(e2,e3,e4) = p(e1>-x-y2-y3-y4|e2,e3,e4,x,z), then

p(y1=1|y2=1,y3,y4,x,z) = Integral[g(e2,e3,e4)*f(e2,e3,e4|y2=1,y3,y4,x,z)] d(e2)*d(e3)*d(e4)

The Integral[.] is written over the three-dimensional space of (e2, e3, e4). However, e3 and e4 are each fixed at a single value by the observed y3 and y4, so only e2 is actually integrated, over the range consistent with y2=1.
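Written out, the integral collapses to one dimension. This is a sketch under the assumption that (e1, e2, e3, e4) are jointly normal with mean zero; m_1(e_2) and s_1 are notation introduced here for the conditional mean and standard deviation of e1 given (e2, e3, e4), and the 1 in the numerator is the observed value y2 = 1.

```latex
\begin{aligned}
P(y_1 = 1 \mid y_2 = 1, y_3, y_4, x, z)
&= \frac{\displaystyle\int_{-z_2}^{\infty}
     \Phi\!\left(\frac{x + 1 + y_3 + y_4 + m_1(e_2)}{s_1}\right)
     f(e_2 \mid e_3, e_4)\, de_2}
   {P(e_2 > -z_2 \mid e_3, e_4)}, \\
e_3 &= y_3 - z_3, \qquad e_4 = y_4 - z_4, \\
m_1(e_2) &= E[e_1 \mid e_2, e_3, e_4], \qquad s_1^2 = \operatorname{Var}(e_1 \mid e_2, e_3, e_4).
\end{aligned}
```

The denominator appears because conditioning on y2 = 1 truncates the density of e2 to the region e2 > -z2.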

Do you think this is going in the right direction?

Thanks for your help anyway.

zongxi · Posted 02-19-2021 01:29 AM

After further reading of the PROC QLIM documentation, I assume SAS uses simulated MLE in my case.

y1=1[x+y2+y3+y4+e1>0] (1)
y2=1[z2+e2>0] (2)
y3=z3+e3 (3)
y4=z4+e4 (4)

So, the simplified MLE derivation goes as follows:

f(y1,y2,y3,y4|x, z) = f(y1, y2|y3, y4, x, z) * f(y3, y4|x, z)

1. Deriving the joint density of (y3,y4), the second term on the RHS, is trivial.

2. Deriving the first term requires separate treatment of the 4 combinations of (y1,y2).

For example, f(y1=1,y2=1|y3,y4,x,z) = f(e1 > -x-y2-y3-y4, e2 > -z2|y3,y4,x,z)

The joint density of e1 and e2 conditional on e3 and e4 is bivariate normal, by the conditional distribution property of the multivariate normal distribution.

(e1, e2|e3=y3-z3, e4=y4-z4) ~ MVN(mu, sigma)
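The conditional moments follow the standard partitioned-normal formulas: mu = S12 * inv(S22) * (e3, e4)' and sigma = S11 - S12 * inv(S22) * S12', where S11, S12, and S22 are the blocks of the 4x4 error covariance. A minimal Python sketch of that calculation follows. The function names and every number in COV are made up for illustration, and zero-mean errors are assumed; nothing here is taken from QLIM itself.

```python
# Conditional distribution of (e1, e2) given (e3, e4) for a zero-mean
# 4-variate normal. All covariance values below are hypothetical.

def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(p, q):
    """Product of two small matrices given as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

def transpose(m):
    """Transpose of a matrix given as nested lists."""
    return [list(row) for row in zip(*m)]

def conditional_12_given_34(cov, e3, e4):
    """Return (mu, sigma) of (e1, e2) | (e3, e4) under zero-mean joint normality."""
    s11 = [row[:2] for row in cov[:2]]   # Var(e1, e2)
    s12 = [row[2:] for row in cov[:2]]   # Cov((e1, e2), (e3, e4))
    s22 = [row[2:] for row in cov[2:]]   # Var(e3, e4)
    gain = matmul(s12, inv2(s22))        # S12 * inv(S22)
    mu = [g[0] * e3 + g[1] * e4 for g in gain]
    adj = matmul(gain, transpose(s12))   # S12 * inv(S22) * S12'
    sigma = [[s11[i][j] - adj[i][j] for j in range(2)] for i in range(2)]
    return mu, sigma

# Hypothetical covariance with unit variances for e1 and e2 (probit normalization).
COV = [
    [1.0, 0.3, 0.2, 0.1],
    [0.3, 1.0, 0.4, 0.0],
    [0.2, 0.4, 2.0, 0.5],
    [0.1, 0.0, 0.5, 1.5],
]

# e3 and e4 would be the residuals y3 - z3 and y4 - z4 for one observation.
mu, sigma = conditional_12_given_34(COV, e3=0.8, e4=-0.4)
print(mu, sigma)
```

Note that sigma does not depend on the observed residuals; only mu shifts from one observation to the next.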

So this likelihood involves more integrals than Wooldridge's method (2011).

I assume SAS uses this simplified version and lets the computer evaluate the multiple integrals in the likelihood function.

I see the following in the QLIM output:

Seed for Monte Carlo Integration: 1923567609

Number of Draws: 20

Given this, I assume the Monte Carlo method is used for simulated MLE, but I am not sure my understanding above is correct.
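To make the idea of a simulated probability with a seed and a small number of draws concrete, here is a minimal Python sketch of the GHK simulator for a bivariate orthant probability such as P(e1 > a, e2 > b | e3, e4). Whether QLIM uses this particular simulator is an assumption; nothing in this thread confirms it. The function name ghk_upper_orthant and the covariance values are hypothetical, and the seed is simply the number reported above, reused as an arbitrary integer.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal, used for cdf and inverse cdf

def ghk_upper_orthant(a, b, mu, sigma, n_draws, seed):
    """Simulate P(e1 > a, e2 > b) for (e1, e2) ~ N(mu, sigma) with GHK."""
    rng = random.Random(seed)
    # Cholesky factor of the 2x2 covariance: e1 = mu1 + l11*u1,
    # e2 = mu2 + l21*u1 + l22*u2 with u1, u2 independent standard normals.
    l11 = math.sqrt(sigma[0][0])
    l21 = sigma[1][0] / l11
    l22 = math.sqrt(sigma[1][1] - l21 ** 2)
    t1 = (a - mu[0]) / l11          # e1 > a  <=>  u1 > t1
    p1 = 1.0 - STD.cdf(t1)          # exact marginal probability of e1 > a
    total = 0.0
    for _ in range(n_draws):
        # Draw u1 from the standard normal truncated to (t1, inf) by inverse CDF.
        q = STD.cdf(t1) + rng.random() * p1
        q = min(max(q, 1e-12), 1.0 - 1e-12)   # keep inv_cdf inside (0, 1)
        u1 = STD.inv_cdf(q)
        t2 = (b - mu[1] - l21 * u1) / l22     # e2 > b  <=>  u2 > t2 given u1
        total += p1 * (1.0 - STD.cdf(t2))
    return total / n_draws

# Hypothetical conditional moments of (e1, e2) given (e3, e4).
MU = [0.0, 0.0]
SIGMA = [[1.0, 0.5], [0.5, 1.0]]

few = ghk_upper_orthant(0.0, 0.0, MU, SIGMA, n_draws=20, seed=1923567609)
many = ghk_upper_orthant(0.0, 0.0, MU, SIGMA, n_draws=20000, seed=1923567609)
# For zero means, zero thresholds and correlation 0.5 the exact value is
# 1/4 + asin(0.5) / (2*pi) = 1/3, so both estimates can be compared to it.
print(few, many)
```

Fixing the seed makes the simulated likelihood a deterministic function of the parameters, which is what lets a quasi-Newton optimizer work on it. The other three (y1, y2) combinations use the same recursion with the corresponding inequalities reversed.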

I hope someone can answer it.

Thank you in advance.
