How does PROC QLIM estimate a probit model with endogenous variables?

Posted 02-16-2021 04:03 PM
(638 views)

Hi,

I am new to PROC QLIM and recently started using it to estimate a system of structural equations.

In the model, I have three endogenous variables, two of which are discrete.

QLIM reported the following in the model fit summary:

"Optimization method": "Quasi-Newton"

"Seed for Monte Carlo Integration": 1514161564

"Number of Draws": 20

I was wondering how QLIM estimates the model and what assumptions are behind it.

I assume it uses maximum likelihood estimation (with a multivariate normal distribution in my case), but I am wondering what is assumed about the correlations among the three random variables.

Can anyone help by providing the likelihood function and the estimation steps for this model?

Thanks in advance.

========================================================

proc qlim data=review;
   model y1 = y2 y3 z1 / discrete;
   model y2 = z2 / discrete;
   model y3 = z3;
run;

3 REPLIES

It is always dangerous to use a technique without at least some background on the details of what the procedure is doing. This is a good time to read the Details part of the documentation for PROC QLIM, located here:

This is long, and many parts have nothing to do with your questions - but all of them are answered at some point in here.

SteveDenham

Thank you, Dr. Denham, for your suggestions. My question is specifically about how to derive the likelihood for a probit regression with binary and continuous endogenous variables. I read the SAS documentation you linked and Econometric Analysis of Cross Section and Panel Data (Wooldridge 2011), and I derived the likelihood myself. It is long and tedious, and I am not sure whether it is correct.

Suppose I want to estimate the following system of equations, where 1[.] is the indicator function. I omit intercepts and coefficients for simplicity.

y1=1[x+y2+y3+y4+e1>0] (1)

y2=1[z2+e2>0] (2)

y3=z3+e3 (3)

y4=z4+e4 (4)

A couple of assumptions: e1 and e2 are standard normal, and e1 through e4 are jointly normal with a variance/covariance structure that lets the errors be correlated across equations, which is what makes y2, y3, and y4 endogenous in (1).
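One way to write this assumption down explicitly is sketched below. The symbols rho_12, sigma_jk, and sigma_j^2 are notation introduced here for illustration and are not taken from the QLIM documentation; joint normality is the working assumption of this derivation.

```latex
\begin{pmatrix} e_1 \\ e_2 \\ e_3 \\ e_4 \end{pmatrix} \Big|\, x, z \;\sim\; N(0, \Sigma),
\qquad
\Sigma =
\begin{pmatrix}
1           & \rho_{12}   & \sigma_{13} & \sigma_{14} \\
\rho_{12}   & 1           & \sigma_{23} & \sigma_{24} \\
\sigma_{13} & \sigma_{23} & \sigma_3^2  & \sigma_{34} \\
\sigma_{14} & \sigma_{24} & \sigma_{34} & \sigma_4^2
\end{pmatrix}
```

Under this sketch, y2, y3, or y4 is endogenous in equation (1) when the corresponding element rho_12, sigma_13, or sigma_14 is nonzero. The unit variances for e1 and e2 are the usual probit scale normalization.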

The goal is to write the joint likelihood conditional on the exogenous and instrumental variables. Specifically,

f(y1, y2, y3, y4|x, z) = f(y1|y2,y3,y4,x,z)*f(y2,y3,y4|x,z).

The second term on the right-hand side factors further as f(y2,y3,y4|x,z) = f(y2|y3,y4,x,z)*f(y3,y4|x,z).

Both of these factors are straightforward to derive from the properties of joint and conditional distributions of normal variables (Wooldridge 2011).
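As a sketch of what these two factors look like, write e_c = (y3 - z3, y4 - z4)' for the residuals of the continuous equations, Sigma_cc for their 2x2 covariance block, and Sigma_2c for the row vector of covariances between e2 and (e3, e4). This partition notation is introduced here for illustration, and joint normality of the errors is assumed.

```latex
\begin{aligned}
f(y_3, y_4 \mid x, z)
  &= (2\pi)^{-1}\, |\Sigma_{cc}|^{-1/2}
     \exp\!\left(-\tfrac{1}{2}\, e_c' \Sigma_{cc}^{-1} e_c\right), \\
e_2 \mid e_c
  &\sim N\!\left(\mu_{2|c},\, \omega_2^2\right), \qquad
  \mu_{2|c} = \Sigma_{2c} \Sigma_{cc}^{-1} e_c, \qquad
  \omega_2^2 = 1 - \Sigma_{2c} \Sigma_{cc}^{-1} \Sigma_{2c}', \\
P(y_2 = 1 \mid y_3, y_4, x, z)
  &= P(e_2 > -z_2 \mid e_c)
   = \Phi\!\left(\frac{z_2 + \mu_{2|c}}{\omega_2}\right).
\end{aligned}
```

The probability of y2 = 0 is one minus the last expression.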

The first term, f(y1|y2,y3,y4,x,z), is trickier. It requires deriving the four combinations of y1 and y2 separately:

1. f(y1=1|y2=1,y3,y4,x,z)

2. f(y1=1|y2=0,y3,y4,x,z)

3. f(y1=0|y2=1,y3,y4,x,z)

4. f(y1=0|y2=0,y3,y4,x,z)

Take #1 as an example:

p(y1=1|y2=1,y3,y4,x,z)

= E[p(e1>-x-y2-y3-y4|e2,e3,e4,x,z)|y2=1,y3,y4,x,z]

Here p(e1>-x-y2-y3-y4|e2,e3,e4,x,z) is a function of the random variables e2, e3, and e4.

Let g(e2,e3,e4) = p(e1>-x-y2-y3-y4|e2,e3,e4,x,z). Then

p(y1=1|y2=1,y3,y4,x,z) = Integral[g(e2,e3,e4)*f(e2,e3,e4|y2=1,y3,y4,x,z)] d(e2)*d(e3)*d(e4)

The Integral[.] is written over the joint space of e2, e3, and e4. However, given y3 and y4, e3 and e4 are each fixed at a single point (e3 = y3-z3, e4 = y4-z4), so the integration is effectively only over e2, on the range that gives y2=1 (e2 > -z2).
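Because e3 and e4 are pinned down by y3 and y4, case #1 can be sketched as a ratio of two probabilities under the conditional distribution of (e1, e2) given e_c = (y3 - z3, y4 - z4)'. Here a = x + 1 + y3 + y4 is shorthand, introduced for illustration, for the index in equation (1) evaluated at y2 = 1; joint normality of the errors is assumed.

```latex
\begin{aligned}
P(y_1 = 1 \mid y_2 = 1, y_3, y_4, x, z)
  &= \frac{\displaystyle\int_{-z_2}^{\infty}
       P(e_1 > -a \mid e_2, e_c)\, f(e_2 \mid e_c)\, de_2}
          {P(e_2 > -z_2 \mid e_c)} \\[4pt]
  &= \frac{P(e_1 > -a,\; e_2 > -z_2 \mid e_c)}
          {P(e_2 > -z_2 \mid e_c)}.
\end{aligned}
```

Multiplying by P(y2=1|y3,y4,x,z) cancels the denominator, so the product of the first two factors of the likelihood reduces to a bivariate normal orthant probability.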

Do you think this is heading in the right direction?

Thanks for your help anyway.

After further reading of the PROC QLIM documentation, I assume SAS is using simulated maximum likelihood in my case.

y1=1[x+y2+y3+y4+e1>0] (1)

y2=1[z2+e2>0] (2)

y3=z3+e3 (3)

y4=z4+e4 (4)

So, the simplified likelihood derivation goes as follows:

f(y1,y2,y3,y4|x, z) = f(y1, y2|y3, y4, x, z) * f(y3, y4|x, z)

1. Deriving the joint density of (y3,y4), the second term on the RHS, is trivial.

2. Deriving the first term requires treating the four combinations of (y1,y2) separately.

e.g., f(y1=1,y2=1|y3,y4,x,z) = P(e1 > -x-y2-y3-y4, e2 > -z2|y3,y4,x,z)

The joint distribution of e1 and e2 conditional on e3 and e4 is bivariate normal, by the conditional distribution property of the multivariate normal distribution:

(e1, e2|e3=y3-z3, e4=y4-z4) ~ MVN(mu, sigma)
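For reference, the standard conditioning result for a partitioned multivariate normal gives mu and sigma as sketched below. The partition labels d (for the discrete-equation errors e1, e2) and c (for the continuous-equation errors e3, e4), and the shorthand a = x + 1 + y3 + y4 for the index in equation (1) at y2 = 1, are notation introduced here and are not taken from the SAS documentation.

```latex
\begin{aligned}
\Sigma &= \begin{pmatrix} \Sigma_{dd} & \Sigma_{dc} \\ \Sigma_{cd} & \Sigma_{cc} \end{pmatrix},
\qquad
e_c = \begin{pmatrix} y_3 - z_3 \\ y_4 - z_4 \end{pmatrix}, \\[4pt]
\mu &= \Sigma_{dc}\, \Sigma_{cc}^{-1}\, e_c,
\qquad
\Sigma_{d|c} = \Sigma_{dd} - \Sigma_{dc}\, \Sigma_{cc}^{-1}\, \Sigma_{cd}, \\[4pt]
P(y_1 = 1, y_2 = 1 \mid y_3, y_4, x, z)
  &= \int_{-a}^{\infty} \int_{-z_2}^{\infty}
     \phi_2\!\left(e_1, e_2;\; \mu,\; \Sigma_{d|c}\right) de_2\, de_1 .
\end{aligned}
```

Here phi_2 denotes the bivariate normal density, and Sigma_{d|c} plays the role of sigma in the line above. The other three combinations of (y1, y2) only change the integration limits.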

So this likelihood formula has more integrals than the one obtained with Wooldridge's method (2011).

I assume SAS uses this simplified version and lets the computer evaluate the multiple integrals in the likelihood function numerically.

I am seeing the following in the QLIM output:

"Seed for Monte Carlo Integration": 1923567609

"Number of Draws": 20

Given this, I assume Monte Carlo integration is being applied for simulated maximum likelihood, but I am not sure my understanding above is correct.
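To illustrate what Monte Carlo integration of such a probability can look like, here is a minimal Python sketch of a GHK-style simulator for a bivariate normal orthant probability P(e1 > a1, e2 > a2). This is an illustration under stated assumptions, not QLIM's implementation: the thread does not establish which simulator QLIM uses, the function names `ghk_orthant_prob` and `exact_orthant_prob_at_zero` are hypothetical, and the two errors are taken to be standard normal with correlation `rho` (after conditioning on e3 and e4, one would first standardize by the conditional mean and covariance).

```python
import math
import random
from statistics import NormalDist

_STD = NormalDist()  # standard normal, provides cdf() and inv_cdf()


def ghk_orthant_prob(a1, a2, rho, n_draws, seed=12345):
    """Simulate P(e1 > a1, e2 > a2) for standard normal e1, e2 with corr(e1, e2) = rho.

    Assumes |rho| < 1. Uses the Cholesky form e2 = rho*e1 + sqrt(1 - rho^2)*u2.
    """
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - rho * rho)
    lo = _STD.cdf(a1)          # P(e1 <= a1)
    p1 = 1.0 - lo              # P(e1 > a1), the same for every draw
    eps = 1e-12                # keeps inv_cdf away from 0 and 1
    total = 0.0
    for _ in range(n_draws):
        # Draw e1 from the standard normal truncated to (a1, infinity).
        u = lo + rng.random() * p1
        u = min(max(u, eps), 1.0 - eps)
        e1 = _STD.inv_cdf(u)
        # P(e2 > a2 | e1) under the conditional normal distribution.
        p2 = 1.0 - _STD.cdf((a2 - rho * e1) / scale)
        total += p1 * p2
    return total / n_draws


def exact_orthant_prob_at_zero(rho):
    """Closed form for P(e1 > 0, e2 > 0) with correlation rho."""
    return 0.25 + math.asin(rho) / (2.0 * math.pi)


if __name__ == "__main__":
    rho = 0.5
    print("exact       :", exact_orthant_prob_at_zero(rho))
    print("20 draws    :", ghk_orthant_prob(0.0, 0.0, rho, 20))
    print("20000 draws :", ghk_orthant_prob(0.0, 0.0, rho, 20000))
```

With both thresholds at zero, the simulated value can be checked against the closed form 1/4 + arcsin(rho)/(2*pi). Comparing 20 draws with a much larger number gives a feel for how much simulation noise a small number of draws can leave in each likelihood contribution.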

I hope someone can answer it.

Thank you in advance.
