zongxi
Fluorite | Level 6

Hi,

I am new to PROC QLIM and recently used it to estimate a system of structural equations.

In the model, I have three endogenous variables and two of them are discrete.  

 

QLIM reported the following model fit summary, 

"Optimization method": "Quasi-Newton" 

"Seed for Monte Carlo Integration": 1514161564

"Number of Draws": 20

 

I was wondering how QLIM estimates the model and what assumptions are behind it.

I assume it uses maximum likelihood estimation (with a multivariate normal distribution in my case), but I am wondering what is assumed about the correlations between the three random variables.

 

Can anyone help provide the MLE equation and estimation steps for this model? 

 

Thanks in advance.

 

========================================================

proc qlim data=review;
   model y1 = y2 y3 z1 / discrete;
   model y2 = z2 / discrete;
   model y3 = z3;
run;

SteveDenham
Jade | Level 19

It is always dangerous to use a technique without at least some background on the details of what the procedure is doing.  This is a good time to read the Details part of the documentation for PROC QLIM, located here:

https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.4&docsetId=etsug&docsetTarget=etsug_... 

This is long, and many parts have nothing to do with your questions - but all of them are answered at some point in here.

 

SteveDenham

zongxi
Fluorite | Level 6

Thank you, Dr. Denham, for your suggestions. My question is specifically about how to derive the MLE for a probit regression with binary and continuous endogenous variables. I read the SAS link you provided and Econometric Analysis of Cross Section and Panel Data (Wooldridge 2011). I derived the MLE myself. It is long and tedious, and I am not sure whether it is correct.

 

Suppose I want to estimate the following system of equations, where 1[.] is the indicator function and I omit intercepts and coefficients for simplicity.

  y1=1[x+y2+y3+y4+e1>0] (1)
  y2=1[z2+e2>0] (2)
  y3=z3+e3 (3)
  y4=z4+e4 (4)

A couple of assumptions: e1 and e2 are standard normal, and a variance/covariance structure is assumed for e1-e4; its nonzero cross-equation covariances are what imply endogeneity.

 

The goal here is to write the joint likelihood conditional on the exogenous and instrumental variables. Specifically,
f(y1, y2, y3, y4|x, z) = f(y1|y2,y3,y4,x,z)*f(y2,y3,y4|x,z).

 

The second term on the right-hand side factors as f(y2,y3,y4|x,z) = f(y2|y3,y4,x,z)*f(y3,y4|x,z).

It is straightforward to derive using the properties of joint and conditional distributions of normal variables (Wooldridge 2011).
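For reference, the two factors can be written out under the joint normality assumption stated above. This is only a sketch in the simplified notation (no intercepts or coefficients); the symbols r, m_2, v_2 and the partition of Sigma into blocks are introduced here for illustration and are not taken from the SAS documentation.

```latex
\begin{aligned}
r &= (y_3 - z_3,\; y_4 - z_4)^\top, \\
f(y_3, y_4 \mid x, z) &= \phi_2\!\left(r;\, 0,\, \Sigma_{34}\right), \\
e_2 \mid e_3, e_4 &\sim N(m_2, v_2), \quad
m_2 = \Sigma_{2,34}\,\Sigma_{34}^{-1}\, r, \quad
v_2 = 1 - \Sigma_{2,34}\,\Sigma_{34}^{-1}\,\Sigma_{34,2}, \\
P(y_2 = 1 \mid y_3, y_4, x, z) &= \Phi\!\left(\frac{z_2 + m_2}{\sqrt{v_2}}\right), \qquad
P(y_2 = 0 \mid y_3, y_4, x, z) = 1 - \Phi\!\left(\frac{z_2 + m_2}{\sqrt{v_2}}\right).
\end{aligned}
```

Here \Sigma_{34} is the covariance of (e3, e4), \Sigma_{2,34} is the row of covariances between e2 and (e3, e4), and the leading 1 in v_2 is the unit variance assumed for e2.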

 

The first term, f(y1|y2,y3,y4,x,z), is somewhat tricky. It requires deriving the 4 combinations of y1 and y2 separately:

   1. f(y1=1|y2=1,y3,y4,x,z) 

   2. f(y1=1|y2=0,y3,y4,x,z)

   3. f(y1=0|y2=1,y3,y4,x,z) 

   4. f(y1=0|y2=0,y3,y4,x,z)

 

Take #1 for example, 

p(y1=1|y2=1,y3,y4,x,z)
= E[p(e1>-x-y2-y3-y4|e2,e3,e4,x,z)|y2=1,y3,y4,x,z]

p(e1>-x-y2-y3-y4|e2,e3,e4,x,z) is a function of the random variables e2, e3, and e4.

Let g(e2,e3,e4) = p(e1>-x-y2-y3-y4|e2,e3,e4,x,z); then

p(y1=1|y2=1,y3,y4,x,z) = Integral[g(e2,e3,e4)*f(e2,e3,e4|y2=1,y3,y4,x,z)] d(e2)*d(e3)*d(e4)

 

The Integral[.] is nominally over the space of (e2, e3, e4). However, given y3 and y4, e3 and e4 are fixed at the single points y3-z3 and y4-z4, so only e2 is actually integrated, over the range e2 > -z2 that gives y2=1.
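One way to make this concrete, as a sketch only: because e3 and e4 are pinned down by the observed y3 and y4, the conditional probability in case #1 reduces to a ratio of two normal probabilities. The shorthand a, r_3 and r_4 is introduced here for illustration.

```latex
\begin{aligned}
a &= x + y_2 + y_3 + y_4 \ \ (\text{with } y_2 = 1), \qquad
r_3 = y_3 - z_3, \quad r_4 = y_4 - z_4, \\
P(y_1 = 1 \mid y_2 = 1, y_3, y_4, x, z)
  &= \frac{P(e_1 > -a,\; e_2 > -z_2 \mid e_3 = r_3,\, e_4 = r_4)}
          {P(e_2 > -z_2 \mid e_3 = r_3,\, e_4 = r_4)}.
\end{aligned}
```

The numerator is a bivariate normal rectangle probability and the denominator is a univariate one, so multiplying by f(y2=1|y3,y4,x,z) leaves only the numerator.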

 

Do you think this is on the right track?

 

Thanks for your help anyway. 

 

   

 

zongxi
Fluorite | Level 6

After reading the PROC QLIM documentation further, I assume SAS is using simulated MLE in my case.

 

  y1=1[x+y2+y3+y4+e1>0] (1)
  y2=1[z2+e2>0] (2)
  y3=z3+e3 (3)
  y4=z4+e4 (4)

 

So, the simplified MLE derivation goes as follows:

f(y1,y2,y3,y4|x, z) = f(y1, y2|y3, y4, x, z) * f(y3, y4|x, z)

1. Deriving the joint density of (y3,y4), the second term on the RHS, is trivial.

2. Deriving the first term requires treating the 4 combinations of (y1,y2) separately.

    e.g., f(y1=1,y2=1|y3,y4,x,z) = f(e1 > -x-y2-y3-y4, e2 > -z2|y3,y4,x,z)

   The joint distribution of e1 and e2 conditional on e3 and e4 is bivariate normal, by the conditional distribution property of the multivariate normal distribution.

   (e1, e2|e3=y3-z3, e4=y4-z4) ~ MVN(mu, sigma)
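The conditioning step can be sketched numerically. The snippet below is a minimal illustration in plain Python, assuming zero-mean jointly normal errors with covariance partitioned into blocks for (e1,e2) and (e3,e4); the function names and all numeric values are made up for illustration and have nothing to do with how QLIM is implemented.

```python
# Sketch: distribution of (e1, e2) given (e3, e4) = r for a zero-mean
# jointly normal error vector. The covariance blocks below are made-up
# illustration values, not estimates from any real model.

def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul2(p, q):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def conditional_e1_e2(s11, s12, s22, r):
    """Return (mean, cov) of (e1, e2) | (e3, e4) = r.

    s11: Cov of (e1, e2); s12: Cov between (e1, e2) and (e3, e4);
    s22: Cov of (e3, e4); r: observed residuals (y3 - z3, y4 - z4).
    """
    w = matmul2(s12, inv2(s22))                    # S12 * S22^{-1}
    mean = [w[i][0] * r[0] + w[i][1] * r[1] for i in range(2)]
    s21 = [[s12[j][i] for j in range(2)] for i in range(2)]  # transpose
    adj = matmul2(w, s21)                          # S12 * S22^{-1} * S21
    cov = [[s11[i][j] - adj[i][j] for j in range(2)] for i in range(2)]
    return mean, cov


# Hypothetical inputs: unit variances for e1 and e2, as assumed in the thread.
s11 = [[1.0, 0.5], [0.5, 1.0]]
s12 = [[0.3, 0.2], [0.1, 0.4]]
s22 = [[2.0, 0.0], [0.0, 1.0]]
r = [1.0, -0.5]

mean, cov = conditional_e1_e2(s11, s12, s22, r)
print("mu    =", mean)
print("sigma =", cov)
```

The returned mean and cov play the role of mu and sigma in the line above; the four (y1,y2) cases then only differ in which rectangle of this bivariate normal is integrated.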

 

So this MLE formula has more integrals compared to using Wooldridge's method (2011).

I assume SAS uses the simplified version and lets the computer solve the multiple integrals in the likelihood function.
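To illustrate what "letting the computer solve the integral" could look like, here is a minimal sketch of a GHK-style simulator for the bivariate rectangle probability P(e1 > l1, e2 > l2) with a fixed number of random draws. GHK is one common choice for simulated maximum likelihood; whether QLIM uses this exact simulator is an assumption here, and the function name, seed and inputs are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal: provides cdf() and inv_cdf()


def ghk_orthant_prob(mean, cov, lower, n_draws=20, seed=12345):
    """Simulate P(e1 > lower[0], e2 > lower[1]) for (e1, e2) ~ N(mean, cov).

    Uses the Cholesky form e1 = m1 + c11*u1, e2 = m2 + c21*u1 + c22*u2
    with u1, u2 independent standard normals.
    """
    rng = random.Random(seed)
    c11 = math.sqrt(cov[0][0])
    c21 = cov[1][0] / c11
    c22 = math.sqrt(cov[1][1] - c21 ** 2)

    a1 = (lower[0] - mean[0]) / c11       # e1 > l1  <=>  u1 > a1
    p1 = 1.0 - STD.cdf(a1)                # P(u1 > a1), same for every draw

    total = 0.0
    for _ in range(n_draws):
        # Draw u1 from a standard normal truncated to (a1, +inf)
        # by inverting the CDF at a uniform point in (cdf(a1), 1).
        p = STD.cdf(a1) + rng.random() * p1
        p = min(max(p, 1e-12), 1.0 - 1e-12)   # keep inv_cdf in its domain
        u1 = STD.inv_cdf(p)
        # Given u1, e2 > l2  <=>  u2 > a2.
        a2 = (lower[1] - mean[1] - c21 * u1) / c22
        total += p1 * (1.0 - STD.cdf(a2))
    return total / n_draws


# Hypothetical example: correlation 0.5, both thresholds at zero, 20 draws.
estimate = ghk_orthant_prob([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], [0.0, 0.0])
print(estimate)
```

For the case y1=1, y2=1 the thresholds would be l1 = -x-y2-y3-y4 and l2 = -z2, with mean and cov taken from the conditional distribution of (e1, e2) given e3 and e4. With only a handful of draws the estimate is noisy, which is why the seed matters for reproducibility.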

 

I am seeing this in the output from QLIM:

"Seed for Monte Carlo Integration": 1923567609

"Number of Draws": 20

 

I assume the "Monte Carlo" method is applied for simulated MLE, but I am not sure my understanding above is correct.

 

I hope someone can answer it. 

Thank you in advance.
