## Heckman's correction and GLM

Hi,

I am trying to estimate a GLM within a two-step Heckman correction. I have looked at the reference materials. The SAS documentation (http://support.sas.com/documentation/cdl/en/etsug/67525/HTML/default/viewer.htm#etsug_qlim_examples0...) shows that the selection model and the response model are estimated together. It also shows some other model types you can specify in this framework, but I don't believe a GLM can be modeled the same way, together with the selection probability model.

So, my question: would applying the two steps manually be the right approach? That is, estimating the selection probability model, then calculating the inverse Mills ratio, and then including it in the GLM model specification.

Accepted Solutions
SAS Employee

## Re: Heckman's correction and GLM

Hi,

First of all, I would like to state what I understand from your problem:

Your selection model consists of two models. You have a probit selection equation that defines your selection “rule” and a model that you are actually interested in estimating (the response model). In your case, the response model is a GLM, i.e., the response variable distribution is a member of the exponential family, which includes the normal, Poisson, binomial, exponential, and gamma distributions.

If your response model is linear, which is a special case of the GLM, then all you need to do is use the HECKIT option of PROC QLIM. The HECKIT option requests that the selection model be estimated by Heckman’s two-step estimation method as defined in his 1979 paper (for details, see http://support.sas.com/documentation/cdl/en/etsug/67525/HTML/default/viewer.htm#etsug_qlim_details17...). Using the example that you pointed out, this can be done with the following SAS program:

    /*-- Sample Selection --*/
    proc qlim data=mroz heckit;
       model inlf = nwifeinc educ exper expersq
                    age kidslt6 kidsge6 / discrete;
       model lwage = educ exper expersq / select(inlf=1);
    run;
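For comparison, the same linear two-step can be sketched by hand. This is only a rough illustration, not what PROC QLIM does internally: the data set name `stage1` and the variables `xb` and `lambda` are names I made up, and the OLS standard errors in the last step are not adjusted for the fact that `lambda` is itself estimated, so only the point estimates should be compared with the HECKIT output.

```sas
/* Step 1: probit selection equation; save the linear predictor z*gamma */
proc logistic data=mroz;
   model inlf(event='1') = nwifeinc educ exper expersq
                           age kidslt6 kidsge6 / link=probit;
   output out=stage1 xbeta=xb;   /* stage1 and xb are illustrative names */
run;

/* Inverse Mills ratio: lambda = phi(xb) / Phi(xb) */
data stage1;
   set stage1;
   lambda = pdf('NORMAL', xb) / cdf('NORMAL', xb);
run;

/* Step 2: OLS on the selected sample with lambda as an extra regressor. */
/* Standard errors here are NOT corrected for the estimated lambda.      */
proc reg data=stage1;
   where inlf = 1;
   model lwage = educ exper expersq lambda;
run;
quit;
```

Using the HECKIT option remains the simpler route for the linear case; the manual version is mainly useful for seeing where the inverse Mills ratio enters.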

If your response model is nonlinear, for example a binary response model or an exponential response model, then this particular selection bias correction will most likely NOT be valid. That is, estimating the selection equation by probit and then plugging the estimated inverse Mills ratio into a second-stage estimation that uses only the selected sample does not, in general, remove the bias. In this case, you need to work out the nature of the bias from the particular assumptions of your model and apply the two-step method manually.
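A short sketch of why the additive correction is specific to the linear case. The notation and assumptions here are mine, not from the documentation: the selection rule is $s = 1[z\gamma + v > 0]$, and the errors $(u, v)$ are jointly normal with $\mathrm{Var}(u) = \sigma^2$, $\mathrm{Var}(v) = 1$, and $\mathrm{Corr}(u, v) = \rho$.

For a linear response $y = x\beta + u$, the mean in the selected sample is

$$
E[y \mid x, z, s = 1] = x\beta + \rho\sigma\,\lambda(z\gamma), \qquad \lambda(c) = \frac{\phi(c)}{\Phi(c)},
$$

so adding the inverse Mills ratio $\lambda$ as a regressor absorbs the bias. For an exponential response $y = \exp(x\beta + u)$, under the same assumptions,

$$
E[y \mid x, z, s = 1] = \exp\!\left(x\beta + \tfrac{\sigma^2}{2}\right)\frac{\Phi(z\gamma + \rho\sigma)}{\Phi(z\gamma)}.
$$

Here the selection term enters multiplicatively and is not a function of $\lambda(z\gamma)$ alone, so an additive $\lambda$ term in the linear predictor does not reproduce it. In both cases the correction vanishes when $\rho = 0$.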

However, testing the null hypothesis of no selection bias when you have a binary response model can be done easily. For this, use the SECONDSTAGE=ML suboption of the HECKIT option and look at the t value of the coefficient on the _y.LAMBDA parameter, where y is the dependent variable in your response model. Below is an example:

    proc qlim data=mroz heckit(secondstage=ML);
       model inlf = nwifeinc educ exper expersq
                    age kidslt6 kidsge6 / discrete;
       model lwage = educ exper expersq / discrete select(inlf=1);
    run;

I hope this helps,

Best regards,

Gunce

