Hurdle model with negative binomial

eberdan · Posted 06-13-2011 04:36 PM

Hi,
I am trying to do a hurdle model with a truncated negative binomial
distribution instead of a poisson. I keep getting the error message:

"NOTE: Execution error for observation 1."

I have tried setting the initial parameters to zero, 1 and -1. But I still
get the same message. The code I am running is:

options ls=74 ps=100 nodate nocenter pageno=1;
Data main;
input salinity male female court;
mf = male*female;
sf = salinity*female;
sm = salinity*male;
smf= salinity*male*female;
;
cards;
0 1 1 0
0 1 1 0
0 1 1 29
...more data......
;

proc sort data=main;
by male female salinity;
run;

proc nlmixed data = main;
parms a0 = 0 a1 = 0 a2 =0 a3=0 b0 = 0 b1 = 0 b2 = 0 b3 = 0 b4 = 0 b5 = 0 b6 =
0 b7 = 0 alpha=0;
eta0 = a0 + a1 * male + a2 *female + a3 *mf;
etap = b0 + b1 * male + b2 * female + b3 * mf + b4 * salinity +
b5 * sm + b6 * sf + b7 *smf;
exp_eta0 = exp(eta0);
mu = exp(etap);
p0 = exp_eta0 / (1 + exp_eta0);
m=1/alpha;
if court = 0 then ll = log(p0);
else ll = lgamma(court+m)-lgamma(court+1)-lgamma(m) + court*log(alpha*mu)-
(court+m)*log(1+alpha*mu)- log(1 -( 1 + alpha*mu)**(-m));
model court ~ general(ll);
run;

Any help would be greatly appreciated!!,
Emma

lvm · Posted 06-13-2011 10:52 PM

More than likely, your program is trying to take a log of 0 (including lgamma of 0), or creating infinity. For instance, with alpha=0 (starting value), m=1/alpha will cause big problems (infinity). Also, there are log or lgamma functions that involve alpha (which will also blow up with 0 for alpha). I took your data and added a few more dummy data points. The program ran when I used 1 for the starting value of alpha. But this is still dangerous, because parameter estimates can migrate to 'impossible' regions during the optimization. You should look into putting bounds on the parameters. Check out the syntax of the bounds statement.
(I did not really look at the rest of the code to see if the customized ll is correct).

Dale · Posted 06-14-2011 01:07 PM

The hurdle log-likelihood function does appear sound. As lvm has indicated, computing 1/alpha for alpha initialized to 0 is a problem.

Personally, I prefer not to use bounds statements to control acceptable values for the parameters. Derivatives are usually not defined when parameter estimates are fixed at boundary values. Sometimes, the function itself is not defined when a parameter takes on the boundary value. (Such is the case for the NB distribution when you specify alpha=0. Even though the NB is a Poisson when alpha=0, it is not possible to compute the Poisson density using a negative binomial density function.)

Instead of using boundary constraints, consider parameterizing the model so that the parameter estimates are always in an acceptable range. For instance, in the NB distribution, the parameter alpha must always be positive. If you name log_alpha as a parameter and then compute alpha=exp(log_alpha), then alpha>0 for -infinity

lvm · Posted 06-14-2011 02:57 PM

Dale makes a good point: it is best to avoid using parameter bounds, when possible. For your model, it should be possible to avoid explicit bounds, using the recommendations from Dale. With some nonlinear models, it can be challenging to do the optimization without using bounds, but always a good goal.

eberdan · Posted 06-14-2011 03:14 PM

Thank you all for your advice! I took Dale's advice and the model works. However, I am concerned that the hurdle part of the model is not working. My current code is not very different:

proc nlmixed data = main;
parms a0 = 0 a1 = 0 a2 =0 a3=0 b0 = 0 b1 = 0 b2 = 0 b3 = 0 b4 = 0 b5 = 0 b6 = 0 b7 = 0 log_alpha=0;
eta0 = a0 + a1 * male + a2 *female + a3 *mf;
etap = b0 + b1 * male + b2 * female + b3 * mf + b4 * salinity +
b5 * sm + b6 * sf + b7 *smf;
exp_eta0 = exp(eta0);
alpha = exp(log_alpha);
mu = exp(etap);
p0 = exp_eta0 / (1 + exp_eta0);
m= 1/alpha;
if court = 0 then ll = log(p0);
else ll = lgamma(court+m)-lgamma(court+1)-lgamma(m) + court*log(alpha*mu)-(court+m)*log(1+alpha*mu)- log(1 -( 1 + alpha*mu)**(-m));
model court ~ general(ll);
run;

which is almost identical to what I have before. But my parameter estimates come out like this:

Standard
Parameter Estimate Error DF t Value Pr > |t| Alpha Lower

a0 2.1240 2840.01 132 0.00 0.9994 0.05 -5615.70
a1 2.9631 2808.80 132 0.00 0.9992 0.05 -5553.13
a2 3.2490 2820.91 132 0.00 0.9991 0.05 -5576.80
a3 4.2013 2814.01 132 0.00 0.9988 0.05 -5562.20
b0 2.9783 4.3967 132 0.68 0.4993 0.05 -5.7188
b1 -0.06676 2.4206 132 -0.03 0.9780 0.05 -4.8549
b2 -0.2695 4.1330 132 -0.07 0.9481 0.05 -8.4451
b3 0.1722 2.1445 132 0.08 0.9361 0.05 -4.0698
b4 5.3092 0.3373 132 15.74 <.0001 0.05 4.6420
b5 -2.6839 0.1903 132 -14.10 <.0001 0.05 -3.0604
b6 -5.2397 0.3102 132 -16.89 <.0001 0.05 -5.8533
b7 2.6524 0.1629 132 16.28 <.0001 0.05 2.3301
log_alpha 0.2495 0.2317 132 1.08 0.2834 0.05 -0.2087

So the standard errors for the hurdle part of the model are huge! I can play around with the starting value of log_alpha and get them somewhat better but they are still huge. I also tried changing the hurdle model. Right now it is log((exp(zeromodel)) /(1+(exp(zeromodel))). Based on the SAS for mixed models by Littell et al's hurdle poisson I changed it to -exp(zeromodel) but that did not change the outcome.

-Emma

Hurdle model with negative binomial

Re: Hurdle model with negative binomial

Re: Hurdle model with negative binomial

Re: Hurdle model with negative binomial

Re: Hurdle model with negative binomial