turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- Analytics
- /
- Stat Procs
- /
- ZINB in GENMOD vs NLMIXED

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

03-28-2016 03:25 PM - edited 03-28-2016 03:26 PM

I fit a zero-inflated negative binomial model in PROC GENMOD, using the following code (both Time and Treatment are binary indicator variables):

```
PROC GENMOD data=analysis_dataset;
class Time(ref="Baseline") Treatment(ref="Control") / param=ref;
model UAVI = Time|Treatment / dist=zinb offset=logTI;
zeromodel;
run;
```

This model converges and produces no errors.

If I try to fit the same model in PROC NLMIXED (note that "Interaction" here is an indicator variable I created in a datastep to signifiy the Time X Treatment interaction, since NLMIXED can't create interaction terms on its own):

```
PROC NLMIXED data=analysis_dataset method=gauss qpoints=25;
/* Set initial values for parameters */
parms a0=-1.1365 alpha=1
b0=-0.8222 b1=0.06963 b2=0.09920 b3=-0.2334
sigma=0.3131
;
/* Specificy zero model */
zeromodel = a0;
p0 = 1/(1+exp(-zeromodel));
/* Specify Negative Binomial model */
mu = exp(b0 + b1*Time + b2*Treatment+ b3*Interaction + logTI);
theta = 1/alpha;
p = 1/(1+alpha*mu);
/* Now build ZINB likelihood */
if UAVI=0 then
logLik = log(p0 + (1-p0)*exp(p**theta));
else
logLik = log(1-p0) + log(gamma(theta + UAVI)) - log(gamma(UAVI + 1)) - log(gamma(theta)) + theta*log(p) + UAVI*log(1-p);
model UAVI ~ general(logLik);
run;
```

So far as I can tell, this is the correct specification of the log-likelihood.

Now, this NLMIXED code produces errors. Specifically, it gives me 6 of the following errors:

**NOTE: Invalid argument to function GAMMA(#) at line # column #**

I used the NLMIXED code in a DATA step to identify the 6 problematic observations. It turns out that these six observations are all individuals whose counts of the outcome (UAVI) are all exceptionally high (>200). After fiddling around with some datasteps, I've determined that the GAMMA function will return this error if given as an argument any number greater than 171. Since, for positive integers, Gamma is calculated as (Integer-1)!, I'm guessing the error is because the numbers are too big. That is, Gamma(171)=7.257416e306!

Now, likely, these 6 observations will end up being excluded from my analysis because they are huge outliers, and I have some rather convincing evidence that the reasons they are outliers have more to do with data quality than any intrinsic effect of the treatment. But, the question I have is; **how is PROC GENMOD able to calculating values of the log-likelihood for these observations, if decomposing the likelihood function in PROC NLMIXED returns an error?** Clearly, something about PROC GENMOD's method of calculating the likelihood is different, because it is able to fit the model to those 6 datapoints with an error.

This is troublesome for me because ultimately I want to include random intercepts in my model, thus why I am fooling around with PROC NLMIXED instead of PROC GENMOD. However, it concerns me that in this situation GENMOD and NLMIXED are treating the data so differently. Unless I have made a major mistake in my PROC NLMIXED code above, what accounts for GENMOD's ability to fit those datapoints?

Accepted Solutions

Solution

03-29-2016
10:20 AM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to RyanSimmons

03-29-2016 10:20 AM

Aha! I found the mistake. I mentally mixed up the log-likelihood expressions for the ZIP and ZINB models.

The code that reads as:

```
if UAVI=0 then
logLik = log(p0 + (1-p0)*exp(p**theta));
```

Should read as:

```
if UAVI=0 then
logLik = log(p0 + (1-p0)*(p**theta));
```

There is no exponentiation in this part of the log-likelihood (in contrast with the ZIP, which would have exp(-mu) in this term.

It's always the silly mistakes that undermine you!

All Replies

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to RyanSimmons

03-28-2016 06:17 PM

Try function lgamma() intead of log(gamma())

PG

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to PGStats

03-29-2016 09:28 AM

I didn't even realize lgamma() was a function in SAS! Always full of surprises ...

In any case, that removed the error I was experiencing. However, the parameter estimates are still different from those produced by the GENMOD call. The intercept for the zero-inflated model in particular is way off, with a value of -1.1399 given by GENMOD and a value of -18.9018 given by NLMIXED. Any idea what would account for this discrepency (incidentally, all the other parameter values are also different between GENMOD and NLMIXED, but not by much. The zero-inflation parameter is the only one that is radically off.)

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to RyanSimmons

03-29-2016 09:42 AM

Incidentally, I also modified the last line of code in NLMIXED before the model statement to read:

logLik = log(1-p0)+logpdf('NEGBINOMIAL',UAVI,p,theta)

This gives me the same parameter estimates as the code using lgamma() in place of the log(gamma()) calls. So I am confident that my log-likelihood is correctly specified. However, I don't understand why NLMIXED and GENMOD are giving different parameter estimates.

logLik = log(1-p0)+logpdf('NEGBINOMIAL',UAVI,p,theta)

This gives me the same parameter estimates as the code using lgamma() in place of the log(gamma()) calls. So I am confident that my log-likelihood is correctly specified. However, I don't understand why NLMIXED and GENMOD are giving different parameter estimates.

Solution

03-29-2016
10:20 AM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to RyanSimmons

03-29-2016 10:20 AM

Aha! I found the mistake. I mentally mixed up the log-likelihood expressions for the ZIP and ZINB models.

The code that reads as:

```
if UAVI=0 then
logLik = log(p0 + (1-p0)*exp(p**theta));
```

Should read as:

```
if UAVI=0 then
logLik = log(p0 + (1-p0)*(p**theta));
```

There is no exponentiation in this part of the log-likelihood (in contrast with the ZIP, which would have exp(-mu) in this term.

It's always the silly mistakes that undermine you!