Programming the statistical procedures from SAS

ZINB in GENMOD vs NLMIXED

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 98
Accepted Solution

ZINB in GENMOD vs NLMIXED

[ Edited ]

I fit a zero-inflated negative binomial model in PROC GENMOD, using the following code (both Time and Treatment are binary indicator variables):

 

PROC GENMOD data=analysis_dataset;
	class Time(ref="Baseline") Treatment(ref="Control") / param=ref;
	model UAVI = Time|Treatment / dist=zinb offset=logTI;
	zeromodel;
run;

This model converges and produces no errors. 

 

If I try to fit the same model in PROC NLMIXED (note that "Interaction" here is an indicator variable I created in a datastep to signifiy the Time X Treatment interaction, since NLMIXED can't create interaction terms on its own):

 

PROC NLMIXED data=analysis_dataset method=gauss qpoints=25;
	/* Set initial values for parameters */
	parms a0=-1.1365 alpha=1
		  b0=-0.8222 b1=0.06963 b2=0.09920 b3=-0.2334
		  sigma=0.3131
	;
	/* Specificy zero model */
	zeromodel = a0;
	p0 = 1/(1+exp(-zeromodel));
	/* Specify Negative Binomial model */
	mu = exp(b0 + b1*Time + b2*Treatment+ b3*Interaction + logTI);
	theta = 1/alpha;
	p = 1/(1+alpha*mu);
	/* Now build ZINB likelihood */
	if UAVI=0 then
		logLik = log(p0 + (1-p0)*exp(p**theta));
	else
		logLik = log(1-p0) + log(gamma(theta + UAVI)) - log(gamma(UAVI + 1)) - log(gamma(theta)) + theta*log(p) + UAVI*log(1-p);
	model UAVI ~ general(logLik);
run;

So far as I can tell, this is the correct specification of the log-likelihood. 

 

Now, this NLMIXED code produces errors. Specifically, it gives me 6 of the following errors:

 

NOTE: Invalid argument to function GAMMA(#) at line # column #

 

I used the NLMIXED code in a DATA step to identify the 6 problematic observations. It turns out that these six observations are all individuals whose counts of the outcome (UAVI) are all exceptionally high (>200). After fiddling around with some datasteps, I've determined that the GAMMA function will return this error if given as an argument any number greater than 171. Since, for positive integers, Gamma is calculated as (Integer-1)!, I'm guessing the error is because the numbers are too big. That is, Gamma(171)=7.257416e306! 

 

Now, likely, these 6 observations will end up being excluded from my analysis because they are huge outliers, and I have some rather convincing evidence that the reasons they are outliers have more to do with data quality than any intrinsic effect of the treatment. But, the question I have is; how is PROC GENMOD able to calculating values of the log-likelihood for these observations, if decomposing the likelihood function in PROC NLMIXED returns an error? Clearly, something about PROC GENMOD's method of calculating the likelihood is different, because it is able to fit the model to those 6 datapoints with an error. 

 

This is troublesome for me because ultimately I want to include random intercepts in my model, thus why I am fooling around with PROC NLMIXED instead of PROC GENMOD. However, it concerns me that in this situation GENMOD and NLMIXED are treating the data so differently. Unless I have made a major mistake in my PROC NLMIXED code above, what accounts for GENMOD's ability to fit those datapoints?


Accepted Solutions
Solution
‎03-29-2016 10:20 AM
Frequent Contributor
Posts: 98

Re: ZINB in GENMOD vs NLMIXED

Aha! I found the mistake. I mentally mixed up the log-likelihood expressions for the ZIP and ZINB models. 

 

The code that reads as:

 

	if UAVI=0 then
		logLik = log(p0 + (1-p0)*exp(p**theta));

Should read as: 


	if UAVI=0 then
		logLik = log(p0 + (1-p0)*(p**theta));

There is no exponentiation in this part of the log-likelihood (in contrast with the ZIP, which would have exp(-mu) in this term.

 

It's always the silly mistakes that undermine you!

View solution in original post


All Replies
Respected Advisor
Posts: 4,756

Re: ZINB in GENMOD vs NLMIXED

Try function lgamma() intead of log(gamma())

PG
Frequent Contributor
Posts: 98

Re: ZINB in GENMOD vs NLMIXED

I didn't even realize lgamma() was a function in SAS! Always full of surprises ...

 

In any case, that removed the error I was experiencing. However, the parameter estimates are still different from those produced by the GENMOD call. The intercept for the zero-inflated model in particular is way off, with a value of -1.1399 given by GENMOD and a value of -18.9018 given by NLMIXED. Any idea what would account for this discrepency (incidentally, all the other parameter values are also different between GENMOD and NLMIXED, but not by much. The zero-inflation parameter is the only one that is radically off.)

Frequent Contributor
Posts: 98

Re: ZINB in GENMOD vs NLMIXED

Incidentally, I also modified the last line of code in NLMIXED before the model statement to read:

logLik = log(1-p0)+logpdf('NEGBINOMIAL',UAVI,p,theta)

This gives me the same parameter estimates as the code using lgamma() in place of the log(gamma()) calls. So I am confident that my log-likelihood is correctly specified. However, I don't understand why NLMIXED and GENMOD are giving different parameter estimates.
Solution
‎03-29-2016 10:20 AM
Frequent Contributor
Posts: 98

Re: ZINB in GENMOD vs NLMIXED

Aha! I found the mistake. I mentally mixed up the log-likelihood expressions for the ZIP and ZINB models. 

 

The code that reads as:

 

	if UAVI=0 then
		logLik = log(p0 + (1-p0)*exp(p**theta));

Should read as: 


	if UAVI=0 then
		logLik = log(p0 + (1-p0)*(p**theta));

There is no exponentiation in this part of the log-likelihood (in contrast with the ZIP, which would have exp(-mu) in this term.

 

It's always the silly mistakes that undermine you!

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 357 views
  • 4 likes
  • 2 in conversation