BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
RyanSimmons
Pyrite | Level 9

I fit a zero-inflated negative binomial model in PROC GENMOD, using the following code (both Time and Treatment are binary indicator variables):

 

PROC GENMOD data=analysis_dataset;
	class Time(ref="Baseline") Treatment(ref="Control") / param=ref;
	model UAVI = Time|Treatment / dist=zinb offset=logTI;
	zeromodel;
run;

This model converges and produces no errors. 

 

If I try to fit the same model in PROC NLMIXED (note that "Interaction" here is an indicator variable I created in a datastep to signifiy the Time X Treatment interaction, since NLMIXED can't create interaction terms on its own):

 

PROC NLMIXED data=analysis_dataset method=gauss qpoints=25;
	/* Set initial values for parameters */
	parms a0=-1.1365 alpha=1
		  b0=-0.8222 b1=0.06963 b2=0.09920 b3=-0.2334
		  sigma=0.3131
	;
	/* Specificy zero model */
	zeromodel = a0;
	p0 = 1/(1+exp(-zeromodel));
	/* Specify Negative Binomial model */
	mu = exp(b0 + b1*Time + b2*Treatment+ b3*Interaction + logTI);
	theta = 1/alpha;
	p = 1/(1+alpha*mu);
	/* Now build ZINB likelihood */
	if UAVI=0 then
		logLik = log(p0 + (1-p0)*exp(p**theta));
	else
		logLik = log(1-p0) + log(gamma(theta + UAVI)) - log(gamma(UAVI + 1)) - log(gamma(theta)) + theta*log(p) + UAVI*log(1-p);
	model UAVI ~ general(logLik);
run;

So far as I can tell, this is the correct specification of the log-likelihood. 

 

Now, this NLMIXED code produces errors. Specifically, it gives me 6 of the following errors:

 

NOTE: Invalid argument to function GAMMA(#) at line # column #

 

I used the NLMIXED code in a DATA step to identify the 6 problematic observations. It turns out that these six observations are all individuals whose counts of the outcome (UAVI) are all exceptionally high (>200). After fiddling around with some datasteps, I've determined that the GAMMA function will return this error if given as an argument any number greater than 171. Since, for positive integers, Gamma is calculated as (Integer-1)!, I'm guessing the error is because the numbers are too big. That is, Gamma(171)=7.257416e306! 

 

Now, likely, these 6 observations will end up being excluded from my analysis because they are huge outliers, and I have some rather convincing evidence that the reasons they are outliers have more to do with data quality than any intrinsic effect of the treatment. But, the question I have is; how is PROC GENMOD able to calculating values of the log-likelihood for these observations, if decomposing the likelihood function in PROC NLMIXED returns an error? Clearly, something about PROC GENMOD's method of calculating the likelihood is different, because it is able to fit the model to those 6 datapoints with an error. 

 

This is troublesome for me because ultimately I want to include random intercepts in my model, thus why I am fooling around with PROC NLMIXED instead of PROC GENMOD. However, it concerns me that in this situation GENMOD and NLMIXED are treating the data so differently. Unless I have made a major mistake in my PROC NLMIXED code above, what accounts for GENMOD's ability to fit those datapoints?

1 ACCEPTED SOLUTION

Accepted Solutions
RyanSimmons
Pyrite | Level 9

Aha! I found the mistake. I mentally mixed up the log-likelihood expressions for the ZIP and ZINB models. 

 

The code that reads as:

 

	if UAVI=0 then
		logLik = log(p0 + (1-p0)*exp(p**theta));

Should read as: 


	if UAVI=0 then
		logLik = log(p0 + (1-p0)*(p**theta));

There is no exponentiation in this part of the log-likelihood (in contrast with the ZIP, which would have exp(-mu) in this term.

 

It's always the silly mistakes that undermine you!

View solution in original post

4 REPLIES 4
PGStats
Opal | Level 21

Try function lgamma() intead of log(gamma())

PG
RyanSimmons
Pyrite | Level 9

I didn't even realize lgamma() was a function in SAS! Always full of surprises ...

 

In any case, that removed the error I was experiencing. However, the parameter estimates are still different from those produced by the GENMOD call. The intercept for the zero-inflated model in particular is way off, with a value of -1.1399 given by GENMOD and a value of -18.9018 given by NLMIXED. Any idea what would account for this discrepency (incidentally, all the other parameter values are also different between GENMOD and NLMIXED, but not by much. The zero-inflation parameter is the only one that is radically off.)

RyanSimmons
Pyrite | Level 9
Incidentally, I also modified the last line of code in NLMIXED before the model statement to read:

logLik = log(1-p0)+logpdf('NEGBINOMIAL',UAVI,p,theta)

This gives me the same parameter estimates as the code using lgamma() in place of the log(gamma()) calls. So I am confident that my log-likelihood is correctly specified. However, I don't understand why NLMIXED and GENMOD are giving different parameter estimates.
RyanSimmons
Pyrite | Level 9

Aha! I found the mistake. I mentally mixed up the log-likelihood expressions for the ZIP and ZINB models. 

 

The code that reads as:

 

	if UAVI=0 then
		logLik = log(p0 + (1-p0)*exp(p**theta));

Should read as: 


	if UAVI=0 then
		logLik = log(p0 + (1-p0)*(p**theta));

There is no exponentiation in this part of the log-likelihood (in contrast with the ZIP, which would have exp(-mu) in this term.

 

It's always the silly mistakes that undermine you!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 2212 views
  • 4 likes
  • 2 in conversation