Hi all,

I'm using ZINB models to examine count data in 44 youth who are part of a summer research treatment program. The dataset is highly unbalanced as some youth have many time-outs, and others have only a few. Given this, I'm including an offset variable to account for the # of time-outs a child got within a day. Given the overdispersion and excess zeros, ZINB appears most appropriate and I'm using a LINK = LOG function. Does my offset variable also need to be logged? I'm seeing mixed responses online. My code is below.

Thanks much!

PROC GENMOD DATA = TIMEOUT;
CLASS Meds;
MODEL Behavior = time X1 X2 X3 X4 / link = log dist = zinb offset = timeout_withindaycount; /* does this need to be log transformed? */
ZEROMODEL time X1 X2 X3 X4;
RUN;

The purpose of using an offset in a model on a count response is so that you can model a rate - the ratio of the mean count to some exposure or population amount. See this note that discusses it. As shown there, in a log-linked model the denominator variable of the desired rate should also be log transformed. Assuming that your BEHAVIOR variable is a count and if your intent is to model the rate defined by BEHAVIOR/TIMEOUT_WITHINDAYCOUNT, then you need to log-transform the timeout variable. See this note that discusses and illustrates modeling mean counts and rates with zero-inflated models.
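To see why the log is needed: with a log link, modeling the rate as log(mean count / exposure) = linear predictor is the same as log(mean count) = linear predictor + log(exposure), so the offset has to enter on the log scale. A minimal sketch of how that might look for the model in the question is below. The dataset name TIMEOUT_LOG and the variable log_timeout are hypothetical names introduced here for illustration, and the sketch assumes that rows with a zero time-out count should be excluded because the rate is undefined for them.

```sas
/* Sketch only: TIMEOUT_LOG and log_timeout are illustrative names, */
/* not taken from the original post.                                 */
DATA TIMEOUT_LOG;
  SET TIMEOUT;
  /* Take the log only for positive denominators. Rows with a count  */
  /* of zero keep a missing offset and are dropped from the fit.     */
  IF timeout_withindaycount > 0 THEN log_timeout = LOG(timeout_withindaycount);
RUN;

PROC GENMOD DATA = TIMEOUT_LOG;
  /* OFFSET= goes inside the MODEL statement options, before the semicolon */
  MODEL Behavior = time X1 X2 X3 X4 / DIST = ZINB LINK = LOG OFFSET = log_timeout;
  ZEROMODEL time X1 X2 X3 X4;
RUN;
```

With the offset specified this way, the coefficients in the count part of the model describe the log of the rate BEHAVIOR/TIMEOUT_WITHINDAYCOUNT rather than the log of the raw mean count.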

Apologies for the delay! Thank you very much, StatDave. This was very helpful.
