I am fitting a ZINB model in PROC GENMOD. I have noticed some odd behavior with the way the model is fit with respect to the offset term that I was hoping people here could help clarify.
The outcome of interest in the study is the count of acts of unprotected sex. I am offsetting this outcome in the ZINB model by the log of the total count of sex acts. To put it another way, the outcome is the rate or proportion of unprotected sex. However, a non-trivial subset (about 77 individuals) of my study population (N=474) had no sex at all during the specified time frame. That is, both their outcome AND their offset term are equal to 0.
First, I fit a model where those 77 individuals had their log offset set to missing (and they were thus excluded from the analysis). The code looks like this (where TI is the offset variable of interest, and UAVI is the outcome variable of interest):
DATA test1;
set dat_analysis;
logTI = log(TI);
run;
/* SAS sets logTI to missing where TI=0, since log(0) is undefined */
PROC GENMOD data=test1;
class Treatment(ref="Control") / param=ref;
model UAVI = Treatment / dist=zinb offset=logTI;
zeromodel;
run;
Here is some output from that model:
Number of Observations Read: 474
Number of Observations Used: 396
Missing Values: 78
Intercept Estimate = -0.6249
Intercept St. Err: 0.0619
Treatment Estimate = 0.0915
Treatment St. Err: 0.0865
I then ran the following code, where I manually set the offset terms to 0 instead of missing:
DATA test4;
set dat_analysis;
if TI>0 then logTI=log(TI);
if TI=0 then logTI=0;
run;
PROC GENMOD data=test4;
class Treatment(ref="Control") / param=ref;
model UAVI = Treatment / dist=zinb offset=logTI;
zeromodel;
run;
In this case, my results are slightly different. Some output:
Number of Observations Read: 474
Number of Observations Used: 473
Missing Values: 1
Intercept Estimate = -0.6554
Intercept St. Err: 0.0623
Treatment Estimate = 0.0713
Treatment St. Err: 0.0855
Now, why are these results different? If the offset term is set to 0, then those individuals have a rate of 0/0. I would think that SAS would ignore those cases, because that rate makes no mathematical sense, but clearly SAS IS incorporating that information into the model. But how? What is SAS doing here?
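For reference, this is how I understand the offset to enter the count part of the model. It is only a sketch, assuming the usual log-link parameterization; the symbols \mu_i (expected count), x_i (covariates), \beta (coefficients), and T_i (the value of TI) are my own notation and do not come from the SAS output.

```latex
\begin{align*}
\log \mu_i &= x_i^\top \beta + \log T_i \\
\mu_i &= T_i \exp\!\left(x_i^\top \beta\right)
\end{align*}
```

Under this reading, setting logTI=0 corresponds to an exposure of TI=1 rather than TI=0, which may be part of why those records are used instead of being dropped.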
Then, I fit two more models in which I imputed a value for the offset term. In one model I replaced each 0 offset with a very small value (0.001), and in the other I replaced it with a large value (1000). Since all individuals with a 0 offset also had a 0 on the outcome of interest (by definition), I figured that the value of the offset would be irrelevant and that I would get analogous results. However, this turned out to be incorrect:
DATA test2;
set dat_analysis;
if TI=0 then TI=0.001;
logTI = log(TI);
run;
/* MODEL 1 */
PROC GENMOD data=test2;
class Treatment(ref="Control") / param=ref;
model UAVI = Treatment / dist=zinb offset=logTI;
zeromodel;
run;
DATA test3;
set dat_analysis;
if TI=0 then TI=1000;
logTI = log(TI);
run;
/* MODEL 2 */
PROC GENMOD data=test3;
class Treatment(ref="Control") / param=ref;
model UAVI = Treatment / dist=zinb offset=logTI;
zeromodel;
run;
Number of Observations Read: 474
Number of Observations Used: 473
Missing Values: 1
MODEL 1:
Intercept Estimate = -0.6250
Intercept St. Err: 0.0619
Treatment Estimate = 0.0915
Treatment St. Err: 0.0864
MODEL 2:
Intercept Estimate = -0.5763
Intercept St. Err: 0.0586
Treatment Estimate = 0.0815
Treatment St. Err: 0.0831
Now, admittedly, the differences between the models are small, but I don't understand why there are differences at all. In all of these cases, the individuals whose offset terms have been modified have 0 outcomes, so in all cases they are being modelled with a rate of 0. So shouldn't these models all be equivalent?
For the model in which I set the 0 offsets to 0.001, the results are essentially equivalent to those from the model that ignored those observations entirely (i.e. their log offsets were set to missing). The models where I set the log offset to 0 manually and where I gave the 0 offsets a value of 1000 each gave results that differ from both of the other models.
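One way to look at this pattern is through the probability the model assigns to a zero count. The expression below is a sketch, assuming the usual ZINB form with negative binomial variance \mu + k\mu^2; the symbols \omega (zero-inflation probability), k (dispersion), and \mu_i = T_i \exp(x_i^\top \beta) with T_i the value of TI are notation introduced here for illustration, not taken from the SAS output.

```latex
\begin{align*}
P(Y_i = 0) &= \omega + (1 - \omega)\,(1 + k\mu_i)^{-1/k},
  \qquad \mu_i = T_i \exp\!\left(x_i^\top \beta\right) \\
T_i \to 0 &: \quad (1 + k\mu_i)^{-1/k} \to 1
  \;\Rightarrow\; P(Y_i = 0) \to 1 \\
T_i \to \infty &: \quad (1 + k\mu_i)^{-1/k} \to 0
  \;\Rightarrow\; P(Y_i = 0) \to \omega
\end{align*}
```

If this is right, a zero outcome with a tiny offset contributes almost nothing to the log-likelihood, which would be consistent with the TI=0.001 fit agreeing with the fit that drops those records. A zero outcome with TI=1 or TI=1000, by contrast, would still carry information about \omega and \beta, so the value chosen for the offset would not be irrelevant after all.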
How can we explain these results? And what is the most principled method for dealing with this case with 0 offsets?