RyanSimmons
Pyrite | Level 9

I am currently fitting a ZINB model in PROC GENMOD. I actually posted a different question related to the same model here; I am keeping this question separate because it is a different issue from the one I raised in the other thread, and I think it will be easier to keep them separate.

 

Anyway, say I have the following model in PROC GENMOD:

 

PROC GENMOD data=dat_analysis;
  class Treatment(ref="Control") / param=ref;
  model UAVI = Treatment / dist=zinb offset=logTI;
  zeromodel;
run;

 

Specifically, you can see that I fit a null (i.e. intercept only) zeromodel to this data. The output for the zero inflation parameter is:

 

Intercept Estimate: -1.3971

Intercept St. Err: 0.1667

 

According to the SAS documentation, this parameter is modelled as:

$$h(\omega_i) = \mathbf{z}_i'\boldsymbol{\gamma}$$

where h is the logit link function, ω_i is the zero-inflation probability, and in this instance the right-hand side of the equation consists solely of the intercept.

 

However, if I create a binary indicator variable coding 0s and non-zeros, and run a logistic regression on that outcome with no covariates, I get completely different results. So, using the following code:

 

DATA test;
  set dat_analysis;
  if UAVI=0 then zero=1; else zero=0;
run;

PROC LOGISTIC data=test;
  model zero(event='1') = ;
run;

 

Then, the parameter estimate for the probability of a zero is:

 

Intercept Estimate: -0.7621

Intercept St. Err: 0.1079

 

As you can see, these are radically different results. What accounts for this?

 

On a related note, I've also noticed that the results for the zero inflation parameter change when I add covariates to the MODEL statement in PROC GENMOD, even if the zeromodel is still specified as null, as in the following:

 

PROC GENMOD data=dat_analysis;
  class Treatment(ref="Control") / param=ref;
  model UAVI = Treatment T6 / dist=zinb offset=logTI;
  zeromodel;
run;

 

Now, the parameter estimates are:

 

Intercept Estimate: -1.3943

Intercept St. Err: 0.1663

 

But why would the results change? I've checked that the issue isn't related to missing values (i.e. all of these models are using the same exact pool of individuals). The parameters in the model statement shouldn't impact the fit of the zero model, since it is still a null model fit to the same number of subjects. And I don't understand why the PROC LOGISTIC model gives different results, when the documentation indicates that the method used in GENMOD is equivalent.

5 REPLIES
RyanSimmons
Pyrite | Level 9

As a follow-up to the original post, I have found some elucidation in using PROC FMM. If I fit the following two models:

 

PROC FMM data=dat_analysis;
  class Treatment;
  model UAVI = Treatment / dist=truncnegbin offset=logTI;
  model + / dist=Constant;
run;

 

PROC FMM data=dat_analysis;
  class Treatment;
  model UAVI = Treatment / dist=negbinomial offset=logTI;
  model + / dist=Constant;
run;

 

The first model (which is a negative binomial hurdle model) gives me the exact same estimated zero probability parameter as PROC LOGISTIC. The second model agrees completely with the output of PROC GENMOD's ZINB fit. So the answer lies somewhere in the difference between these two models, but as of yet I have not been able to figure out the cause of the difference or its ramifications.

lvm
Rhodochrosite | Level 12

The zinb model in GENMOD is not a hurdle model, but a more general mixture model. The count distribution defined in the MODEL statement already assigns some probability to 0; the ZEROMODEL statement then adds a separate zero component (a distribution defined only at 0). The overall probability of a 0 therefore combines the probabilities from both the MODEL statement and the ZEROMODEL statement. If you fit just a null logistic model to the zero indicator, you would not get the same results.

 

In your first FMM run, by using a truncated NB, you are not getting the probability of 0 for the first component. Thus, it is a different model.
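To spell out the difference, here is a sketch in notation that is assumed for illustration rather than taken from the thread: ω is the zero-inflation probability, π is the hurdle zero probability, μ_i is the negative binomial mean for observation i, k is the dispersion parameter (variance μ + kμ²), and f_NB is the untruncated negative binomial probability function.

```latex
% Zero-inflated negative binomial (mixture): zeros come from both components
P(Y_i = 0) = \omega + (1 - \omega)\,(1 + k\mu_i)^{-1/k}

% Hurdle model (truncated NB plus a constant component): zeros come from one place only
P(Y_i = 0) = \pi, \qquad
P(Y_i = y) = (1 - \pi)\,\frac{f_{\mathrm{NB}}(y;\mu_i,k)}{1 - f_{\mathrm{NB}}(0;\mu_i,k)}, \quad y \ge 1
```

In the hurdle form π is the whole probability of a zero, so it can match a logistic regression on the zero indicator. In the mixture form ω is only the part of the zero probability that the negative binomial component does not already supply.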

RyanSimmons
Pyrite | Level 9

Isn't it the other way around, though? The results from my truncated negative binomial model AGREE with the results of my null model, whereas the results of the zero-inflated model DISAGREE with the results of the null model.

 

The SAS documentation explicitly claims that the zero-probabilities calculated for a zero-inflated model are using a logistic regression, but the results of fitting that regression produce incompatible results. Those results only correspond to the zero-probabilities calculated by the hurdle model. So there appears to be a discrepancy between the way the documentation claims the model works versus how it is working in practice.

lvm
Rhodochrosite | Level 12

I guess I disagree. I think your results make perfect sense. The results of fitting the mixture model will certainly disagree with the fit from the null model (because both portions of the mixture are giving part of the zero prediction).
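As a rough back-of-the-envelope check, the two intercepts quoted in the thread can be converted to probabilities and compared. This is only a sketch: it assumes the mixture form P(0) = ω + (1 - ω)·NB(0), treats the negative binomial zero probability as a single average value (in the fitted model it varies with Treatment and the offset), and assumes the ZINB fit roughly reproduces the observed share of zeros. The variable names are made up for illustration.

```python
import math

def expit(x):
    """Inverse of the logit link: maps a logit-scale value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Intercepts reported in the thread (both on the logit scale)
zinb_zero_intercept = -1.3971   # ZEROMODEL intercept from PROC GENMOD
logistic_intercept = -0.7621    # intercept from PROC LOGISTIC on zero=1

# Zero-inflation probability (extra point mass at 0 in the mixture)
omega = expit(zinb_zero_intercept)

# Overall proportion of zeros in the data, as estimated by the null logistic model
p_zero = expit(logistic_intercept)

# Solve p_zero = omega + (1 - omega) * nb_zero for the average NB zero probability
implied_nb_zero = (p_zero - omega) / (1.0 - omega)

print(f"zero-inflation probability omega : {omega:.3f}")
print(f"overall probability of a zero    : {p_zero:.3f}")
print(f"implied average NB P(Y=0)        : {implied_nb_zero:.3f}")
```

Under these assumptions, about 0.20 of the observations are attributed to the zero component, while about 0.32 of the observations are zeros overall; the gap is covered by zeros from the negative binomial part. Because ω is estimated jointly with the negative binomial parameters, changing the MODEL statement changes the negative binomial zero probability and can shift ω slightly, which fits the move from -1.3971 to -1.3943 when T6 is added.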

 

RyanSimmons
Pyrite | Level 9

Then the SAS documentation is incorrect? This is where my confusion lies; the way it is described in SAS implies that it SHOULD agree with the null model when it clearly does NOT. That is why I am asking for clarification on the issue. The SAS documentation implies that using the "zeromodel" statement with no specified effects is equivalent to fitting the null model, which clearly isn't the case.
