zero-inflated multinomial data

palolix

Dear SAS Community,

I am trying to analyze multinomial dependent variables that are mostly zeroes. When I analyze these variables with the interactions among the factors included in the model, I get these warnings:

The negative of the Hessian is not positive definite. The convergence is questionable.

The procedure is continuing but the validity of the model fit is questionable

The specified model did not converge

However, if I only include the main factors in the model but not the interactions, then I no longer get the warnings.

Since I only get these warnings for the dependent variables that have too many zeroes, I am assuming the cause is zero-inflated data. It seems that PROC GENMOD offers zero-inflated models only for the Poisson and negative binomial distributions, not for multinomial data. I would greatly appreciate your feedback on this.

This is the code I am using (I also attached the data):

proc genmod data=one;
   by Variety;
   class Season Harvest Weeks;
   model Easyofpeeling = Season|Harvest|Weeks / type3 dist=multinomial link=cumlogit;
run;

Thank you very much!

Caroline

StatDave

The problem is not the zero values in the response variable. For a multinomial categorical variable, zero is just another category, and the multinomial distribution does not restrict the proportion of zeros the way that, say, the continuous gamma distribution does.

The problem here is that you have several response categories and also specify all possible interactions. That produces a complex model with many parameters to estimate, and the data within each variety are too sparse to support it. The result, just as in binary logistic models, is that some model parameters are actually infinite. Since computers don't deal in infinities, the practical result is that some parameters are large with even larger standard errors, and/or some parameters have zero degrees of freedom and are not estimated.

The solution is to simplify the model in any way acceptable to you, such as by combining response levels (which will have the biggest benefit) and/or removing some or all of the interactions. Because the amount of data varies so much by variety, you should not expect to specify one model, use BY VARIETY, and get a proper fit for each variety unless you find a much simpler model that can be fit successfully in every variety.

By the way, for logistic models like this (binary or multinomial), PROC LOGISTIC is a better procedure to use than PROC GENMOD, as it is more specialized for those models.
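The suggestions above could be sketched roughly as follows. This is only an illustration, assuming the data set one is already sorted by Variety and that Easyofpeeling is a numeric, ordinal score. The data set two, the recoded variable Peel3, and the cutoff at level 3 are hypothetical, since the thread does not say how the response is coded.

```sas
/* 1. See how sparse each Season x Harvest x Weeks cell is, by variety. */
proc freq data=one;
   by Variety;
   tables Season*Harvest*Weeks*Easyofpeeling / list missing;
run;

/* 2. HYPOTHETICAL recode: collapse levels 3 and above into one category.
      Assumes Easyofpeeling is numeric; adjust the cutoff to your scale.
      The missing() check is needed because min() ignores missing values. */
data two;
   set one;
   if missing(Easyofpeeling) then Peel3 = .;
   else Peel3 = min(Easyofpeeling, 3);
run;

/* 3. Main-effects cumulative logit model in PROC LOGISTIC. */
proc logistic data=two;
   by Variety;
   class Season Harvest Weeks / param=glm;
   model Peel3 = Season Harvest Weeks / link=clogit;
run;
```

If the main-effects model fits cleanly in every variety, interactions can be added back one at a time to see where the sparseness starts to cause trouble. For a cumulative logit model, PROC LOGISTIC also prints a score test of the proportional odds assumption, which is worth checking after collapsing levels.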
