palolix
Quartz | Level 8

Dear SAS Community,

I am trying to analyze multinomial dependent variables that are mostly zeroes. When I analyze these variables with the factor interactions included in the model, I get these warnings:

The negative of the Hessian is not positive definite. The convergence is questionable.

The procedure is continuing but the validity of the model fit is questionable

The specified model did not converge

 

However, if I include only the main effects in the model and leave out the interactions, the warnings no longer appear.

 

Since I only get these warnings for the dependent variables that have too many zeroes, I am assuming they are due to zero-inflated data. It seems that PROC GENMOD offers zero-inflated models only for the Poisson and negative binomial distributions, not for multinomial data. I would greatly appreciate your feedback on this.

 

This is the code I am using (I also attached the data):

proc genmod data=one;
by Variety;
class Season Harvest Weeks;
model Easyofpeeling=Season| Harvest| Weeks /type3 dist=multinomial link=cumlogit;
run;

 

 

Thank you very much!

Caroline

StatDave
SAS Super FREQ

The problem is not the zero values in the response variable. For a multinomial categorical variable, zero is just another category, and the distribution does not restrict the proportion of zeros the way, say, the continuous gamma distribution does.

The problem is that you have several response categories and also specify all possible interactions. That produces a complex model with many parameters to estimate, and the complexity makes the data within each variety too sparse. The result, just as in binary logistic models, is that some model parameters are actually infinite. Since computers don't deal in infinities, the practical result is that some parameters are large with even larger standard errors, and/or some parameters have zero degrees of freedom and are not estimated.

The solution is to simplify the model in any way acceptable to you, such as by combining response levels (which will have the biggest benefit) and/or removing some or all interactions. Because the amount of data varies so much by variety, you should not expect to specify one model, use BY VARIETY, and get a proper fit for each variety, unless you find a much simpler model that can be fit successfully in every variety.

By the way, for logistic models like this (binary or multinomial), PROC LOGISTIC is a better choice than PROC GENMOD because it is more specialized for those models.
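
As a rough sketch of the advice above, and not a tested solution for this particular data set, one could first tabulate the response against each factor within each variety to see where the cells are empty, then fit a main-effects cumulative logit model in PROC LOGISTIC. The data set and variable names are taken from the original post. The PROC SORT step assumes the data are not already sorted by Variety, and dropping all interactions is only one of the possible simplifications.

```sas
/* BY-group processing needs the data sorted by the BY variable. */
/* Assumption: data set ONE may not already be sorted by Variety. */
proc sort data=one;
  by Variety;
run;

/* Cross-tabulate each factor with the response within each variety. */
/* Many zero cells here indicate the sparseness described above.    */
proc freq data=one;
  by Variety;
  tables (Season Harvest Weeks)*Easyofpeeling / norow nocol nopercent;
run;

/* Simplified model: main effects only, cumulative logit link. */
proc logistic data=one;
  by Variety;
  class Season Harvest Weeks / param=glm;
  model Easyofpeeling = Season Harvest Weeks / link=clogit;
run;
```

If the frequency tables show response levels with very few observations, combining those levels in a DATA step before fitting would be the next thing to try. Which levels to merge depends on how Easyofpeeling is coded, so that step is not shown here.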
