zero-inflated multinomial data


☑ This topic is **solved**.

Posted a month ago
(893 views)

Dear SAS Community,

I am trying to analyze multinomial dependent variables that have mostly zeroes. When I analyze these variables and include the interactions of the factors in the model, I get these warnings:

The negative of the Hessian is not positive definite. The convergence is questionable.

The procedure is continuing but the validity of the model fit is questionable

The specified model did not converge

However, if I only include the main factors in the model but not the interactions, then I no longer get the warnings.

Since I am only getting these warnings with the dependent variables that have too many zeroes, I am assuming it is due to zero-inflated data. It seems that PROC GENMOD only offers zero-inflated models for the Poisson and negative binomial distributions, but not for multinomial data. I would greatly appreciate your feedback on this.

This is the code I am using (I also attached the data):

proc genmod data=one;

by Variety;

class Season Harvest Weeks;

model Easyofpeeling=Season| Harvest| Weeks /type3 dist=multinomial link=cumlogit;

run;

Thank you very much!

Caroline

1 ACCEPTED SOLUTION


The FIRTH option is only available with binary response data. In sparse cases, another simplification besides removing model effects like interactions is to combine categories of any categorical variables. If the sparseness can be removed by doing this, then you might be able to estimate some interactions. How and in which variables to combine categories is a trial-and-error thing, but the biggest effect and place to start is with the multinomial response. Combining response categories to create fewer response levels does the most to reduce the number of parameters that must be estimated. If that isn't enough, then start combining categories in the predictors. Obviously the categories that have very few observations in some response levels are the ones to combine first.
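A minimal sketch of that suggestion: tabulate the response against each predictor to find the sparse cells, then collapse the response levels. It assumes `Easyofpeeling` is a numeric score from 1 to 5; the cut points, the new variable `Peel3`, and the data set name `one_collapsed` are hypothetical and only illustrate the idea.

```sas
/* Find sparse cells: response by each predictor, within each variety. */
/* BY processing assumes the data are already sorted by Variety.       */
proc freq data=one;
  by Variety;
  tables (Season Harvest Weeks)*Easyofpeeling / norow nocol nopercent;
run;

/* Collapse the response to three levels.                              */
/* Assumption: Easyofpeeling is numeric 1-5; cut points are made up.   */
data one_collapsed;
  set one;
  if missing(Easyofpeeling) then Peel3 = .;   /* keep missing as missing */
  else if Easyofpeeling <= 2 then Peel3 = 1;
  else if Easyofpeeling = 3 then Peel3 = 2;
  else Peel3 = 3;
run;

/* Refit with the collapsed response. */
proc logistic data=one_collapsed;
  by Variety;
  class Season Harvest Weeks / param=glm;
  model Peel3 = Season Harvest Weeks / link=glogit;
run;
```

If warnings persist after collapsing the response, the same PROC FREQ tables show which predictor levels are the next candidates for combining.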

9 REPLIES


An alternative to removing the interactions might be to fit ONLY the highest level interaction, and then use specific CONTRAST or ESTIMATE statements to calculate your odds ratios. This will also quickly identify the cells with small sample sizes.
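One way this could look, reusing the data set and variable names from the original post: list every three-way cell (including empty ones) to see where the sample sizes are small, then fit the model with only the three-way term. This is a sketch of the idea rather than Steve's exact code; any CONTRAST or ESTIMATE statements would depend on the actual factor levels and are left out.

```sas
/* List all Season x Harvest x Weeks cells per variety.              */
/* SPARSE also prints combinations with zero observations.           */
/* BY processing assumes the data are already sorted by Variety.     */
proc freq data=one;
  by Variety;
  tables Season*Harvest*Weeks / list sparse;
run;

/* Fit only the highest-order interaction (a cell-means style model). */
proc genmod data=one;
  by Variety;
  class Season Harvest Weeks;
  model Easyofpeeling = Season*Harvest*Weeks / dist=multinomial link=cumlogit;
run;
```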

SteveDenham


Thank you very much for your input, Steve. I think the problem is due to unbalanced data for some varieties, because I was still getting the warnings even when I included only the main effects in the model.

Thank you

Caroline


Thank you so much for your comprehensive feedback, StatDave! I simplified the model as much as I could in PROC LOGISTIC. I think I now know what the main issue is. I followed your advice and fit the model for each variety separately. For 'Hass' I don't get any warnings, because this variety was measured on every occasion (balanced data), but if I run it for another variety that has very unbalanced data, I get these warnings:

There is possibly a quasi-complete separation of data points. The maximum likelihood estimate may not exist.

The LOGISTIC procedure continues in spite of the above warning. Results shown are based on the last maximum likelihood iteration. Validity of the model fit is questionable.

So to me it seems that the main problem is the unbalanced data for the varieties that were not harvested as consistently as Hass, which is the standard.

Question: Is it possible to use two where statements in proc logistic? So that I can fit the model for each variety and season.

proc logistic data=one;

where Variety= 'Hass';

class Season Harvest Weeks/param=glm;

model Easyofpeeling=Season Harvest Weeks/ link=glogit ;

run;

Thank you so much!

Caroline


The lack of balance, meaning unequal numbers of observations in the various predictor combinations, is not itself a problem. The problem is the extreme case, in which some of the combinations have no observations. That is when "separation" occurs, resulting in some parameters being infinite, as I mentioned. In these cases, you might be able to fit the model with at least some interactions by using a penalized likelihood. This can be done by simply adding the FIRTH option in the MODEL statement.
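For illustration, a sketch of a Firth-penalized fit in PROC LOGISTIC. The binary response `Peelable` (coded 0/1) is a hypothetical name, since the thread does not name the binary variables; the choice of which interaction to keep is also only an example.

```sas
/* Firth penalized likelihood for a binary response.                 */
/* Assumption: Peelable is a hypothetical 0/1 variable in data set   */
/* one; it is not named anywhere in the thread.                      */
proc logistic data=one;
  where Variety = 'Hass';
  class Season Harvest Weeks / param=glm;
  model Peelable(event='1') = Season|Harvest Weeks / firth;
run;
```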

Regarding your question: you don't need two WHERE statements, because you can specify a single WHERE statement with multiple conditions, such as: where Variety='Hass' and Season=2022;
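Put together with the PROC LOGISTIC step from the question, the combined condition might be used as sketched below. This assumes `Season` is stored as a numeric year, as the example value 2022 suggests; if it is a character variable, the value would need quotes. Because the WHERE statement leaves only one season, `Season` is dropped from the CLASS and MODEL statements here.

```sas
/* One WHERE statement with two conditions.                          */
/* Assumption: Season is numeric; use Season = '2022' if character.  */
proc logistic data=one;
  where Variety = 'Hass' and Season = 2022;
  class Harvest Weeks / param=glm;
  model Easyofpeeling = Harvest Weeks / link=glogit;
run;
```

To run every variety and season combination in one step instead, sorting by `Variety Season` and using `by Variety Season;` in place of the WHERE statement would be the usual alternative.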



Thank you so much for your great suggestions, StatDave. Some varieties are only harvested in months 1, 3, 4, 6 and 8, unlike Hass, which is harvested almost every month, so fitting the model for each variety and season and testing only main effects without interactions solved the problem. So I learned the lesson of simplifying the model as much as I can. Also, the FIRTH option worked wonderfully for the binary response variables. Is there a similar option for multinomial data?

Thank you!

Caroline


That makes a lot of sense, I will do so. Thank you so much StatDave!
