Solved: Re: Count data with normal residuals distribution

palolix · Posted 10-16-2025 02:48 PM

Dear SAS Community,

If the residuals of my data are normally distributed (as you can see in this graph) but my outcome variable is count data (days to ripe for avocados), should I use a normal distribution if using genmod to analyze the data? If the residuals have a Poisson distribution I would then use dist=Poisson and link=log, and if there is overdispersion in my data I would then use dist=negbin and link=log. Is this approach reasonable?

proc glm data=one plots(maxpoints=10000)=diagnostics;
class Season Harvest Variety Wks;
model DTR=Season|Harvest|Variety|Wks;
run;
quit;

Thank you very much

StatDave · Posted 10-16-2025 03:56 PM

Look at the plot of residuals by predicted values. It shows the classic fan shape indicating nonconstant variance, which suggests that the normal distribution is not appropriate. Also, since the response is a count, it is discrete, so it makes more sense to use an appropriate discrete distribution like the Poisson or negative binomial, as you suggest. If you fit the Poisson and see evidence of overdispersion, the negative binomial is the most typical alternative to try, but there are other alternatives such as the ones discussed in this note. See also the Overdispersion section in this note.
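
As a rough sketch of that sequence, assuming the same data set (one) and variables as in the original post: fit the Poisson model first and look at the Pearson Chi-Square Value/DF in the "Criteria For Assessing Goodness Of Fit" table. A value well above 1 is a common sign of overdispersion, in which case the negative binomial fit can be tried next. The model terms are simply copied from the PROC GLM step and are not a recommendation.

```sas
/* Step 1: Poisson model with log link.
   Check Pearson Chi-Square Value/DF in the fit criteria table;
   values well above 1 suggest overdispersion. */
proc genmod data=one;
  class Season Harvest Variety Wks;
  model DTR = Season|Harvest|Variety|Wks / dist=poisson link=log type3;
run;

/* Step 2 (only if overdispersion is evident):
   negative binomial model with the same effects. */
proc genmod data=one;
  class Season Harvest Variety Wks;
  model DTR = Season|Harvest|Variety|Wks / dist=negbin link=log type3;
run;
```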

palolix · Posted 10-16-2025 04:10 PM

Thank you so much for your quick reply. That was very helpful! So if I go with Poisson or an alternative distribution to address overdispersion, can I still use means to compare between varieties like it is shown in the graph?

proc genmod data=one;
where Season=2;
class Harvest Variety;
model DTR=Harvest|Variety/type3 dist=poisson link=log;
slice Harvest*Variety/sliceby=Harvest diff adjust=simulate(seed=1);
run;

StatDave · Posted 10-16-2025 04:29 PM

Yes, you can use the LSMEANS, SLICE, or LSMESTIMATE statement to make comparisons. With the ILINK option these will provide estimates of the Poisson mean for the individual levels. With the EXP option, the table of differences provides estimates of the ratios of Poisson means comparing two levels at a time.
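
To spell out what ILINK and EXP do under a log link, here is a short worked relation. The symbols (a linear predictor \eta and mean \mu for levels k and l) are notation introduced here for illustration and do not come from the thread.

```latex
\begin{align*}
\log \mu_k &= \eta_k
  && \text{(log link: the model is linear on the log scale)} \\
\hat{\mu}_k &= \exp(\hat{\eta}_k)
  && \text{(ILINK: estimated Poisson mean for level } k\text{)} \\
\exp(\hat{\eta}_k - \hat{\eta}_l) &= \frac{\hat{\mu}_k}{\hat{\mu}_l}
  && \text{(EXP: ratio of Poisson means for levels } k \text{ and } l\text{)}
\end{align*}
```

So a difference of 0 on the log scale corresponds to a ratio of 1 between the two means.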

palolix · Posted 10-16-2025 06:26 PM

Thanks for your new reply, StatDave. Can I still use the original means to make graphs and report results, or is it more appropriate to use the means I will get using the ilink option?

I tried including the ilink option like this, but I think it is not correct:

proc genmod data=one;
where Season=2;
class Harvest Variety;
model DTR=Harvest|Variety/type3 dist=poisson link=log;
slice Harvest*Variety/sliceby=Harvest ilink diff adjust=simulate(seed=1);
run;

StatDave · Posted 10-16-2025 09:18 PM

As I mentioned, the ILINK option applies to the individual means, which the SLICE statement (unlike the LSMEANS statement) does not display unless you also specify the MEANS option.

It depends on what you want to do, but I would think that once you have a model that you feel fits the data reasonably, it is that model that provides all of the inferences about mean estimates and differences, so it is the model-based results you would show that correspond to those estimates and inferences.
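
A minimal sketch of the SLICE statement with the MEANS option added, assuming the same data set and model as in the question above. ILINK then applies to the displayed means, and EXP (mentioned in the earlier reply) exponentiates the differences so they read as ratios of Poisson means. Whether both tables are wanted is a choice for the analyst.

```sas
proc genmod data=one;
  where Season=2;
  class Harvest Variety;
  model DTR = Harvest|Variety / type3 dist=poisson link=log;
  /* MEANS  : display the individual LS-means within each slice
     ILINK  : also report those means on the count (mean) scale
     DIFF   : pairwise differences on the log scale
     EXP    : exponentiated differences, i.e. ratios of means */
  slice Harvest*Variety / sliceby=Harvest means ilink diff exp
                          adjust=simulate(seed=1);
run;
```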

palolix · Posted 10-17-2025 08:04 PM

Ok, thank you very much for all your help on this!