Dear SAS Community,
If the residuals from my model are normally distributed (as you can see in this graph) but my outcome variable is a count (days to ripe for avocados), should I use a normal distribution when analyzing the data with PROC GENMOD? If the residuals had a Poisson distribution I would use dist=poisson and link=log, and if there were overdispersion in my data I would use dist=negbin and link=log. Is this approach reasonable?
proc glm data=one plots(maxpoints=10000)=diagnostics;
class Season Harvest Variety Wks;
model DTR=Season|Harvest|Variety|Wks;
run;
quit;
Thank you very much
Look at the plot of residuals by predicted values. It shows the classic fan shape that indicates nonconstant variance, which suggests that the normal distribution is not appropriate. Also, because the response is a count, it is discrete, so it makes more sense to use an appropriate discrete distribution such as the Poisson or negative binomial, as you suggest. If you fit the Poisson model and see evidence of overdispersion, the negative binomial is the most typical alternative to try, but there are other alternatives such as the ones discussed in this note. See also the Overdispersion section in this note.
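To make that sequence concrete, here is a minimal sketch that reuses the data set and variable names from the question (one, DTR, Harvest, Variety). The reduced Harvest|Variety model is only an illustration, not a recommendation for the final model. After the Poisson fit, look at the Value/DF column for the Pearson Chi-Square in the "Criteria For Assessing Goodness Of Fit" table; values well above 1 are commonly read as a sign of overdispersion.

```sas
/* Step 1: Poisson fit. Inspect Pearson Chi-Square Value/DF in the
   "Criteria For Assessing Goodness Of Fit" table. */
proc genmod data=one;
   class Harvest Variety;
   model DTR = Harvest|Variety / type3 dist=poisson link=log;
run;

/* Step 2: if Value/DF is well above 1, refit with the negative binomial,
   which estimates an extra dispersion parameter. */
proc genmod data=one;
   class Harvest Variety;
   model DTR = Harvest|Variety / type3 dist=negbin link=log;
run;
```

Comparing the fit criteria (for example AIC) between the two runs is one way to judge whether the extra dispersion parameter is worth keeping.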
Thank you so much for your quick reply. That was very helpful! So if I go with the Poisson, or an alternative distribution to address overdispersion, can I still use means to compare varieties, as shown in the graph?
proc genmod data=one;
where Season=2;
class Harvest Variety;
model DTR=Harvest|Variety/type3 dist=poisson link=log;
slice Harvest*Variety/sliceby=Harvest diff adjust=simulate(seed=1);
run;
Thanks for your new reply, StatDave. Can I still use the original means to make graphs and report results, or is it more appropriate to use the means I will get from the ilink option?
I tried including the ilink option like this, but I think it is not correct:
proc genmod data=one;
where Season=2;
class Harvest Variety;
model DTR=Harvest|Variety/type3 dist=poisson link=log;
slice Harvest*Variety/sliceby=Harvest ilink diff adjust=simulate(seed=1);
run;
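For reference, here is a sketch of how these pieces are often combined, assuming the goal is to report means in days rather than in log days. With link=log, the least squares means and their differences are estimated on the log scale. As far as I know, ILINK applies to the table of means and not to the table of differences, which may be why it seems to have no useful effect in a SLICE statement that only requests DIFF. The LSMEANS statement and the CL and EXP options are additions for illustration and are not part of the code posted above.

```sas
proc genmod data=one;
   where Season=2;
   class Harvest Variety;
   model DTR = Harvest|Variety / type3 dist=poisson link=log;
   /* Model-based means; ILINK adds the mean on the original (days) scale */
   lsmeans Harvest*Variety / ilink cl;
   /* Variety comparisons within each Harvest; EXP reports each
      log-scale difference as a ratio of means */
   slice Harvest*Variety / sliceby=Harvest diff exp cl adjust=simulate(seed=1);
run;
```

Under a log link, a difference between two log means is the log of the ratio of the means, so the exponentiated differences read as "variety A takes x times as many days to ripen as variety B" rather than as a difference in days.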
Ok, thank you very much for all your help on this!