palolix
Lapis Lazuli | Level 10

Dear SAS Community, 

 

If the residuals of my data are normally distributed (as you can see in this graph) but my outcome variable is count data (days to ripe for avocados), should I use a normal distribution when analyzing the data with PROC GENMOD? If the residuals had a Poisson distribution I would use dist=poisson and link=log, and if there were overdispersion in my data I would use dist=negbin and link=log. Is this approach reasonable?

 

proc glm data=one plots(maxpoints=10000)=diagnostics;
class Season Harvest Variety Wks;
model DTR=Season|Harvest|Variety|Wks;
run;
quit;

 

[Attached image: palolix_0-1760639139944.png, the PROC GLM diagnostics graph referred to above]

 

Thank you very much

 

Accepted Solutions
StatDave
SAS Super FREQ

Look at the plot of residuals by predicted values. It shows the classic fan shape that indicates nonconstant variance, which suggests that the normal distribution is not appropriate. And since the response is a count, it is discrete, so it makes more sense to use an appropriate discrete distribution like the Poisson or negative binomial, as you suggest. If you try the Poisson and see evidence of overdispersion, then the negative binomial is the most typical alternative to try, but there are other alternatives such as the ones discussed in this note. See also the Overdispersion section in this note.
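As a rough sketch of the sequence described above (not code from the original reply), one could fit the Poisson model first and then refit with the negative binomial if overdispersion shows up. The data set and variable names are taken from the question; restricting to Season=2 is only an assumption made to keep the example small.

```sas
/* Step 1: Poisson fit. In the "Criteria For Assessing Goodness Of Fit" */
/* table, a Pearson Chi-Square Value/DF well above 1 is a common sign   */
/* of overdispersion.                                                   */
proc genmod data=one;
  where Season=2;                 /* assumption: one season, for brevity */
  class Harvest Variety;
  model DTR = Harvest|Variety / type3 dist=poisson link=log;
run;

/* Step 2: if overdispersion is evident, refit with the negative        */
/* binomial, which estimates an extra dispersion parameter.             */
proc genmod data=one;
  where Season=2;
  class Harvest Variety;
  model DTR = Harvest|Variety / type3 dist=negbin link=log;
run;
```

The negative binomial fit reports its estimated dispersion parameter in the parameter estimates table; a value near zero suggests the Poisson model may already be adequate.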

palolix
Lapis Lazuli | Level 10

Thank you so much for your quick reply. That was very helpful! So if I go with Poisson or an alternative distribution to address overdispersion, can I still use means to compare varieties, as shown in the graph?

 

proc genmod data=one;
where Season=2;
class Harvest Variety;
model DTR=Harvest|Variety/type3 dist=poisson link=log;
slice Harvest*Variety/sliceby=Harvest  diff adjust=simulate(seed=1);
run;

 

[Attached image: palolix_0-1760645337673.png, the graph of means by variety referred to above]

 

 

StatDave
SAS Super FREQ
Yes, you can use the LSMEANS, SLICE, or LSMESTIMATE statement to make comparisons. With the ILINK option these will provide estimates of the Poisson mean for the individual levels. With the EXP option, the table of differences provides estimates of the ratios of Poisson means comparing two levels at a time.
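To illustrate the two options, here is a minimal sketch that reuses the model from the question. With the log link, LS-means are estimated on the log scale, so ILINK back-transforms them by exponentiation to the count scale, and exponentiating a difference of two log means gives the ratio of the two means. The combination of statements below is one possible arrangement, not the only one.

```sas
proc genmod data=one;
  where Season=2;
  class Harvest Variety;
  model DTR = Harvest|Variety / type3 dist=poisson link=log;

  /* ILINK adds the estimated Poisson mean (count scale) for each       */
  /* Harvest*Variety cell; CL adds confidence limits.                   */
  lsmeans Harvest*Variety / ilink cl;

  /* DIFF compares varieties within each harvest on the log scale;      */
  /* EXP exponentiates each difference, giving a ratio of means.        */
  slice Harvest*Variety / sliceby=Harvest diff exp
                          adjust=simulate(seed=1);
run;
```

An exponentiated difference of 1 corresponds to equal means for the two varieties being compared.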
palolix
Lapis Lazuli | Level 10

Thanks for your new reply StatDave. Can I still use the original means to make graphs and report results or is it more appropriate to use the means I will get using the ilink option? 

 

I tried including the ILINK option like this, but I think it is not correct:

 

proc genmod data=one;
where Season=2;
class Harvest Variety;
model DTR=Harvest|Variety/type3 dist=poisson link=log;
slice Harvest*Variety/sliceby=Harvest ilink diff adjust=simulate(seed=1);
run;

StatDave
SAS Super FREQ
As I mentioned, the ILINK option applies to the individual means, which the SLICE statement (unlike the LSMEANS statement) does not provide unless you also specify the MEANS option.
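Applied to the code posted above, that would look something like the following sketch, where the only change is the added MEANS option in the SLICE statement:

```sas
proc genmod data=one;
  where Season=2;
  class Harvest Variety;
  model DTR = Harvest|Variety / type3 dist=poisson link=log;

  /* MEANS asks SLICE to display the LS-means within each slice, which  */
  /* gives ILINK something to back-transform to the count scale.        */
  slice Harvest*Variety / sliceby=Harvest means ilink diff
                          adjust=simulate(seed=1);
run;
```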

It depends on what you want to do, but I would think that once you have a model that you feel fits the data reasonably, it is that model that provides all of the inferences about mean estimates and differences, so it is the model-based results you would show that correspond to those estimates and inferences.
palolix
Lapis Lazuli | Level 10

Ok, thank you very much for all your help on this!
