Hi All,
I have a question that is confusing me. The model is a generalized linear model with a negative binomial distribution. My data are insect counts from 10 dates and 7 locations. The counts are all zeros on 7/16/21. Why is the standard error for 7/16/21 the largest one, 484.43? As you can see, the data-scale standard error is still close to 0, which is consistent with the raw data. My second question: the mean separation is based on the model-scale estimates, so is the mean separation still correct (0 variance in the raw data, but the largest variance on the model scale)?
proc glimmix data=two plots=residualpanel method=quad;
by year;
class date field_id;
model coe_female = date field_id/dist=NB;
nloptions tech=nrridg;
lsmeans date/adjust=Tukey lines ilink plots=meanplot(join);
run;
Thanks a lot!
Rosie
Can anyone give me a hint or some insight into this confusing result? Any thoughts will be appreciated! @Rick_SAS , @StatDave_sas.
Your question seems to be "Why does zero variance in the raw data produce the greatest variance on the estimated model scale in a GLMM?"
In general, I don't think it does. It depends on the data. For your data, the reference category is the last category of the DATE variable. The mean of the data in the reference level is 0.86. In contrast, the mean of the values for DATE='16JUL2021'd is -13.7. This is far away from the reference mean, which is why the variance of the difference between the means is so large.
So your result is not because the counts for '16JUL2021'd are a constant (0), but because the constant is far from the average count for the reference category.
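One way to see how an estimate far out on the log scale goes together with a huge model-scale standard error and a near-zero data-scale standard error is the following rough sketch. It is a simplification, not the fitted model: it assumes a single group of n observations with common mean \mu (ignoring the field_id effect), a log link \eta = \log\mu, and the negative binomial variance \mu + k\mu^2, where k is the dispersion parameter. The symbols n, \mu, \eta, and k are introduced here for illustration only.

\[
I(\eta) \;=\; \sum_{i=1}^{n} \frac{(d\mu/d\eta)^2}{\operatorname{Var}(Y_i)}
\;=\; \frac{n\,\mu^2}{\mu + k\mu^2}
\;=\; \frac{n\,\mu}{1 + k\mu},
\qquad
\operatorname{SE}(\hat\eta) \;\approx\; \sqrt{\frac{1 + k\mu}{n\,\mu}} .
\]

\[
\operatorname{SE}(\hat\mu) \;\approx\; \frac{d\mu}{d\eta}\,\operatorname{SE}(\hat\eta)
\;=\; \mu\,\sqrt{\frac{1 + k\mu}{n\,\mu}}
\;=\; \sqrt{\frac{\mu\,(1 + k\mu)}{n}} .
\]

When every count in the group is 0, the fitted mean is pushed toward 0, so \hat\eta becomes a large negative number. Under these assumptions the information about \eta shrinks toward 0 and \operatorname{SE}(\hat\eta) grows without bound, while the delta-method standard error on the data scale shrinks toward 0. Under this sketch, the two standard errors are describing the same fit on different scales.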
When I checked the estimated parameters, the standard error for 7/16/2021 is 484.43. The standard errors of the differences of DATE means, when comparing 7/16/2021 to other dates, are always 484.43. Because of the huge standard error, it is difficult to detect any significant difference between the counts on 7/16/2021 and those on other dates. Do you know why? Thanks!
I suspect that the Hessian matrix is nearly singular: the all-zero counts for that date aren't handled well by the log link. I would be very tempted to remove the 7/16/21 records, if in fact nothing but zeroes were observed then. Also, since you have no RANDOM effects in your model, you may wish to do the analysis in PROC GENMOD, where you at least have the option of fitting a zero-inflated model.
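The two suggestions above might look roughly like the sketch below. It reuses the data set and variable names from the original post (two, year, date, field_id, coe_female) and assumes that DATE is stored as a numeric SAS date value. The intercept-only zero-inflation model is also an assumption made for illustration. Neither step has been run against the actual data.

```sas
/* Option 1: same GLIMMIX call as in the question, with the all-zero
   date excluded. Assumes DATE is a numeric SAS date value. */
proc glimmix data=two plots=residualpanel method=quad;
   by year;
   where date ne '16JUL2021'd;
   class date field_id;
   model coe_female = date field_id / dist=NB;
   nloptions tech=nrridg;
   lsmeans date / adjust=Tukey lines ilink plots=meanplot(join);
run;

/* Option 2: zero-inflated negative binomial in PROC GENMOD.
   The ZEROMODEL statement with no effects fits an intercept-only
   zero-inflation probability (an illustrative choice). */
proc genmod data=two;
   by year;
   class date field_id;
   model coe_female = date field_id / dist=zinb;
   zeromodel / link=logit;
   lsmeans date / ilink;
run;
```

With Option 1, the comparisons among the remaining dates no longer involve the poorly estimated 7/16/21 level. With Option 2, it would be worth checking whether the standard error for that date actually improves, since an all-zero level can still be hard to estimate on the log scale.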
SteveDenham