RosieSAS
Obsidian | Level 7

Hi All,

I have a question that confuses me. The model is a generalized linear model with a negative binomial distribution. My data are insect counts from 10 dates and 7 locations, and the counts are all zeros on 7/16/21. Why is the standard error for 7/16/21 the largest one, 484.43? As you can see, the data-scale standard error is still close to 0, which is consistent with the raw data. My other question: the mean separation is based on the estimates on the model scale, so is the mean separation still correct (zero variance in the raw data, but the largest variance among the model-scale values)?

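proc glimmix data=two plots=residualpanel method=quad;
  by year;                                      /* separate analysis for each year */
  class date field_id;
  model coe_female = date field_id / dist=NB;   /* negative binomial, default log link */
  nloptions tech=nrridg;                        /* Newton-Raphson optimization with ridging */
  lsmeans date / adjust=Tukey lines ilink plots=meanplot(join);  /* Tukey comparisons, letter display, data-scale means */
run;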
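
For readers less familiar with GLIMMIX, here is a sketch of the model this code fits. It assumes the standard negative binomial parameterization with scale parameter k and the default log link. The symbols tau_i (effect of date i), phi_j (effect of field j) and J (number of fields, 7 here) are labels introduced for illustration; they are not names that appear in the output:

\[
Y_{ij} \sim \mathrm{NB}(\mu_{ij}, k), \qquad \operatorname{Var}(Y_{ij}) = \mu_{ij} + k\,\mu_{ij}^{2}, \qquad \eta_{ij} = \log\mu_{ij} = \beta_0 + \tau_i + \phi_j
\]
\[
\bar\eta_i = \beta_0 + \tau_i + \frac{1}{J}\sum_{j=1}^{J}\phi_j \quad \text{(LS-mean of date } i\text{, model scale)}, \qquad e^{\bar\eta_i} \quad \text{(ILINK, data scale)}
\]

The "model scale" in the question is the log scale of eta. The Tukey comparisons and the LINES letters are computed on that scale. ILINK only adds the back-transformed means.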

[Image: GLIMMIX output referenced above (not reproduced)]

Thanks a lot!

Rosie

RosieSAS
Obsidian | Level 7

Can anyone give me a hint or some insight into this confusing result? Any thoughts will be appreciated! @Rick_SAS, @StatDave_sas

Rick_SAS
SAS Super FREQ

Your question seems to be "Why does a level with zero variance in the raw data have the greatest variance on the estimated model scale in a GLMM?"

In general, I don't think it does; it depends on the data. For your data, the reference category is the last category of the DATE variable. The mean of the data in the reference level is 0.86. In contrast, the mean for DATE='16JUL2021'd is -13.7. This is far from the reference mean, which is why the variance of the difference between the means is so large.

So your result is not because the counts for '16JUL2021'd are a constant (0), but because that constant is far from the average count for the reference category.

RosieSAS
Obsidian | Level 7

Thanks for your reply, @Rick_SAS. I can understand why the mean value on 7/16/2021 is the smallest, -13.7, but not the standard error. Do you mean that the standard error, 484.43, reflects the variance of the difference between the means on 7/16/2021 and 09/03/2021? That is quite different from my understanding, which is that the standard errors in the LS-means table are the estimated standard errors of each level's mean on the model scale, not relative values that depend on the reference level. Also, why is the standard error from the ILINK option on the data scale no longer the largest?

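A worked check on the ILINK part of this question. This assumes, as I read the GLIMMIX documentation, that ILINK standard errors are computed by the delta method. With the log link the inverse link is exp, and its derivative is exp itself. The data-scale standard error is therefore the model-scale one multiplied by the data-scale mean. Using the rounded values quoted in this thread (LS-mean -13.7, standard error 484.43):

\[
\widehat{\mathrm{SE}}\bigl(e^{\hat\eta}\bigr) \approx e^{\hat\eta}\,\widehat{\mathrm{SE}}(\hat\eta) = e^{-13.7}\times 484.43 \approx 1.12\times10^{-6}\times 484.43 \approx 5.4\times10^{-4}
\]

A very large standard error on the log scale and a near-zero one on the count scale are therefore consistent with each other, because the factor e^{\hat\eta} is tiny for this date. Since -13.7 is rounded, this is only an order-of-magnitude check.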
RosieSAS
Obsidian | Level 7

When I checked the parameter estimates, the standard error for 7/16/2021 is 484.43. The standard errors of the differences of DATE means comparing 7/16/2021 with each of the other dates are also always 484.43. Because of this huge standard error, it is difficult to detect any significant difference between the counts on 7/16/2021 and on the other dates. Do you know why? Thanks!

[Images: parameter estimates and differences of DATE means referenced above (not reproduced)]

SteveDenham
Jade | Level 19

I suspect that the Hessian matrix is nearly singular: the zero counts for that date aren't handled well by the log link. I would be very tempted to remove the 7/16/21 records, if in fact nothing but zeros was observed then. Also, since you have no RANDOM effects in your model, you may wish to do the analysis in PROC GENMOD, where you at least have the option of fitting a zero-inflated model.

SteveDenham
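
Here is a simplified sketch of why the Hessian can degenerate here. This is my own illustration. It assumes the negative binomial parameterization Var(Y) = mu + k*mu^2, ignores the field effects, and treats the n observations on the all-zero date as sharing one linear predictor eta = log mu. With every count equal to 0, the log-likelihood contribution of that date is

\[
\ell(\eta) = -\frac{n}{k}\log\bigl(1 + k e^{\eta}\bigr), \qquad \ell'(\eta) = -\frac{n\mu}{1 + k\mu} < 0, \qquad \mu = e^{\eta}
\]

This contribution keeps increasing as eta decreases and has no finite maximum. The optimizer typically stops at some large negative value once the gradient falls below its convergence tolerance. The curvature at that point is almost zero. The standard error, roughly the inverse square root of the curvature, is therefore huge:

\[
-\ell''(\eta) = \frac{n\mu}{(1 + k\mu)^{2}} \approx n\mu, \qquad \widehat{\mathrm{SE}}(\hat\eta) \approx \frac{1}{\sqrt{n\hat\mu}}, \qquad \hat\mu = e^{\hat\eta} \approx 0
\]

The same term dominates every comparison involving that date. For two dates a and b,

\[
\widehat{\mathrm{SE}}(\hat\eta_a - \hat\eta_b) = \sqrt{\widehat{\mathrm{SE}}_a^{2} + \widehat{\mathrm{SE}}_b^{2} - 2\,\widehat{\mathrm{Cov}}_{ab}}
\]

Suppose SE_a = 484.43 while SE_b and the covariance are small by comparison. Then this rounds to 484.43, which would explain why all the differences with 7/16/21 show the same standard error.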
