Re: Why does zero variance in raw data have the greatest variance in e...

RosieSAS · Posted 06-03-2022 04:11 PM

Hi All,

I have a question that is confusing me. The model is a generalized linear model with a negative binomial distribution. My data are insect counts from 10 dates and 7 locations. The counts are all zero on 7/16/21. Why is the standard error on 7/16/21 the greatest one, 484.43? As you can see, the data-scale standard error is still close to 0, which is consistent with the raw data. My second question: the mean separation is based on the estimates on the model scale, so is the mean separation still correct (0 variance in the raw data, but the greatest variance in the model-scale values)?

proc glimmix data=two plots=residualpanel method=quad;
  by year;
  class date field_id;
  model coe_female = date field_id/dist=NB;
  nloptions tech=nrridg;
  lsmeans date/adjust=Tukey lines ilink plots=meanplot(join); 
run;

Thanks a lot!

Rosie

RosieSAS · Posted 06-07-2022 08:28 AM

Can anyone give me a hint or some insight into this confusing result? Any thoughts will be appreciated! @Rick_SAS, @StatDave_sas.

Rick_SAS · Posted 06-07-2022 09:00 AM

Your question seems to be "Why does zero variance in raw data have the greatest variance in estimated model scale in GLMM."

In general, I don't think it does. It depends on the data. For your data, the reference category is the last category of the DATE variable. The mean of the data in the reference level is 0.86. In contrast, the mean of the values for DATE='16JUL2021'd is -13.7. This is far away from the reference mean, which is why the variance of the difference between the means is so large.

So your result is not because the counts for '16JUL2021'd are a constant (0), but because the constant is far from the average count for the reference category.

RosieSAS · Posted 06-07-2022 09:14 AM

Thanks for your reply, @Rick_SAS. I can understand why the mean value on 7/16/2021 is the smallest, -13.7, but not the standard error. Do you mean that the standard error, 484.43, reflects the variance of the difference between the means on 7/16/2021 and 09/03/2021? That is quite different from my understanding, which is that the standard errors in the LS-means table are the estimated standard errors of the mean at each particular level on the model scale, not relative values that depend on the reference level. And why is the standard error from the ILINK option, on the raw data scale, no longer the greatest?
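
On the ILINK question, a rough check is possible from the numbers quoted in this thread. It assumes the default log link for the negative binomial distribution and that the data-scale standard error is obtained by the delta method, which is how the GLIMMIX documentation describes ILINK standard errors. With the model-scale estimate -13.7 and its standard error 484.43:

```latex
\hat{\mu} = \exp(\hat{\eta}) = \exp(-13.7) \approx 1.1 \times 10^{-6}

\mathrm{SE}(\hat{\mu}) \approx \left.\frac{d\mu}{d\eta}\right|_{\hat{\eta}} \mathrm{SE}(\hat{\eta})
  = \exp(\hat{\eta})\,\mathrm{SE}(\hat{\eta})
  \approx 1.1 \times 10^{-6} \times 484.43
  \approx 5.4 \times 10^{-4}
```

Under these assumptions, the huge model-scale standard error is multiplied by a mean that is essentially zero, so the data-scale standard error ends up close to 0 even though the model-scale one is the largest in the table.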

RosieSAS · Posted 06-07-2022 09:29 AM

When I checked the parameter estimates, the standard error for 7/16/2021 is 484.43. The standard errors of the differences of DATE means, when comparing 7/16/2021 to other dates, are always 484.43. Because of the huge standard error, it is difficult to detect any significant difference between the counts on 7/16/2021 and those on other dates. Do you know why? Thanks!
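
One way to see how a standard error of this size can arise is a simplified sketch that is not the fitted model. Assume a one-way model with a log link, a separate mean \mu for each date, n observations on that date, and negative binomial variance \mu + k\mu^2, with the dispersion k treated as known. The Fisher information for \eta = \log\mu is then:

```latex
I(\eta) = n\,\frac{(d\mu/d\eta)^2}{\mathrm{Var}(Y)}
        = n\,\frac{\mu^2}{\mu + k\mu^2}
        = \frac{n\mu}{1 + k\mu},
\qquad
\mathrm{SE}(\hat{\eta}) \approx \frac{1}{\sqrt{I(\eta)}} = \sqrt{\frac{1 + k\mu}{n\mu}}
```

As \mu approaches 0 the information vanishes and the standard error grows without bound. When every count on a date is zero, the likelihood keeps increasing as \eta decreases, so the optimizer stops at some large negative value where the likelihood is nearly flat. Any difference that involves that date would then be dominated by the same standard error, which is consistent with the differences all showing 484.43.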

SteveDenham · Posted 06-16-2022 10:57 AM

I suspect that the Hessian matrix is nearly singular - the zero count for that date isn't handled well by the log link. I would be very tempted to remove the 7/16/21 records, if in fact nothing but zeroes were observed then. Also, since you have no RANDOM effects in your model, you may wish to do the analysis in PROC GENMOD, where you at least have the option of fitting a zero-inflated model.

SteveDenham
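
The two suggestions above could be sketched roughly as follows. This is only an illustration that reuses the data set and variable names from the original post. It assumes DATE is a numeric SAS date value (as the date literal used earlier in the thread implies) and that an intercept-only zero-inflation part is adequate. Neither assumption has been checked against the actual data.

```sas
/* Option 1: refit the original model without the all-zero date.   */
/* Assumes DATE is a numeric SAS date, so a date literal works.    */
proc glimmix data=two method=quad;
  where date ne '16JUL2021'd;
  by year;
  class date field_id;
  model coe_female = date field_id / dist=NB;
  nloptions tech=nrridg;
  lsmeans date / adjust=Tukey lines ilink;
run;

/* Option 2: zero-inflated negative binomial in PROC GENMOD.       */
/* ZEROMODEL with no effects fits an intercept-only zero part.     */
proc genmod data=two;
  by year;
  class date field_id;
  model coe_female = date field_id / dist=zinb;
  zeromodel / link=logit;
run;
```

With Option 1, the 7/16/21 date drops out of the Tukey comparisons entirely, so the fact that no insects were observed on that date would have to be reported separately.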
