comparing models fit with GLIMMIX using R-side Z matrix

staudham · Posted 04-08-2019 03:12 PM

I am using PROC GLIMMIX to fit a Poisson model to repeated measures count data. The data are fruit counts from trees in two sites over 9 years. The data clearly do not show an AR(1) structure, and one of the sites shows a roughly bi-annual pattern of correlation. When I include a CS structure, the Generalized Chi-Sq/DF is very high (>10), but if I either include a GROUP in the RANDOM statement or use TYPE=UN, the Generalized Chi-Sq/DF is 1.00. This latter result is repeated regardless of what fixed effects I add or eliminate.

First of all, is this result an artifact of the way these models are fit? That is, is it over-parameterizing in some way as to give me a 'perfect fit'? And second, if this fit statistic is in fact valid, how do I compare two competing models? There are no other fit statistics generated. (Output attached). Thanks for your time.

*Simplified models with just a few effects.;

PROC GLIMMIX DATA=sasds.cast_1019;
	CLASS local arv ano ;
	MODEL prod =   LOCAL|ano|dap  /DIST=poisson ddfm=kenward ;
	RANDOM RESIDUAL/SUB=arv(LOCAL) TYPE=CS;
	NLOPTIONS tech=nrridg;
RUN;  *this gives very poor fit:  Gener. Chi-Square / DF 58.61 ;

PROC GLIMMIX DATA=sasds.cast_1019;
	CLASS local arv ano;
	MODEL prod =   LOCAL|ano|dap  /DIST=poisson ddfm=kenward ;
	RANDOM RESIDUAL/SUB=arv(LOCAL) GROUP=local TYPE=CS;
	NLOPTIONS tech=nrridg;
RUN;		* Gener. Chi-Square / DF 1.00 ;

PROC GLIMMIX DATA=sasds.cast_1019;
	CLASS local arv ano;
	MODEL prod =   LOCAL|ano|dap  /DIST=poisson ddfm=kenward ;
	RANDOM RESIDUAL/SUB=arv(LOCAL)  TYPE=UN;
	NLOPTIONS tech=nrridg;
RUN;	* Gener. Chi-Square / DF 1.00 ;

Rick_SAS · Posted 04-16-2019 09:16 AM

No suggestions yet. I'm hoping that @sld or @StatsMan might have an idea about this.

sld · Posted 04-16-2019 12:32 PM

Before I dove too deeply into modeling the covariance structure of the repeated measures, I first would look into whether the Poisson distribution is the best choice. From your output, I suspect not and that something like the negative binomial or a generalized Poisson might suit the data better. I'm not seeing a lot of evidence of heterogeneity of variance in the current results either, so a relatively simple covariance structure might work well enough, once other issues are resolved. This paper by Walt Stroup dates to 2011, and I know that Walt is continuing to refine his understanding of GLMMs (and recommendations for use) but I think the paper might still be quite helpful.

There is a distinction between a GLMM and a GEE-type model that focuses on whether or how you model "residual" (the R-side stuff). For an example of the latter, see Example 38.12 Fitting a Marginal (GEE-Type) Model. I usually take the GLMM approach because I think it is more "natural" (Walt Stroup addresses this concept in his writings), but a GEE-type model could do the job as well.

What is DAP? From the output, it looks like a continuous covariate measured on each tree (ARV). Are you comfortable with assuming a linear relationship between DAP and log(PROD)? A misspecified mean model could contribute to overdispersion. I assume that LOCAL is site, and ANO is year.

What is the range of values for PROD? Are values equal to or close to zero, or large? I'd ponder zero-inflation if counts are small, or even a normal or lognormal distribution (probably with heterogeneous variances) if counts are large.
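
A quick way to answer these questions is to summarize PROD by site and count the zeros. This is only a sketch: it assumes the variable names from the code in the original post (prod, local), and the work.prod_zero data set and is_zero flag are hypothetical names introduced for illustration.

```sas
/* Range and spread of the counts within each site.              */
/* A variance far above the mean is a first hint of              */
/* overdispersion relative to the Poisson.                       */
proc means data=sasds.cast_1019 n nmiss min p25 median mean p75 max var;
   class local;
   var prod;
run;

/* Flag zero counts (hypothetical helper variable is_zero),      */
/* dropping missing counts so they are not mistaken for nonzero. */
data work.prod_zero;
   set sasds.cast_1019;
   if not missing(prod);
   is_zero = (prod = 0);
run;

/* Row percentages give the proportion of zeros per site. */
proc freq data=work.prod_zero;
   tables local*is_zero / nocol nopercent;
run;
```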

I would use method=laplace or quad, unless results were pathological (which sometimes happens). These methods allow information criteria (e.g., AIC) which you could use to compare models. The default pseudo-likelihood method does not.

I would start with

model prod = local|ano|dap / dist=poisson;
random intercept / subject=arv(local);

and see whether the Generalized chisq/df indicated overdispersion. (Pretty sure it will.)

If so, then I'd try

model prod = local|ano|dap / dist=poisson;
random intercept ano / subject=arv(local);

And then I'd try either a negative binomial distribution or a generalized Poisson distribution (see Example 38.14 Generalized Poisson Mixed Model for Overdispersed Count Data). As far as I know, it is not possible to model a covariance structure among repeated measures when you move to a two-parameter distribution (e.g., negative binomial) from a one-parameter distribution (e.g., Poisson); SAS tech support might be able to weigh in on that.
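
As a minimal sketch of that comparison, the two fits below use METHOD=LAPLACE so that information criteria are produced, and capture the fit statistics with ODS OUTPUT. The data set and variable names are taken from the original post; the work.fit_* data set names and the fit label are hypothetical. The negative binomial fit may fail to converge on some data.

```sas
/* Poisson GLMM with a random tree intercept, fit by Laplace */
ods output FitStatistics=work.fit_pois;
proc glimmix data=sasds.cast_1019 method=laplace;
   class local arv ano;
   model prod = local|ano|dap / dist=poisson;
   random intercept / subject=arv(local);
run;

/* Same linear predictor, negative binomial distribution */
ods output FitStatistics=work.fit_nb;
proc glimmix data=sasds.cast_1019 method=laplace;
   class local arv ano;
   model prod = local|ano|dap / dist=negbin;
   random intercept / subject=arv(local);
run;

/* Stack the two fit-statistics tables for a side-by-side look */
data work.fit_compare;
   length fit $8;
   set work.fit_pois(in=from_pois) work.fit_nb;
   if from_pois then fit = 'poisson';
   else fit = 'negbin';
run;

proc print data=work.fit_compare noobs;
run;
```

Smaller AIC or BIC values favor a model, and the comparison is only meaningful when both models are fit to the same observations with the same estimation method.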

And then I'd look to see whether the fruit production story changed with the model. It's a comfort when results point you in the same direction, regardless 🙂

I hope this helps.

staudham · Posted 04-18-2019 04:34 PM

You are right that the data are extremely overdispersed. However, I have not been able to get Negative Binomial models to converge with my data. I have not investigated the generalized Poisson - but that is a great suggestion.

As for the R-side random effects, this was a suggestion from the SAS technical support, as I had trouble with model convergence with G-side effects.

Yes, DAP is a continuous variable for tree diameter. Although I have more variables available, our model is predicting fruit production per tree as a function of year, DAP, and site. (I did not list the other possible covariates to simplify the problem.) There is some evidence in previous literature of a more quadratic pattern of diameter versus numbers of fruits (due to tree senescence); however, my data do not support that. There is a weak linear relationship evident in the data. Since there are years where some trees produce no fruit at all, then a log-normal model is not appropriate. These are Brazil nut trees and the fruit production can vary wildly from year to year, with some individuals producing 900+ fruits and some producing 0. 'Normal' production is in the 100-200 range. We are trying to better explain variation among trees and among years.

I had avoided the Laplace and quad methods as I was interested in using the Kenward-Roger DDFM. That said, I have followed your suggestion and got some output. For the first set of code, I get a Pearson Chi-sq/DF of 41, which is absolutely terrible. For the second, the Pearson Chi-sq/DF is 0.15, but many of the effects are not estimable.

PROC GLIMMIX DATA=sasds.cast_1019 method=quad ;
	CLASS local arv ano ;
	MODEL prod =   local|dap|ano     /DIST=poisson  ; 
	random intercept ano / subject=arv(local);
RUN;

Any suggestions as to why that might happen? Here is the output:

Fit Statistics for Conditional Distribution
-2 log L(prod | r. effects)    12826.77
Pearson Chi-Square               353.17
Pearson Chi-Square / DF            0.15

Covariance Parameter Estimates
Cov Parm     Subject       Estimate    Standard Error
Intercept    Arv(local)      2.4705            0.2474
ano          Arv(local)      1.1158           0.04375

Type III Tests of Fixed Effects
Effect           Num DF    Den DF    F Value    Pr > F
local                 1       256       2.16    0.1428
dap                   1         0       3.42         .
dap*local             1         0       1.85         .
ano                   8      2012       7.38    <.0001
local*ano             8      2012       1.46    0.1654
dap*ano               8         0       3.12         .
dap*local*ano         8         0       0.77         .

sld · Posted 04-18-2019 05:32 PM

Hmm.

I don't see that your model is overspecified (resulting in 0 ddf for some terms), although there could well be something I don't see or some data structure that I don't know about. So at the moment I can only offer some thoughts and questions.

Does each tree have the same DAP for all years, or does DAP change with year?

Have you tried centering or standardizing DAP?
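
For reference, one way to do this is sketched below. It assumes the data set and variable names from the original post; work.cast_c, dap_c, and dap_z are hypothetical names. PROC SQL remerges the summary statistics with the detail rows and writes a note to the log saying so.

```sas
/* Add a centered (dap_c) and a standardized (dap_z) copy of DAP. */
/* mean() and std() are computed over all rows of the data set.   */
proc sql;
   create table work.cast_c as
   select *,
          dap - mean(dap)              as dap_c,
          (dap - mean(dap)) / std(dap) as dap_z
   from sasds.cast_1019;
quit;

/* Refit with the centered covariate in place of DAP */
proc glimmix data=work.cast_c method=laplace;
   class local arv ano;
   model prod = local|ano|dap_c / dist=poisson;
   random intercept / subject=arv(local);
run;
```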

What proportion of the prod values are equal to zero? Might zero values be predicted by your explanatory variables? Would it be worth ignoring the mixed model structure and exploring a mixture model (e.g., zero-inflated or hurdle) using the FMM procedure?

https://documentation.sas.com/?docsetId=statug&docsetTarget=statug_fmm_gettingstarted02.htm&docsetVe...
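
A minimal sketch of such a zero-inflated fit with PROC FMM follows, patterned on the zero-inflation example in the FMM documentation and ignoring the repeated-measures structure, as suggested above. The data set and variable names come from the original post; using LOCAL in the PROBMODEL statement is only an illustrative assumption about what might predict the extra zeros.

```sas
proc fmm data=sasds.cast_1019;
   class local ano;
   /* Count component: Poisson regression on the usual effects */
   model prod = local|ano|dap / dist=poisson;
   /* Second component: a degenerate distribution at zero */
   model + / dist=constant;
   /* Let the mixing probability depend on site (assumption) */
   probmodel local;
run;
```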

staudham · Posted 04-30-2019 10:41 AM

Thanks so much for the suggestions! DAP does change by year, so that was not the issue. Using the FMM procedure, I found that a generalized Poisson model fits the data fairly well (Chi-Sq/DF = 0.75), and I can use GLIMMIX to fit this model with random effects. However, because this model must be fit with the Laplace method, I cannot specify a Kenward-Roger type of DDFM. This makes my degrees of freedom, in my opinion, way too high. However, I think that it is better to interpret the significance of the effects conservatively than to use a model that fits very badly.
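
For readers following along, a fit of this kind could look roughly like the sketch below. It is not the exact code used here: it adapts, from memory, the programming-statement pattern of the SAS/STAT documentation example "Generalized Poisson Mixed Model for Overdispersed Count Data" to the variable names in this thread, so it should be checked against the documentation before use. The names xi and mustar are working variables, not GLIMMIX keywords.

```sas
proc glimmix data=sasds.cast_1019 method=laplace;
   class local arv ano;
   model prod = local|ano|dap / link=log s;
   random intercept / subject=arv(local);
   /* Dispersion parameter xi, written in terms of the scale _phi_ */
   xi = (1 - 1/exp(_phi_));
   /* Generalized Poisson variance: mu / (1 - xi)**2 */
   _variance_ = _mu_ / (1-xi) / (1-xi);
   if (_mu_ = .) or (_linp_ = .) then _logl_ = .;
   else do;
      mustar = _mu_ - xi*(_mu_ - prod);
      /* Guard against taking the log of a nonpositive value */
      if (mustar < 1E-12) or (_mu_*(1-xi) < 1E-12) then
         _logl_ = -1E20;
      else
         _logl_ = log(_mu_*(1-xi)) + (prod-1)*log(mustar)
                  - mustar - lgamma(prod+1);
   end;
run;
```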

sld · Posted 04-30-2019 12:32 PM

It sounds like you might have found a "good enough" model 🙂

If you have a decent idea of what the denominator degrees of freedom "should" be, then you can specify them explicitly with the DDF option on the MODEL statement.
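
A sketch of the syntax, with placeholder values only: the DDF= list follows the order of the effects in the "Type III Tests of Fixed Effects" table (here local, dap, dap*local, ano, local*ano, dap*ano, dap*local*ano). The numbers 256 and 2012 are simply borrowed from the output earlier in the thread for illustration and are not a recommendation, and DIST=POISSON stands in for whatever distribution is finally chosen.

```sas
proc glimmix data=sasds.cast_1019 method=laplace;
   class local arv ano;
   /* One denominator DF per fixed effect, in Type III table order. */
   /* All values below are placeholders for illustration.           */
   model prod = local|dap|ano / dist=poisson
         ddf=256,256,256,2012,2012,2012,2012;
   random intercept / subject=arv(local);
run;
```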
