jl82
Calcite | Level 5

I am a relatively novice SAS user (from a background NOT involving experimental research), but I have been working over the past several months to fit models involving 5 waves of intervention data for 1000+ individuals, where the intervention is a 2x2 design. I have been using SAS PROC GLIMMIX because of the repeated measures combined with count outcomes. I'm currently working with a negative binomial distribution and the ARMA(1,1) covariance structure (after comparing several options). From my reading, I had gathered that ddfm=KR is probably the way to go.

I had no particular issues up to this point, but we decided to add another covariate to our models, and suddenly I am getting very strange (small) degrees of freedom estimates. This covariate is just a dummy variable. I can get larger (more correct?) degrees of freedom estimates by either (1) retaining a higher-order interaction term in the model that is not significant (we have generally been dropping these to make the models easier to interpret, since the lower-order effects are of more interest to us and our audience) or (2) dropping that higher-order term AND the new covariate.

Here is a sample of what my syntax currently looks like--

PROC GLIMMIX data=XXX INITGLM MAXLMMUPDATE=500 PCONV=1e-7;
     CLASS time subject intervention1 intervention2;
     MODEL countoutcome=intervention1|intervention2|time covdummy1|time cov2 covdummy2 covdummy3 cov4 / SOLUTION DIST=negbin DDFM=KR;
     RANDOM time / RESIDUAL SUBJECT=subject TYPE=ARMA(1,1);
RUN;

covdummy1*time is n.s. (p=.14) and we'd prefer to drop it (since the two main effects, which are of interest, are n.s. with the interaction in the model). covdummy3 is the new covariate we added.

* Run exactly as presented above, the denominator dfs for the fixed factors are all > 900 and for the time effects all > 2200. This is also true if I *both* drop covdummy3 and remove the covdummy1*time interaction.

* If I drop covdummy1*time (but keep the covariate in the model), all the fixed effect dfs = 1 and the time effect dfs are ~206.


I'm assuming the big change in df indicates some problem with my model, and it's making me nervous. I can just ignore covdummy3, but are there other things I should check out given these problems?

Thanks!

7 REPLIES
SteveDenham
Jade | Level 19

Something I am starting to pick up from the R community on these types of models is that the F stats are sometimes (and this looks like one of those cases) a mess. The suggestion there is to use a chi-squared test. To implement this in GLIMMIX, change from ddfm=kr to ddfm=none, or specify the CHISQ option in the MODEL statement.
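For illustration (a sketch using the data set and variable names from the original post, not code from the reply), only the MODEL statement options would need to change:

PROC GLIMMIX data=XXX INITGLM MAXLMMUPDATE=500 PCONV=1e-7;
     CLASS time subject intervention1 intervention2;
     MODEL countoutcome=intervention1|intervention2|time covdummy1|time cov2 covdummy2 covdummy3 cov4
           / SOLUTION DIST=negbin DDFM=NONE CHISQ;   /* DDFM=NONE drops the denominator-df approximation; CHISQ adds chi-square tests */
     RANDOM time / RESIDUAL SUBJECT=subject TYPE=ARMA(1,1);
RUN;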

One thing I would warn about is using an R side covariance structure with the negative binomial distribution. Because the mean and variance of this distribution are functionally related, the pseudo-likelihood method required for R side specification will lead to biased estimates and tests. Consider dropping RESIDUAL from the RANDOM statement, and adding method=laplace (or method=quad) to the PROC GLIMMIX statement. As a reference, see Stroup's Generalized Linear Mixed Models.
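A minimal sketch of that change, again assuming the data set and variable names from the original post (the ARMA(1,1) structure is kept here, but a simpler structure such as AR(1) may be needed, as the later replies discuss):

PROC GLIMMIX data=XXX METHOD=LAPLACE;   /* true (approximate) likelihood instead of pseudo-likelihood */
     CLASS time subject intervention1 intervention2;
     MODEL countoutcome=intervention1|intervention2|time covdummy1|time cov2 covdummy2 covdummy3 cov4
           / SOLUTION DIST=negbin DDFM=NONE CHISQ;
     RANDOM time / SUBJECT=subject TYPE=ARMA(1,1);   /* RESIDUAL dropped: time is now a G-side random effect */
RUN;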

Steve Denham

lvm
Rhodochrosite | Level 12

You get ddf=1 with the KR method when things blow up with the variance-covariance parameters. When the complex calculations for the KR method give a nonsensical value, including a value less than 1, MIXED and GLIMMIX both report 1 for the denominator df. Ill-conditioned matrices are common culprits. I bet you have one or more warnings in your log. I am thinking your model is overparameterized in terms of variance-covariance parameters. I know that ARMA(1,1) is especially difficult to fit. And as indicated by Steve, R-side covariance terms with a negative binomial can lead to all kinds of challenges.

jl82
Calcite | Level 5

Thanks for your response. There are no warnings in the log; that's the first thing I checked, since it seemed pretty clear to me that something was wrong. I had trouble picking between the structures since the fit statistics cannot be compared in the usual way, but ARMA(1,1) did seem to have better pseudo fit stats than AR, ARH, or ANTE, and it had been used by other researchers in my area as "best" in the past. A quick check using AR(1) instead shows the same problem.

Switching away from the R side seems like it might be a good option. It is just so puzzling to me how this one covariate (which has low correlation with the other covariates and about a 50/50 breakdown between the two categories) causes problems, and that the problems seem to emerge with a simplified model (fewer interaction terms).

SteveDenham
Jade | Level 19

Just a guess as to what is happening with that covariate: one or more of the dummy levels may be producing near quasi-separation (I would guess a boatload of identical values, probably zeros, for one level at one or more time periods).

With integration-type methods, you have to switch to G side modeling of time. It's really the only way to get valid information criterion (IC) values as well, and all of the covariance structure types are still available.

Steve Denham

jl82
Calcite | Level 5

Thank you again -- very helpful.

I am liking the models with the G side modeling of time; things seem a little more stable when I switch specifications, and some of the marginally significant effects that were difficult to interpret disappear.

However, I am having issues with some of my models where the F values (or chi-square values if I use the CHISQ option in the MODEL statement) are "Infty" whenever I use more complex covariance structures (basically, anything other than AR(1)). I can't figure out any tweaks that make this problem go away.

Also, moving to G side random effects makes me wonder about adding a random intercept. Time is categorical in my model. I find that the fit statistics are identical or nearly identical for models with or without the RANDOM intercept line below -- does it have meaning in this context? It seems to sometimes (but not always) fix the Infty problem.

RANDOM intercept / SUBJECT=subject;

RANDOM time / SUBJECT=subject TYPE=AR(1);

SteveDenham
Jade | Level 19

Well, the infinity problem comes from what was mentioned regarding model complexity--too many parameters being estimated leads to zeroes being put in for the standard errors of the lsmeans. You may have to collapse some of your time categories to get around this, given that obtaining more data is probably out of the question.

As far as AR vs. AR+RI (random intercept) goes--my preference there stems from fitting Gaussian data (see, for example, Littell et al., Statist. Med. 2000, 19:1793-1819). I honestly don't know if it will lead to better or worse fits for other distributions that are fit as G side models. However, the random intercept is an additional parameter to estimate, and it may be causing the infinity problem.
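For illustration (a sketch using the assumed data set and variable names from the original post, not code from the reply), one way to check whether the random intercept is worth keeping is to fit the model with and without the extra RANDOM statement under METHOD=LAPLACE and compare the information criteria in the Fit Statistics table:

PROC GLIMMIX data=XXX METHOD=LAPLACE;
     CLASS time subject intervention1 intervention2;
     MODEL countoutcome=intervention1|intervention2|time covdummy1|time cov2 covdummy2 covdummy3 cov4
           / SOLUTION DIST=negbin DDFM=NONE CHISQ;
     RANDOM intercept / SUBJECT=subject;        /* remove this line to fit the AR(1)-only model */
     RANDOM time / SUBJECT=subject TYPE=AR(1);
RUN;
/* Under METHOD=LAPLACE the -2 log likelihood, AIC, and BIC come from an approximate true
   likelihood, so they can be compared across the two covariance specifications, unlike the
   pseudo-likelihood fit statistics from the default R-side estimation. */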

Steve Denham

jl82
Calcite | Level 5

Thanks, Steve. I have been reading this forum a lot and actually ordered the Stroup book last week. I will check it out.

Just as a quick check, would switching away from the R side specification (which I have been interested in doing to avoid the pseudo-likelihood and to get fit statistics I can compare across models) be appropriate even if time is a meaningful, ordered factor for us? And do the same covariance structures still apply?

I have a background as a developmental psychologist working primarily with latent growth modeling, so I'm finding the adjustment to PROC GLIMMIX hard.

Thanks again!

