- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Study evaluated fecal egg counts (FEC) in lambs. 4 treatments, FEC analyzed on 2 separate days (repeated measures).
Data is not normal. If it were normal, I'd run PROC Mixed with the following:
MODEL logFEC = TRT day trt*day;
Repeated day/subject = id(trt) type =ARH(1);
LSMEANS TRTuse day trt*day/PDIFF ADJUST=TUKEY;
Then, if trt*day is significant, data would be re-analyzed by day.
Just looking for help to make sure the following statement is correct for GLIMMIX, to account for repeated measure.
DATA VA; SET FEC;
LOGFEC = LOG(FEC+100);
PROC SORT; BY day TRT;
PROC PRINT;
RUN;QUIT;
PROC GLIMMIX;
CLASS TRT day id;
MODEL logFEC = TRT day trt*day;
Random day / subject = id(trt) type =ARH(1) residual;
LSMEANS TRTuse day trt*day/PDIFF ADJUST=TUKEY;
RUN;QUIT;
ALSO: what would be the difference in the above PROC GLIMMIX statement vs. the one below:
PROC GLIMMIX;
CLASS TRT day id;
MODEL logFEC = TRT day trt*day;
Random _Residual_ /subject = ID(trt) type =ARH(1);
Thanks in advance for any help.
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Back to the zeros question, the occurrence doesn't look like there is zero inflation to any great extent.
I ran the analysis several ways: overdispersion R-side and conditional G-side, using a negative binomial distribution As points out for non-normal distributions, you come to radically different conclusions. The overdispersed model resulted in no significances when the Kenward-Rogers correction is applied, while the G-side model and the uncorrected R-side gave significant main effects, but nonsignificant interaction.
Further, if you choose to use a G-side model, you may wish to use Gaussian-hermite quadrature as an optimization method, rather than the default restricted maximum pseudo-likelihood. This may be critical because the variance for the count related distributions (Poisson and negative binomial) influences the estimates, and needs to be included in the optimization and not profiled out as in the pseudo-likelihood methods. Someplace that I can't find right now has some references on this. And this version of the analysis gives a "nearly significant" test for day, but nonsignificant for treatment and treatment by day interaction.
Steve Denham
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hopefully some others will chime in on the differences between the two specifications. I would think that the first is the standard repeated measures type of parameterization, with a heterogeneous autoregressive error structure, implying different variances on the days. The second models overdispersion, and I'm coming up blank on how the error structure is modeled over day. Perhaps or or can help out.
But that's not really what I want to talk about. The power of GLIMMIX is the ability to specify error distributions appropriate to the variable at hand. Here it is a count. Almost certainly it is a negative binomial, as I would guess that the variance is not equal to the mean (Poisson assumption). I would approach this as:
PROC GLIMMIX data=yourdata;
CLASS TRT day id;
MODEL FEC = TRT day trt*day/dist=negbin ddfm=kr2;
Random day / subject = id(trt) type =ARH(1) residual;
LSMEANS TRT day /DIFF ADJUST=TUKEY;
LSMEANS trt*day/SLICEDIFF=day adjust=tukey;
RUN;
The following have been entered:
dist=negbin, to model a negative binomial distribution for egg counts. I would also make sure that this variable is the actual counts, rather than the counts/100, unless there are convergence problems.
ddfm=kr2, to apply a Kenward-Rogers adjustment to the denominator degrees of freedom to accommodate for the repeated measures
slicediff=day: to get simple effect comparisons between treatments at each day, if there is a significant trt*day interaction. Doing a separate by day analysis if the interaction is significant is like chopping off one foot before running a marathon. You have the ability to accommodate the actual design with GLIMMIX.
If there are only two days observed for each id(trt), then the following covariance structures will be equivalent: ARH(1), CSH, UN, CHOL, UNR and any of the spatial correlation structures. I would use the unstructured Cholesky parameterization (CHOL) to model unequal variances by day.
As a last comment, do you have a lot of zeros in the data? This might be the case if one or more of the treatments is exceedingly effective at killing worms, resulting in a zero shed rate, or if this study is dealing with natural infections, where some of the lambs have no worms and thus shed no eggs. If there is zero inflation, then we probably need to reconsider everything about the analysis.
Hope this helps, and let us know how it turns out.
Steve Denham
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Many thanks for the quick response.
Yes... zero's are in the data (see below; sorted by day, trt, FEC).
ID | PENuse | TRTuse | DAY | FEC |
9377 | R1b | CNTLivm | 37 | 0 |
9430 | R1b | CNTLivm | 37 | 0 |
9440 | R1b | CNTLivm | 37 | 0 |
9516 | R5b | CNTLivm | 37 | 0 |
9425 | R1b | CNTLivm | 37 | 50 |
9433 | R3b | CNTLivm | 37 | 50 |
9417 | R7b | CNTLivm | 37 | 150 |
9521 | R3b | CNTLivm | 37 | 150 |
9391 | R5b | CNTLivm | 37 | 200 |
9413 | R3b | CNTLivm | 37 | 250 |
9361 | R7b | CNTLivm | 37 | 350 |
9422 | R5b | CNTLivm | 37 | 400 |
9491 | R5b | CNTLivm | 37 | 400 |
9421 | R7b | CNTLivm | 37 | 600 |
9511 | R3b | CNTLivm | 37 | 950 |
9506 | R7b | CNTLivm | 37 | 1400 |
9399 | R1a | CNTLno | 37 | 50 |
9441 | R1a | CNTLno | 37 | 50 |
9515 | R3a | CNTLno | 37 | 100 |
9407 | R1a | CNTLno | 37 | 150 |
9369 | R5a | CNTLno | 37 | 200 |
9431 | R5a | CNTLno | 37 | 200 |
9475 | R3a | CNTLno | 37 | 250 |
9402 | R3a | CNTLno | 37 | 350 |
9455 | R1a | CNTLno | 37 | 350 |
9439 | R7a | CNTLno | 37 | 500 |
9485 | R7a | CNTLno | 37 | 600 |
9519 | R7a | CNTLno | 37 | 600 |
9371 | R5a | CNTLno | 37 | 800 |
9432 | R7a | CNTLno | 37 | 800 |
9456 | R5a | CNTLno | 37 | 800 |
9365 | R3a | CNTLno | 37 | 1150 |
9360 | R4b | JUNivm | 37 | 0 |
9362 | R2b | JUNivm | 37 | 0 |
9414 | R2b | JUNivm | 37 | 0 |
9418 | R8b | JUNivm | 37 | 0 |
9503 | R4b | JUNivm | 37 | 0 |
9518 | R6b | JUNivm | 37 | 0 |
9363 | R2b | JUNivm | 37 | 50 |
9410 | R4b | JUNivm | 37 | 50 |
9454 | R2b | JUNivm | 37 | 50 |
9464 | R6b | JUNivm | 37 | 50 |
9408 | R8b | JUNivm | 37 | 100 |
9376 | R6b | JUNivm | 37 | 200 |
9472 | R8b | JUNivm | 37 | 200 |
9420 | R6b | JUNivm | 37 | 250 |
9483 | R4b | JUNivm | 37 | 250 |
9364 | R8b | JUNivm | 37 | 450 |
9381 | R8a | JUNno | 37 | 0 |
9473 | R2a | JUNno | 37 | 50 |
9397 | R2a | JUNno | 37 | 100 |
9426 | R6a | JUNno | 37 | 100 |
9438 | R4a | JUNno | 37 | 100 |
9462 | R2a | JUNno | 37 | 150 |
9474 | R4a | JUNno | 37 | 150 |
9453 | R8a | JUNno | 37 | 200 |
9457 | R2a | JUNno | 37 | 200 |
9469 | R6a | JUNno | 37 | 250 |
9375 | R4a | JUNno | 37 | 400 |
9388 | R8a | JUNno | 37 | 500 |
9394 | R8a | JUNno | 37 | 600 |
9396 | R4a | JUNno | 37 | 850 |
9479 | R6a | JUNno | 37 | 850 |
9379 | R6a | JUNno | 37 | 1150 |
9430 | R1 | CNTLivm | 47 | 0 |
9377 | R1 | CNTLivm | 47 | 50 |
9433 | R3 | CNTLivm | 47 | 50 |
9516 | R5 | CNTLivm | 47 | 50 |
9521 | R3 | CNTLivm | 47 | 50 |
9417 | R7 | CNTLivm | 47 | 100 |
9425 | R1 | CNTLivm | 47 | 100 |
9440 | R1 | CNTLivm | 47 | 150 |
9361 | R7 | CNTLivm | 47 | 200 |
9491 | R5 | CNTLivm | 47 | 450 |
9421 | R7 | CNTLivm | 47 | 650 |
9422 | R5 | CNTLivm | 47 | 700 |
9391 | R5 | CNTLivm | 47 | 900 |
9413 | R3 | CNTLivm | 47 | 1000 |
9511 | R3 | CNTLivm | 47 | 1050 |
9506 | R7 | CNTLivm | 47 | 2750 |
9369 | R5 | CNTLno | 47 | 50 |
9475 | R3 | CNTLno | 47 | 50 |
9431 | R5 | CNTLno | 47 | 100 |
9485 | R7 | CNTLno | 47 | 100 |
9515 | R3 | CNTLno | 47 | 100 |
9519 | R7 | CNTLno | 47 | 100 |
9399 | R1 | CNTLno | 47 | 150 |
9439 | R7 | CNTLno | 47 | 250 |
9402 | R3 | CNTLno | 47 | 300 |
9407 | R1 | CNTLno | 47 | 350 |
9441 | R1 | CNTLno | 47 | 600 |
9365 | R3 | CNTLno | 47 | 800 |
9455 | R1 | CNTLno | 47 | 800 |
9432 | R7 | CNTLno | 47 | 850 |
9456 | R5 | CNTLno | 47 | 950 |
9371 | R5 | CNTLno | 47 | 1250 |
9360 | R4 | JUNivm | 47 | 0 |
9363 | R2 | JUNivm | 47 | 0 |
9408 | R8 | JUNivm | 47 | 0 |
9362 | R2 | JUNivm | 47 | 50 |
9414 | R2 | JUNivm | 47 | 50 |
9418 | R8 | JUNivm | 47 | 50 |
9518 | R6 | JUNivm | 47 | 50 |
9454 | R2 | JUNivm | 47 | 100 |
9503 | R4 | JUNivm | 47 | 100 |
9420 | R6 | JUNivm | 47 | 150 |
9472 | R8 | JUNivm | 47 | 250 |
9483 | R4 | JUNivm | 47 | 250 |
9376 | R6 | JUNivm | 47 | 400 |
9410 | R4 | JUNivm | 47 | 450 |
9364 | R8 | JUNivm | 47 | 500 |
9464 | R6 | JUNivm | 47 | 600 |
9381 | R8 | JUNno | 47 | 0 |
9426 | R6 | JUNno | 47 | 100 |
9473 | R2 | JUNno | 47 | 200 |
9397 | R2 | JUNno | 47 | 250 |
9438 | R4 | JUNno | 47 | 250 |
9469 | R6 | JUNno | 47 | 350 |
9375 | R4 | JUNno | 47 | 400 |
9396 | R4 | JUNno | 47 | 400 |
9453 | R8 | JUNno | 47 | 400 |
9462 | R2 | JUNno | 47 | 400 |
9457 | R2 | JUNno | 47 | 500 |
9479 | R6 | JUNno | 47 | 500 |
9379 | R6 | JUNno | 47 | 600 |
9388 | R8 | JUNno | 47 | 600 |
9394 | R8 | JUNno | 47 | 650 |
9474 | R4 | JUNno | 47 | 950 |
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Your two versions of your original GLIMMIX code fit the same model, with same results. This is because you are using a normal distribution (the default), with an identity link function. In fact, you could fit the same model with MIXED (using the log as the response and a REPEATED statement) because of the normality. These two versions work equivalently when you have all times in all subjects. Also, as Steve indicates, with only two times, ARH(1) is not different from several other structures.
The more complex questions would arise if you took the GLMM approach and model the response (not log response) as a member of the exponential family of distributions (with a log link). Then you would need to distinguish between
Random day / subject = id(trt) type =ARH(1) residual;
and
Random day / subject = id(trt) type =ARH(1) ;
You didn't ask about this, but now you would get different results for some non-normal distributions (e.g., Poisson). Here, the first one (with RESIDUAL option) would model overdispersion directly (R-side analysis) where the second version is modeling the linear predictor conditional on the random effect (G-side analysis).
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Back to the zeros question, the occurrence doesn't look like there is zero inflation to any great extent.
I ran the analysis several ways: overdispersion R-side and conditional G-side, using a negative binomial distribution As points out for non-normal distributions, you come to radically different conclusions. The overdispersed model resulted in no significances when the Kenward-Rogers correction is applied, while the G-side model and the uncorrected R-side gave significant main effects, but nonsignificant interaction.
Further, if you choose to use a G-side model, you may wish to use Gaussian-hermite quadrature as an optimization method, rather than the default restricted maximum pseudo-likelihood. This may be critical because the variance for the count related distributions (Poisson and negative binomial) influences the estimates, and needs to be included in the optimization and not profiled out as in the pseudo-likelihood methods. Someplace that I can't find right now has some references on this. And this version of the analysis gives a "nearly significant" test for day, but nonsignificant for treatment and treatment by day interaction.
Steve Denham
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I come up with the same conclusion. I run it with ddfm=KR2 and all p-values are near 1.0 (this can't be right).
I run it without KR2 and the P values "seem" to tell the story (I know P-values don't tell you anything about the model, but....)
PROC GLIMMIX; CLASS TRTuse day id;
MODEL FEC = TRTuse day trtuse*day/dist=negbin link=log;
Random day / subject = id(trtuse) type =ARH(1) residual;
LSMEANS TRTuse day trtuse*day/PDIFF ADJUST=TUKEY;
Also: when I use Laplace vs. QUAD method, the model fits better. I don't understand the differences, but I think I should go with Laplace.
Again, thanks for all the help.