BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
AgReseach7
Obsidian | Level 7

Study evaluated fecal egg counts (FEC) in lambs. 4 treatments, FEC analyzed on 2 separate days (repeated measures).

Data is not normal. If it were normal, I'd run PROC Mixed with the following:

MODEL logFEC = TRT day trt*day;

Repeated  day/subject = id(trt) type =ARH(1);

LSMEANS TRTuse day trt*day/PDIFF ADJUST=TUKEY;

Then, if trt*day is significant, data would be re-analyzed by day.

Just looking for help to make sure the following statement is correct for GLIMMIX, to account for repeated measure.

DATA VA;  SET FEC;

LOGFEC = LOG(FEC+100);

PROC SORT; BY day TRT;

PROC PRINT;

RUN;QUIT;

PROC GLIMMIX;

CLASS TRT day id;

MODEL logFEC = TRT day trt*day;

Random day / subject = id(trt) type =ARH(1) residual;

LSMEANS TRTuse day trt*day/PDIFF ADJUST=TUKEY;

RUN;QUIT;



ALSO: what would be the difference in the above PROC GLIMMIX statement vs. the one below:

PROC GLIMMIX;

CLASS TRT day id;

MODEL logFEC = TRT day trt*day;

Random _Residual_ /subject = ID(trt) type =ARH(1);


Thanks in advance for any help.

1 ACCEPTED SOLUTION

Accepted Solutions
SteveDenham
Jade | Level 19

Back to the zeros question, the occurrence doesn't look like there is zero inflation to any great extent.

I ran the analysis several ways: overdispersion R-side and conditional G-side, using a negative binomial distribution  As points out for non-normal distributions, you come to radically different conclusions.  The overdispersed model resulted in no significances when the Kenward-Rogers correction is applied, while the G-side model and the uncorrected R-side gave significant main effects, but nonsignificant interaction.

Further, if you choose to use a G-side model, you may wish to use Gaussian-hermite quadrature as an optimization method, rather than the default restricted maximum pseudo-likelihood.  This may be critical because the variance for the count related distributions (Poisson and negative binomial) influences the estimates, and needs to be included in the optimization and not profiled out as in the pseudo-likelihood methods.  Someplace that I can't find right now has some references on this. And this version of the analysis gives a "nearly significant" test for day, but nonsignificant for treatment and treatment by day interaction.

Steve Denham

View solution in original post

5 REPLIES 5
SteveDenham
Jade | Level 19

Hopefully some others will chime in on the differences between the two specifications.  I would think that the first is the standard repeated measures type of parameterization, with a heterogeneous autoregressive error structure, implying different variances on the days.  The second models overdispersion, and I'm coming up blank on how the error structure is modeled over day.  Perhaps or or can help out.

But that's not really what I want to talk about.  The power of GLIMMIX is the ability to specify error distributions appropriate to the variable at hand.  Here it is a count.  Almost certainly it is a negative binomial, as I would guess that the variance is not equal to the mean (Poisson assumption).  I would approach this as:

PROC GLIMMIX data=yourdata;

CLASS TRT day id;

MODEL FEC = TRT day trt*day/dist=negbin ddfm=kr2;

Random day / subject = id(trt) type =ARH(1) residual;

LSMEANS TRT day /DIFF ADJUST=TUKEY;

LSMEANS trt*day/SLICEDIFF=day adjust=tukey;

RUN;


The following have been entered:

dist=negbin, to model a negative binomial distribution for egg counts.  I would also make sure that this variable is the actual counts, rather than the counts/100, unless there are convergence problems.

ddfm=kr2, to apply a Kenward-Rogers adjustment to the denominator degrees of freedom to accommodate for the repeated measures

slicediff=day: to get simple effect comparisons between treatments at each day, if there is a significant trt*day interaction.  Doing a separate by day analysis if the interaction is significant is like chopping off one foot before running a marathon.  You have the ability to accommodate the actual design with GLIMMIX.

If there are only two days observed for each id(trt), then the following covariance structures will be equivalent: ARH(1), CSH, UN, CHOL, UNR and any of the spatial correlation structures.  I would use the unstructured Cholesky parameterization (CHOL) to model unequal variances by day.

As a last comment, do you have a lot of zeros in the data?  This might be the case if one or more of the treatments is exceedingly effective at killing worms, resulting in a zero shed rate, or if this study is dealing with natural infections, where some of the lambs have no worms and thus shed no eggs.  If there is zero inflation, then we probably need to reconsider everything about the analysis.

Hope this helps, and let us know how it turns out.

Steve Denham

AgReseach7
Obsidian | Level 7

Many thanks for the quick response.

Yes... zero's are in the data (see below; sorted by day, trt, FEC).

IDPENuseTRTuseDAYFEC
9377R1bCNTLivm370
9430R1bCNTLivm370
9440R1bCNTLivm370
9516R5bCNTLivm370
9425R1bCNTLivm3750
9433R3bCNTLivm3750
9417R7bCNTLivm37150
9521R3bCNTLivm37150
9391R5bCNTLivm37200
9413R3bCNTLivm37250
9361R7bCNTLivm37350
9422R5bCNTLivm37400
9491R5bCNTLivm37400
9421R7bCNTLivm37600
9511R3bCNTLivm37950
9506R7bCNTLivm371400
9399R1aCNTLno3750
9441R1aCNTLno3750
9515R3aCNTLno37100
9407R1aCNTLno37150
9369R5aCNTLno37200
9431R5aCNTLno37200
9475R3aCNTLno37250
9402R3aCNTLno37350
9455R1aCNTLno37350
9439R7aCNTLno37500
9485R7aCNTLno37600
9519R7aCNTLno37600
9371R5aCNTLno37800
9432R7aCNTLno37800
9456R5aCNTLno37800
9365R3aCNTLno371150
9360R4bJUNivm370
9362R2bJUNivm370
9414R2bJUNivm370
9418R8bJUNivm370
9503R4bJUNivm370
9518R6bJUNivm370
9363R2bJUNivm3750
9410R4bJUNivm3750
9454R2bJUNivm3750
9464R6bJUNivm3750
9408R8bJUNivm37100
9376R6bJUNivm37200
9472R8bJUNivm37200
9420R6bJUNivm37250
9483R4bJUNivm37250
9364R8bJUNivm37450
9381R8aJUNno370
9473R2aJUNno3750
9397R2aJUNno37100
9426R6aJUNno37100
9438R4aJUNno37100
9462R2aJUNno37150
9474R4aJUNno37150
9453R8aJUNno37200
9457R2aJUNno37200
9469R6aJUNno37250
9375R4aJUNno37400
9388R8aJUNno37500
9394R8aJUNno37600
9396R4aJUNno37850
9479R6aJUNno37850
9379R6aJUNno371150
9430R1CNTLivm470
9377R1CNTLivm4750
9433R3CNTLivm4750
9516R5CNTLivm4750
9521R3CNTLivm4750
9417R7CNTLivm47100
9425R1CNTLivm47100
9440R1CNTLivm47150
9361R7CNTLivm47200
9491R5CNTLivm47450
9421R7CNTLivm47650
9422R5CNTLivm47700
9391R5CNTLivm47900
9413R3CNTLivm471000
9511R3CNTLivm471050
9506R7CNTLivm472750
9369R5CNTLno4750
9475R3CNTLno4750
9431R5CNTLno47100
9485R7CNTLno47100
9515R3CNTLno47100
9519R7CNTLno47100
9399R1CNTLno47150
9439R7CNTLno47250
9402R3CNTLno47300
9407R1CNTLno47350
9441R1CNTLno47600
9365R3CNTLno47800
9455R1CNTLno47800
9432R7CNTLno47850
9456R5CNTLno47950
9371R5CNTLno471250
9360R4JUNivm470
9363R2JUNivm470
9408R8JUNivm470
9362R2JUNivm4750
9414R2JUNivm4750
9418R8JUNivm4750
9518R6JUNivm4750
9454R2JUNivm47100
9503R4JUNivm47100
9420R6JUNivm47150
9472R8JUNivm47250
9483R4JUNivm47250
9376R6JUNivm47400
9410R4JUNivm47450
9364R8JUNivm47500
9464R6JUNivm47600
9381R8JUNno470
9426R6JUNno47100
9473R2JUNno47200
9397R2JUNno47250
9438R4JUNno47250
9469R6JUNno47350
9375R4JUNno47400
9396R4JUNno47400
9453R8JUNno47400
9462R2JUNno47400
9457R2JUNno47500
9479R6JUNno47500
9379R6JUNno47600
9388R8JUNno47600
9394R8JUNno47650
9474R4JUNno47950
lvm
Rhodochrosite | Level 12 lvm
Rhodochrosite | Level 12

Your two versions of your original GLIMMIX code fit the same model, with same results. This is because you are using a normal distribution (the default), with an identity link function. In fact, you could fit the same model with MIXED (using the log as the response and a REPEATED statement) because of the normality. These two versions work equivalently when you have all times in all subjects. Also, as Steve indicates, with only two times, ARH(1) is not different from several other structures.

The more complex questions would arise if you took the GLMM approach and model the response (not log response) as a member of the exponential family of distributions (with a log link). Then you would need to distinguish between

Random day / subject = id(trt) type =ARH(1) residual;

and

Random day / subject = id(trt) type =ARH(1) ;

You didn't ask about this, but now you would get different results for some non-normal distributions (e.g., Poisson). Here, the first one (with RESIDUAL option) would model overdispersion directly (R-side analysis) where the second version is modeling the linear predictor conditional on the random effect (G-side analysis).


SteveDenham
Jade | Level 19

Back to the zeros question, the occurrence doesn't look like there is zero inflation to any great extent.

I ran the analysis several ways: overdispersion R-side and conditional G-side, using a negative binomial distribution  As points out for non-normal distributions, you come to radically different conclusions.  The overdispersed model resulted in no significances when the Kenward-Rogers correction is applied, while the G-side model and the uncorrected R-side gave significant main effects, but nonsignificant interaction.

Further, if you choose to use a G-side model, you may wish to use Gaussian-hermite quadrature as an optimization method, rather than the default restricted maximum pseudo-likelihood.  This may be critical because the variance for the count related distributions (Poisson and negative binomial) influences the estimates, and needs to be included in the optimization and not profiled out as in the pseudo-likelihood methods.  Someplace that I can't find right now has some references on this. And this version of the analysis gives a "nearly significant" test for day, but nonsignificant for treatment and treatment by day interaction.

Steve Denham

AgReseach7
Obsidian | Level 7

I come up with the same conclusion. I run it with ddfm=KR2 and all p-values are near 1.0 (this can't be right).

I run it without KR2 and the P values "seem" to tell the story (I know P-values don't tell you anything about the model, but....)

PROC GLIMMIX;  CLASS TRTuse day id;

MODEL FEC = TRTuse day trtuse*day/dist=negbin link=log;

Random day / subject = id(trtuse) type =ARH(1) residual;

LSMEANS TRTuse day trtuse*day/PDIFF ADJUST=TUKEY;

Also: when I use Laplace vs. QUAD method, the model fits better. I don't understand the differences, but I think I should go with Laplace.

Again, thanks for all the help.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 1719 views
  • 9 likes
  • 3 in conversation