Solved: Re: need to verify my random statement for GLIMMIX (repeated measures)

AgReseach7 · Posted 01-22-2013 03:08 PM

Study evaluated fecal egg counts (FEC) in lambs. 4 treatments, FEC analyzed on 2 separate days (repeated measures).

Data is not normal. If it were normal, I'd run PROC Mixed with the following:

MODEL logFEC = TRT day trt*day;

Repeated day/subject = id(trt) type =ARH(1);

LSMEANS TRTuse day trt*day/PDIFF ADJUST=TUKEY;

Then, if trt*day is significant, data would be re-analyzed by day.

Just looking for help to make sure the following statement is correct for GLIMMIX, to account for repeated measure.

DATA VA; SET FEC;

LOGFEC = LOG(FEC+100);

PROC SORT; BY day TRT;

PROC PRINT;

RUN;QUIT;

PROC GLIMMIX;

CLASS TRT day id;

MODEL logFEC = TRT day trt*day;

Random day / subject = id(trt) type =ARH(1) residual;

LSMEANS TRTuse day trt*day/PDIFF ADJUST=TUKEY;

RUN;QUIT;

ALSO: what would be the difference in the above PROC GLIMMIX statement vs. the one below:

PROC GLIMMIX;

CLASS TRT day id;

MODEL logFEC = TRT day trt*day;

Random _Residual_ /subject = ID(trt) type =ARH(1);

Thanks in advance for any help.

SteveDenham · Posted 01-23-2013 10:23 AM

Back to the zeros question, the occurrence doesn't look like there is zero inflation to any great extent.

I ran the analysis several ways: overdispersion R-side and conditional G-side, using a negative binomial distribution As points out for non-normal distributions, you come to radically different conclusions. The overdispersed model resulted in no significances when the Kenward-Rogers correction is applied, while the G-side model and the uncorrected R-side gave significant main effects, but nonsignificant interaction.

Further, if you choose to use a G-side model, you may wish to use Gaussian-hermite quadrature as an optimization method, rather than the default restricted maximum pseudo-likelihood. This may be critical because the variance for the count related distributions (Poisson and negative binomial) influences the estimates, and needs to be included in the optimization and not profiled out as in the pseudo-likelihood methods. Someplace that I can't find right now has some references on this. And this version of the analysis gives a "nearly significant" test for day, but nonsignificant for treatment and treatment by day interaction.

Steve Denham

View solution in original post

SteveDenham · Posted 01-22-2013 03:40 PM

Hopefully some others will chime in on the differences between the two specifications. I would think that the first is the standard repeated measures type of parameterization, with a heterogeneous autoregressive error structure, implying different variances on the days. The second models overdispersion, and I'm coming up blank on how the error structure is modeled over day. Perhaps or or can help out.

But that's not really what I want to talk about. The power of GLIMMIX is the ability to specify error distributions appropriate to the variable at hand. Here it is a count. Almost certainly it is a negative binomial, as I would guess that the variance is not equal to the mean (Poisson assumption). I would approach this as:

PROC GLIMMIX data=yourdata;

CLASS TRT day id;

MODEL FEC = TRT day trt*day/dist=negbin ddfm=kr2;

Random day / subject = id(trt) type =ARH(1) residual;

LSMEANS TRT day /DIFF ADJUST=TUKEY;

LSMEANS trt*day/SLICEDIFF=day adjust=tukey;

RUN;

The following have been entered:

dist=negbin, to model a negative binomial distribution for egg counts. I would also make sure that this variable is the actual counts, rather than the counts/100, unless there are convergence problems.

ddfm=kr2, to apply a Kenward-Rogers adjustment to the denominator degrees of freedom to accommodate for the repeated measures

slicediff=day: to get simple effect comparisons between treatments at each day, if there is a significant trt*day interaction. Doing a separate by day analysis if the interaction is significant is like chopping off one foot before running a marathon. You have the ability to accommodate the actual design with GLIMMIX.

If there are only two days observed for each id(trt), then the following covariance structures will be equivalent: ARH(1), CSH, UN, CHOL, UNR and any of the spatial correlation structures. I would use the unstructured Cholesky parameterization (CHOL) to model unequal variances by day.

As a last comment, do you have a lot of zeros in the data? This might be the case if one or more of the treatments is exceedingly effective at killing worms, resulting in a zero shed rate, or if this study is dealing with natural infections, where some of the lambs have no worms and thus shed no eggs. If there is zero inflation, then we probably need to reconsider everything about the analysis.

Hope this helps, and let us know how it turns out.

Steve Denham

AgReseach7 · Posted 01-22-2013 03:52 PM

Many thanks for the quick response.

Yes... zero's are in the data (see below; sorted by day, trt, FEC).

ID	PENuse	TRTuse	DAY	FEC
9377	R1b	CNTLivm	37	0
9430	R1b	CNTLivm	37	0
9440	R1b	CNTLivm	37	0
9516	R5b	CNTLivm	37	0
9425	R1b	CNTLivm	37	50
9433	R3b	CNTLivm	37	50
9417	R7b	CNTLivm	37	150
9521	R3b	CNTLivm	37	150
9391	R5b	CNTLivm	37	200
9413	R3b	CNTLivm	37	250
9361	R7b	CNTLivm	37	350
9422	R5b	CNTLivm	37	400
9491	R5b	CNTLivm	37	400
9421	R7b	CNTLivm	37	600
9511	R3b	CNTLivm	37	950
9506	R7b	CNTLivm	37	1400
9399	R1a	CNTLno	37	50
9441	R1a	CNTLno	37	50
9515	R3a	CNTLno	37	100
9407	R1a	CNTLno	37	150
9369	R5a	CNTLno	37	200
9431	R5a	CNTLno	37	200
9475	R3a	CNTLno	37	250
9402	R3a	CNTLno	37	350
9455	R1a	CNTLno	37	350
9439	R7a	CNTLno	37	500
9485	R7a	CNTLno	37	600
9519	R7a	CNTLno	37	600
9371	R5a	CNTLno	37	800
9432	R7a	CNTLno	37	800
9456	R5a	CNTLno	37	800
9365	R3a	CNTLno	37	1150
9360	R4b	JUNivm	37	0
9362	R2b	JUNivm	37	0
9414	R2b	JUNivm	37	0
9418	R8b	JUNivm	37	0
9503	R4b	JUNivm	37	0
9518	R6b	JUNivm	37	0
9363	R2b	JUNivm	37	50
9410	R4b	JUNivm	37	50
9454	R2b	JUNivm	37	50
9464	R6b	JUNivm	37	50
9408	R8b	JUNivm	37	100
9376	R6b	JUNivm	37	200
9472	R8b	JUNivm	37	200
9420	R6b	JUNivm	37	250
9483	R4b	JUNivm	37	250
9364	R8b	JUNivm	37	450
9381	R8a	JUNno	37	0
9473	R2a	JUNno	37	50
9397	R2a	JUNno	37	100
9426	R6a	JUNno	37	100
9438	R4a	JUNno	37	100
9462	R2a	JUNno	37	150
9474	R4a	JUNno	37	150
9453	R8a	JUNno	37	200
9457	R2a	JUNno	37	200
9469	R6a	JUNno	37	250
9375	R4a	JUNno	37	400
9388	R8a	JUNno	37	500
9394	R8a	JUNno	37	600
9396	R4a	JUNno	37	850
9479	R6a	JUNno	37	850
9379	R6a	JUNno	37	1150
9430	R1	CNTLivm	47	0
9377	R1	CNTLivm	47	50
9433	R3	CNTLivm	47	50
9516	R5	CNTLivm	47	50
9521	R3	CNTLivm	47	50
9417	R7	CNTLivm	47	100
9425	R1	CNTLivm	47	100
9440	R1	CNTLivm	47	150
9361	R7	CNTLivm	47	200
9491	R5	CNTLivm	47	450
9421	R7	CNTLivm	47	650
9422	R5	CNTLivm	47	700
9391	R5	CNTLivm	47	900
9413	R3	CNTLivm	47	1000
9511	R3	CNTLivm	47	1050
9506	R7	CNTLivm	47	2750
9369	R5	CNTLno	47	50
9475	R3	CNTLno	47	50
9431	R5	CNTLno	47	100
9485	R7	CNTLno	47	100
9515	R3	CNTLno	47	100
9519	R7	CNTLno	47	100
9399	R1	CNTLno	47	150
9439	R7	CNTLno	47	250
9402	R3	CNTLno	47	300
9407	R1	CNTLno	47	350
9441	R1	CNTLno	47	600
9365	R3	CNTLno	47	800
9455	R1	CNTLno	47	800
9432	R7	CNTLno	47	850
9456	R5	CNTLno	47	950
9371	R5	CNTLno	47	1250
9360	R4	JUNivm	47	0
9363	R2	JUNivm	47	0
9408	R8	JUNivm	47	0
9362	R2	JUNivm	47	50
9414	R2	JUNivm	47	50
9418	R8	JUNivm	47	50
9518	R6	JUNivm	47	50
9454	R2	JUNivm	47	100
9503	R4	JUNivm	47	100
9420	R6	JUNivm	47	150
9472	R8	JUNivm	47	250
9483	R4	JUNivm	47	250
9376	R6	JUNivm	47	400
9410	R4	JUNivm	47	450
9364	R8	JUNivm	47	500
9464	R6	JUNivm	47	600
9381	R8	JUNno	47	0
9426	R6	JUNno	47	100
9473	R2	JUNno	47	200
9397	R2	JUNno	47	250
9438	R4	JUNno	47	250
9469	R6	JUNno	47	350
9375	R4	JUNno	47	400
9396	R4	JUNno	47	400
9453	R8	JUNno	47	400
9462	R2	JUNno	47	400
9457	R2	JUNno	47	500
9479	R6	JUNno	47	500
9379	R6	JUNno	47	600
9388	R8	JUNno	47	600
9394	R8	JUNno	47	650
9474	R4	JUNno	47	950

lvm · Posted 01-22-2013 04:27 PM

Your two versions of your original GLIMMIX code fit the same model, with same results. This is because you are using a normal distribution (the default), with an identity link function. In fact, you could fit the same model with MIXED (using the log as the response and a REPEATED statement) because of the normality. These two versions work equivalently when you have all times in all subjects. Also, as Steve indicates, with only two times, ARH(1) is not different from several other structures.

The more complex questions would arise if you took the GLMM approach and model the response (not log response) as a member of the exponential family of distributions (with a log link). Then you would need to distinguish between

Random day / subject = id(trt) type =ARH(1) residual;

and

Random day / subject = id(trt) type =ARH(1) ;

You didn't ask about this, but now you would get different results for some non-normal distributions (e.g., Poisson). Here, the first one (with RESIDUAL option) would model overdispersion directly (R-side analysis) where the second version is modeling the linear predictor conditional on the random effect (G-side analysis).

SteveDenham · Posted 01-23-2013 10:23 AM

Back to the zeros question, the occurrence doesn't look like there is zero inflation to any great extent.

I ran the analysis several ways: overdispersion R-side and conditional G-side, using a negative binomial distribution As points out for non-normal distributions, you come to radically different conclusions. The overdispersed model resulted in no significances when the Kenward-Rogers correction is applied, while the G-side model and the uncorrected R-side gave significant main effects, but nonsignificant interaction.

Further, if you choose to use a G-side model, you may wish to use Gaussian-hermite quadrature as an optimization method, rather than the default restricted maximum pseudo-likelihood. This may be critical because the variance for the count related distributions (Poisson and negative binomial) influences the estimates, and needs to be included in the optimization and not profiled out as in the pseudo-likelihood methods. Someplace that I can't find right now has some references on this. And this version of the analysis gives a "nearly significant" test for day, but nonsignificant for treatment and treatment by day interaction.

Steve Denham

AgReseach7 · Posted 01-28-2013 01:18 PM

I come up with the same conclusion. I run it with ddfm=KR2 and all p-values are near 1.0 (this can't be right).

I run it without KR2 and the P values "seem" to tell the story (I know P-values don't tell you anything about the model, but....)

PROC GLIMMIX; CLASS TRTuse day id;

MODEL FEC = TRTuse day trtuse*day/dist=negbin link=log;

Random day / subject = id(trtuse) type =ARH(1) residual;

LSMEANS TRTuse day trtuse*day/PDIFF ADJUST=TUKEY;

Also: when I use Laplace vs. QUAD method, the model fits better. I don't understand the differences, but I think I should go with Laplace.

Again, thanks for all the help.