marcel
Obsidian | Level 7

Hello,
I would like to ask for a clarification. If anyone can help me, I will be much obliged.

I have an experiment with two treatments; the response was measured at 7-day intervals.

The measures were taken on nested subjects: a daughter from a mother from a family.

In SAS notation I would write this nested arrangement as follows: daughter(mother family).
Following Stroup et al. (2018), SAS for Mixed Models: Introduction and Basic Applications (Chapter 8), I was able to visualize the covariance structure that may be best for my data:

gplot2_comparing_covariances_forfinalModel.png

 

However, when I run the model with AR(1) using PROC GLIMMIX:

proc glimmix data=data_long;
class treat time daughter mother family;
model resp=time|treat longevity/ddfm=kr2;
random intercept / subject=daughter(mother family);
random time / subject=daughter(mother family) type=ar(1) residual;
covtest 'between subject variance = 0?' zeroG;
run;

The result of the test of covariance parameters shows Pr > ChiSq = 1.0.
I understand this means that the between-subject effect can be dropped from the model. When I drop the first RANDOM statement
(i.e., random intercept / subject=daughter(mother family);), the result of the covariance test is the same (Pr > ChiSq = 1.0).

However, when I run:

proc mixed data=data_long covtest;
class treat time daughter mother family;
model resp=time|treat longevity/ddfm=kr;
repeated time / subject=daughter(mother family) type=ar(1);
run;

The Covariance Parameter Estimates table shows that AR(1) daughter(mother*family) has Pr Z < 0.0001, which means it is significant and has to stay in the model.

Am I understanding the analysis all wrong?

Please correct me if I am wrong, but I understand that in a repeated measures analysis I need to specify daughter(mother family), i.e., the subjects on which the repeated measures were taken.
But proc glimmix tells me that the between-subject effect can be dropped?
Is there any other way in GLIMMIX to specify the subjects on which the repeated measures were taken?
Thank you for your help.
marcel

1 ACCEPTED SOLUTION
SteveDenham
Jade | Level 19

There really isn't another way to specify what the subject is. So let's look at the graph and the output. First off, I see that the AR(1) and the AR(1)+RE covariance profiles are identical (at least to the naked eye). I suspect that when you run the AR(1)+RE model, the covariance parameter estimate for the random effect is either 0 or missing, and there is no standard error for it. Additionally, I would suspect that there is a message in the output to the effect that the G matrix is not positive definite. Taken together, these imply that there is no additional variability accounted for by the random effect of daughter within mother*family. That is what the zeroG likelihood ratio test says as well. So the better (and also more parsimonious) parameterization is just AR(1).

Now, on to the comparison between MIXED and GLIMMIX. The COVTEST option in your MIXED code is a Wald-type chi-square for the repeated (= R-side) parameters associated with AR(1). You don't have a test for this in the GLIMMIX code. If you drop the between-subject random effect and still test with zeroG in the COVTEST statement, you certainly ought to get Pr > ChiSq = 1.0, as there are already no G-side parameters in the model. The equivalent COVTEST in GLIMMIX to the MIXED test would look like:

 

covtest glm /wald;

Now, since the between-subject variance component is zero (likely on the boundary, or less than zero if you fit by ML rather than REML), the residual variance captures all the variability in your data. If you are truly concerned about reporting a variance component for daughter, then change the GLIMMIX code to:

 

proc glimmix data=data_long method=MSPL nobound;
class treat time daughter mother family;
model resp=time|treat longevity/ddfm=kr2;
random intercept / subject=daughter(mother family);
random time / subject=daughter(mother family) type=ar(1) residual;
covtest 'between subject variance = 0?' zeroG/cl(type=plr); /* or type=wald */
run;

This will provide an unbounded maximum likelihood estimate of the variance component (plus add a column that gives the ratio of the variance components to the residual variance).

 

I hope this helps some.

 

SteveDenham

 

marcel
Obsidian | Level 7

Dear Steve,
I am very sorry for the late answer to your comment, but I was out of commission for a few days (a COVID scare and things like that).
Thank you for your very educational, tactful, and wise answer. You are really very good at explaining concepts; I say this because I have seen your answers to many other questions in this forum.
I want to clarify that I only want to know which treatment increases fecundity when interacting with time. A simple LS-means table of Treat*Time would be enough for me; I do not want to predict anything.
Given the above, I have two questions:

1) Is it always better to go with a "simpler" covariance structure, as long as the results are similar to those obtained with more complex covariance structures?

2) Are there considerations that may force me to go for a more "complicated" structure?

See below to understand my train of thought.

According to Stroup et al. (2018), SAS for Mixed Models: Introduction and Basic Applications, AR(1) is a good simple structure for most cases: if non-negligible, distance-dependent within-subject correlation exists, switching from CS to AR(1) accomplishes the first 90% of accounting for within-subject correlation as a function of distance apart in time.
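Stroup's CS-versus-AR(1) point is easy to see numerically. Here is a minimal sketch (Python/NumPy rather than SAS; the number of time points and the working correlation of 0.6 are arbitrary, not from my data): CS holds the same correlation at every lag, while AR(1) lets it decay as rho**lag.

```python
import numpy as np

def cs_corr(n, rho):
    """Compound symmetry: every pair of time points has the same correlation."""
    R = np.full((n, n), rho)
    np.fill_diagonal(R, 1.0)
    return R

def ar1_corr(n, rho):
    """AR(1): correlation decays as rho**lag with distance apart in time."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

# Five equally spaced time points, working correlation 0.6 (arbitrary)
R_cs = cs_corr(5, 0.6)
R_ar = ar1_corr(5, 0.6)
# CS: times 1 and 5 are still correlated at 0.6
# AR(1): the lag-4 correlation has decayed to 0.6**4 (about 0.13)
```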

That said, the evaluation of the different covariance structures for my model shows the following:

Covariance_sas_help_.jpg

It seems that ARH(1) is better than AR(1).
AR(1), as "simple" as it is, has a large residual. Take a look at these two fit statistics for AR(1):

Generalized Chi-Square 4830671
Gener. Chi-Square / DF 7958.27

ARH(1) has better values for these two:

Generalized Chi-Square 618.88
Gener. Chi-Square / DF 1.02

However, the more complex ANTE(1) seems to be even better than the two above, judging by the AIC, AICC, BIC and log likelihood values below:


Fit Statistics
-2 Res Log Likelihood 6977.15
AIC (smaller is better) 7021.15
AICC (smaller is better) 7022.88
BIC (smaller is better) 7046.13
Generalized Chi-Square 619.29
Gener. Chi-Square / DF 1.02

Technically, ARH(1) and ANTE(1) are better than AR(1), and ANTE(1) is the best of all.

The model with AR(1) shows that the interaction Treat*Time is significant (this is my main question). This interaction is also significant with models with ARH(1) and ANTE(1).

Is it fine if I run my final model with an AR(1) covariance structure (simpler structure, but high residual variability)?

Thank you in advance for your help.

Regards,

marcel

SteveDenham
Jade | Level 19

I am a big believer in using corrected AIC (AICc) to select covariance structures. This is really a clear-cut case for ANTE(1): the next-best model (AR(1) + RE) retains only 0.006% of the information in ANTE(1).

 

So the most parsimonious structure isn't always the best. Finding the best-fitting structure is, at least at this point, a more or less random search through candidate structures: you need to look at all of them, considering the processes that generate the data. One other thing to consider: the autoregressive and ante-dependence structures assume that the distance between time points is constant. If this is not the case, then those sorts of structures should probably not be included in the options to choose from.
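The "0.006% of the information" figure is the AICc evidence ratio, exp(-ΔAICc/2), normalized over the candidate models. A minimal Python sketch of the arithmetic; only ANTE(1)'s AICC of 7022.88 comes from this thread, and the other two values are hypothetical, chosen roughly 19 and 77 units worse:

```python
import math

def akaike_weights(aicc):
    """Relative evidence per model: exp(-delta_AICc / 2), then normalized."""
    best = min(aicc.values())
    rel = {m: math.exp(-(v - best) / 2.0) for m, v in aicc.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}

# ANTE(1)'s AICC is from the thread; the other two values are hypothetical.
aicc = {"ANTE(1)": 7022.88, "AR(1)+RE": 7042.30, "AR(1)": 7100.00}
w = akaike_weights(aicc)
# A ~19.4-unit AICc gap leaves the runner-up with about 0.006% of the weight.
```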

 

SteveDenham

marcel
Obsidian | Level 7

Dear Steve,

 

Thank you for your answer. Yes, the measurement times are evenly spaced (fecundity was measured every 7 days for each individual). The individuals were followed until they died; after death, their fecundity was entered as missing data. For each individual I have data on fecundity (every 7 days), total fecundity (i.e., lifetime fecundity), and longevity. I am using longevity as a covariate (a continuous variable).

 

My model is Fec = treat time treat*time longevity

 

There is a hypothesis about the effect of parasites on their hosts' fecundity and longevity. If the effect of the parasite on the longevity of the host is very strong, then the host is expected to shift its fecundity schedule to an earlier age. For example, if a host normally lives up to 100 days and a parasite shortens its life span to 50 days, then the host will move the bulk of its fecundity effort to an early age (sometimes the "sick" host's reproductive output at that early age will be significantly higher than that of a "normal" host at a similar age). If the parasite's effect on the longevity of the host is not strong, a compensation in the reproductive output of the host at early ages can still be expected. I am exploring the latter scenario in my work.

 

Anyway, by using the code:

 

proc corr data=Fecfinaldata_n0or_revz cov;
  var fec07 fec14 fec21 fec28 fec35 fec42 fec49 fec56 fec63 fec70 fec77;
run;

I obtain the following:

 

sas_covariances_9_28_20.jpg

 

sas_pearson_correll_covs_9_28_2020.jpg

 

The correlations among the time points look heterogeneous.

Regards,

 

marcel

 

SteveDenham
Jade | Level 19

That really does look like an ANTE(1) structure. Consider a matrix where the "non-significant" correlations are set to zero (or at least diminish rapidly). That would result in means separated in time by more than about 3 periods starting to look "independent", which is what you see here. Look at the parameterization of an ANTE(1) structure in the documentation and you'll see that it builds in heterogeneity of both variance and covariance. It is like an unstructured matrix parameterized in a way that saves on the number of parameters estimated, but it also shows a "ridge" that gets broader for later time periods.
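As a concrete sketch of that parameterization (Python; the standard deviations and adjacent correlations below are hypothetical, not estimates from the data): ANTE(1) for n time points uses n variances plus n-1 adjacent correlations (2n-1 parameters, versus n(n+1)/2 for unstructured), and the correlation between two times is the product of the adjacent correlations between them, so it fades with separation.

```python
import numpy as np

def ante1_cov(sd, rho):
    """First-order ante-dependence covariance matrix.

    sd[i] is the standard deviation at time i (heterogeneous variances);
    rho[i] is the correlation between adjacent times i and i+1.
    corr(t_i, t_j) is the product of the adjacent correlations between
    them, so correlations fade as time points get farther apart.
    """
    n = len(sd)
    assert len(rho) == n - 1
    V = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            lo, hi = sorted((i, j))
            V[i, j] = sd[i] * sd[j] * np.prod(rho[lo:hi])
    return V

# Hypothetical values for 4 time points: 2*4 - 1 = 7 parameters,
# versus 4*5/2 = 10 for a fully unstructured matrix.
V = ante1_cov([1.0, 1.5, 2.0, 2.5], [0.8, 0.6, 0.4])
# corr(time 1, time 4) = 0.8 * 0.6 * 0.4 = 0.192 -- nearly "independent"
```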

 

Does that even make sense?

 

SteveDenham

marcel
Obsidian | Level 7

Thanks Steve,

It seems to make sense (at least a bit).


I explored all the other models, and it seems that the best model for my data is indeed the one with an ANTE(1) covariance structure.

My final model will include longevity as a covariate. However, longevity was measured only once for each subject during the experiment (longevity = age at death). I am unsure whether longevity should be included in the interaction.

In Hamer et al., "Mixed-Up Mixed Models: Things That Look Like They Should Work But Don't, and Things That Look Like They Shouldn't Work But Do", they define "subject-dependent covariates" as covariates that are measured once per subject and are thus constant over the repeated measures. In the example shown in their paper, they do not include the subject-dependent covariate in the interaction, and I wonder if there is any reason for that.

My model could be either this one (without longevity interaction)
 
ods graphics on/imagemap;
proc glimmix data=Fecfinaldata_n0or_revempty_long plots=all ;
class id tto time daughter mother family;
model fec=time|tto longevity /ddfm=kr2;
RANDOM INTERCEPT/SUBJECT=mother(family) ;  
random time/subject=daughter(mother family) type=ANTE(1) residual;
run;

or this one:


With longevity interaction


ods graphics on/imagemap;
proc glimmix data=Fecfinaldata_n0or_revempty_long plots=all ;
class id tto time daughter mother family;
model fec=time|tto|longevity /ddfm=kr2;
RANDOM INTERCEPT/SUBJECT=mother(family) ;  
random time/subject=daughter(mother family) type=ANTE(1) residual;
run;

Thank you for your help.

Regards,

 

Marcel

SteveDenham
Jade | Level 19

Well, sorry to be late to answer, but perhaps this will help. In SAS for Mixed Models, 2nd ed., Chapter 7, there is a great discussion of the use of covariates. Sometimes the best model has an interaction, sometimes not. Suppose the F test for the interaction is significant: that implies that the slope coefficients differ depending on the level of the other factor in the interaction. This has profound implications for comparisons between least squares means, in that differences between levels of the other factor are not constant but depend on the value of the covariate. Section 7.3 walks through the steps to consider: testing the unequal-slopes model, analysis if the unequal-slopes model should be used, fitting the equal-slopes model if it is deemed adequate, and follow-up analyses for additional information. I can't suggest anything beyond what is covered there; it is the most succinct and complete chapter in the whole book, in my opinion.
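The unequal-slopes test reduces to comparing two nested models with an F test: the full model with a treatment-by-covariate interaction versus the reduced equal-slopes model. A minimal Python sketch on simulated data (all names and numbers hypothetical; this is plain OLS, not the mixed-model machinery, but the logic of the test is the same):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: two treatment groups, one covariate x, response y,
# simulated with genuinely different slopes per group (0.5 vs 1.2).
n = 40
g = np.repeat([0.0, 1.0], n // 2)       # treatment indicator
x = rng.uniform(0, 10, n)               # covariate (think: longevity)
y = 2.0 + 1.5 * g + 0.5 * x + 0.7 * g * x + rng.normal(0, 1, n)

def rss(X, y):
    """Residual sum of squares from an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

X_eq = np.column_stack([np.ones(n), g, x])           # equal slopes
X_un = np.column_stack([np.ones(n), g, x, g * x])    # unequal slopes

# F test of the interaction: 1 extra parameter, n - 4 residual df
F = (rss(X_eq, y) - rss(X_un, y)) / (rss(X_un, y) / (n - 4))
# A large F favors the unequal-slopes model, i.e., treatment-specific slopes.
```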

 

SteveDenham

marcel
Obsidian | Level 7
Steve,
Thank you very much for your answer. I will take a look at the book you recommend.

Regards,

Marcel
