Solved: Re: Baseline Covariate + Repeated Measure Analysis for Split-Plot

bohm0072 · Posted 05-29-2017 11:44 AM

Hi All,

I am a bit puzzled on how to carry out the analysis I am hoping to. A bit about my study and data design, as well as what I have been able to successfully accomplish so far and where I think I am getting stuck:

This data is from an agronomy field study: an RCBD with split-plots and repeated measures. I have 4 blocks ("Rep"), with 2 Irrigation treatments ("Irr_Trt") as the whole-plot factor and 6 Nitrogen treatments ("N_Trt") as the split-plot factor. The response variable I am interested in is nitrate concentration ("Concentration_NO3") in the soil water solution, which is measured on 25 dates ("Sample_Date") over the course of the experiment (a single growing season); the response variable also has a lognormal distribution (because concentrations cannot be less than zero...). I have a model that I feel very comfortable with, and I believe it is appropriate for the analysis of my data. The covariance structure I chose is the R-side CS + AR(1) type; I am by no means a GLMM expert, but after combing through just about everything Walter Stroup (among many other experts) has written, this seems to be the best fit for my data.

proc glimmix data=work.B16 nobound;
  class Irr_Trt N_Trt Sample_Date Rep;
  model Concentration_NO3 = Irr_Trt|N_Trt|Sample_Date /
        dist=lognormal
        ddfm=kenwardroger2;
  /* G-side: block, whole-plot, and split-plot effects */
  random intercept Irr_Trt N_Trt*Irr_Trt / subject=Rep;
  /* R-side: AR(1) across dates within each split-plot */
  random Sample_Date / subject=N_Trt*Irr_Trt*Rep type=ar(1) residual;
run;
quit;

Now for the part where things get a bit more challenging for me... The CV for this data is quite large, ~50%, and I have been digging deeper into the data set itself for insights. One thing that escaped my initial review of the data is that the first 4 dates when samples were collected occurred prior to the imposition of either the irrigation or nitrogen treatments. In this sense, I have an initial value covariate ("NO3cov"), measured for each experimental unit (i.e. the split-plot or N_Trt*Irr_Trt*Rep unit). I believe that incorporating this covariate (as a correction for initial differences in nitrate concentration) will resolve some of the unexplained variance in the data.

My plan is to re-run the analysis by restricting my model to the data collected after 1 June 2016 (the date when treatments were initially imposed), and using the mean nitrate concentration for each experimental unit collected prior to that date as a baseline covariate. This leaves 21 dates for inclusion in the repeated measures analysis, with the first 4 sample dates used for the covariate. To start, I did some simple linear regressions of the measured nitrate concentration against the covariate, both for each sampling date and for the mean concentration over all sampling dates. There was a positive and significant relationship between the covariate and measured nitrate concentration for the mean values over the whole season. Looking at each sampling date, there was a positive and significant relationship only for the first 6 sampling dates after the imposition of the treatments, while the final 15 sampling dates did not have a significant relationship with the covariate.
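A minimal sketch of how the baseline covariate and the restricted data set described above might be built. It assumes Sample_Date is stored as a numeric SAS date and that work.B16 does not already contain a NO3cov column (drop it first if it does); work.baseline and work.B16_post are hypothetical names. Whether a sample taken on 1 June itself belongs to either set is not stated here, so the strict inequalities are an assumption.

```sas
proc sql;
  /* Mean pre-treatment nitrate for each split-plot (Rep*Irr_Trt*N_Trt). */
  create table work.baseline as
  select Rep, Irr_Trt, N_Trt,
         mean(Concentration_NO3) as NO3cov
  from work.B16
  where Sample_Date < '01JUN2016'd
  group by Rep, Irr_Trt, N_Trt;

  /* Keep only post-treatment rows and attach the covariate to each row. */
  create table work.B16_post as
  select a.*, b.NO3cov
  from work.B16 as a
       inner join work.baseline as b
       on  a.Rep     = b.Rep
       and a.Irr_Trt = b.Irr_Trt
       and a.N_Trt   = b.N_Trt
  where a.Sample_Date > '01JUN2016'd;
quit;
```

The inner join drops any experimental unit that has no pre-treatment samples, which is worth checking against the expected 48 units (4 Rep x 2 Irr_Trt x 6 N_Trt).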

Using this insight (and an example from SAS for Mixed Models, Littell et al., p. 186), here is the model I am now attempting to run (the changes from the previous model are the NO3cov and NO3cov*Sample_Date terms added to the model statement):

proc glimmix data=work.B16 nobound;
  class Irr_Trt N_Trt Sample_Date Rep;
  /* NO3cov is continuous (not in CLASS); NO3cov*Sample_Date gives one slope per date */
  model Concentration_NO3 = Irr_Trt|N_Trt|Sample_Date NO3cov NO3cov*Sample_Date /
        dist=lognormal
        ddfm=kenwardroger2;
  random intercept Irr_Trt N_Trt*Irr_Trt / subject=Rep;
  random Sample_Date / subject=N_Trt*Irr_Trt*Rep type=ar(1) residual;
run;
quit;

In this analysis, I found that the NO3cov main effect was not significant (p(>F) = 0.27) but that the NO3cov*Sample_Date interaction effect was significant (p(>F) = 0.02). The AICC, however, increased from 1224 to 1300 with the inclusion of the covariate.

To date, I haven't seen a discussion on how to include a baseline covariate within a repeated measures and split-plot design. Having somewhat limited experience, I wanted to gather some feedback on what insights others might have on the analysis I am conducting. In particular, my outstanding questions and concerns are:

1) Is the baseline covariate parameterized correctly in the model statement? Do I need both NO3cov and NO3cov*Sample_Date to account for the fact that the relationship of the response variable to the covariate changes over time?
2) Do I need to change or include an additional random statement when I add the covariate? Will the df be correct for the covariate effect with the specification that I have?
3) How does the inclusion of the covariate affect the existing repeated measures covariance structure?
4) Is this even an appropriate use of a covariate?
5) Is it a "problem" that the main effect of NO3cov is not significant while the interaction effect of NO3cov*Sample_Date is? What about the increase in AICC value when the covariate was included?

Thanks!
Brian

sld · Posted 06-01-2017 12:57 AM

Kudos for a well-crafted question. I wish I had the answers to all of your questions.

1) Is the baseline covariate parameterized correctly in the model statement? Do I need both NO3cov and NO3cov*Sample_Date to account for the fact that the relationship of the response variable to the covariate changes over time?

I can see the value in using NO3cov (the mean of the first four dates) as a covariate. I think it would make sense to use log(NO3cov) if you are using a lognormal distribution for Concentration_NO3. The model assumes a linear relationship between the response on the link scale and the covariate.

NO3cov*Sample_Date allows the slope of the linear regression of Concentration_NO3 (on the link scale) on NO3cov to vary by Sample_Date. You should retain NO3cov if the interaction is in the model statement.
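A small sketch of the log-covariate idea, assuming NO3cov is already a column in work.B16; the data set name work.B16_log and the variable logNO3cov are hypothetical.

```sas
data work.B16_log;
  set work.B16;
  /* Natural log of the baseline, to match the log scale implied by
     dist=lognormal. Non-positive or missing baselines stay missing. */
  if NO3cov > 0 then logNO3cov = log(NO3cov);
run;

/* The model statement would then use the logged covariate, e.g.:
   model Concentration_NO3 = Irr_Trt|N_Trt|Sample_Date
         logNO3cov logNO3cov*Sample_Date / dist=lognormal ddfm=kenwardroger2; */
```

Rows with a missing logNO3cov are dropped from the fit, so it is worth counting them before comparing against earlier runs.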

2) Do I need to change or include an additional random statement when I add the covariate? Will the df be correct for the covariate effect with the specification that I have?

Adding a covariate into a mixed model is an appreciable complication, depending on the hierarchical level at which the covariate is measured (here, Rep*Irr_Trt*N_Trt, aka subplot); check out random coefficient models in the Littell et al. text (SAS for Mixed Models, 2nd ed), Stroup (Generalized Linear Mixed Models), and Milliken and Johnson (Analysis of Messy Data, Vol III Analysis of Covariance).

I always go into a model with some idea of what I think are appropriate denom df, just to check.

3) How does the inclusion of the covariate affect the existing repeated measures covariance structure?

Not sure about that.

4) Is this even an appropriate use of a covariate?

Google "change in baseline ancova".

5) Is it a "problem" that the main effect of NO3cov is not significant while the interaction effect of NO3cov*Sample_Date is? What about the increase in AICC value when the covariate was included?

Nope. Google "ancova centering". This text

https://books.google.com/books/about/Multiple_Regression.html?id=LcWLUyXcmnkC

has a nice introductory level discussion of centering.
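For what it's worth, centering could be sketched like this, assuming NO3cov is a column in work.B16; work.B16_c and NO3cov_c are hypothetical names. PROC SQL remerges the overall mean onto every row (it prints a note saying so).

```sas
proc sql;
  /* Subtract the overall mean baseline so that NO3cov_c = 0 corresponds
     to an average experimental unit rather than to zero nitrate. */
  create table work.B16_c as
  select *, NO3cov - mean(NO3cov) as NO3cov_c
  from work.B16;
quit;
```

The mean here is taken over rows, so it equals the mean over experimental units only when every unit contributes the same number of rows. With the centered covariate in the model, effects that do not involve the covariate are evaluated at the average baseline instead of at a baseline of zero.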

Regarding the AICC increasing with the inclusion of the covariate: to compare models that differ in their fixed effects, you need to use a true maximum likelihood method (LAPLACE or QUAD) rather than REML.
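One caveat, offered as an assumption to check against the GLIMMIX documentation rather than as a definitive answer: as far as I know, METHOD=LAPLACE and METHOD=QUAD are not available when the model has an R-side (RESIDUAL) covariance structure such as the AR(1) here. A possible alternative is METHOD=MSPL, which for a normal response on the log scale should give ML rather than REML estimates. A sketch of a fit used only for its information criteria:

```sas
/* Fit for AICC comparison only: ML-type estimation instead of the default.
   Refit the model without the covariate terms the same way, on exactly
   the same rows, before comparing AICC values. */
proc glimmix data=work.B16 method=mspl nobound;
  class Irr_Trt N_Trt Sample_Date Rep;
  model Concentration_NO3 = Irr_Trt|N_Trt|Sample_Date NO3cov NO3cov*Sample_Date /
        dist=lognormal;
  random intercept Irr_Trt N_Trt*Irr_Trt / subject=Rep;
  random Sample_Date / subject=N_Trt*Irr_Trt*Rep type=ar(1) residual;
run;
```

Information criteria are only comparable between fits to the same observations, so a model on all 25 dates cannot be compared with one on the 21 post-treatment dates.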

There are many possible different bells and whistles to be considered. Is the relationship linear? Does the slope vary with Irr_Trt and/or N_Trt? Is the slope constant for all whole plots, or does it vary?
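To make the "does the slope vary with treatment" question concrete, here is a hedged sketch that adds treatment-specific slopes to the model from the original post. It is one possible specification, not a recommendation, and it keeps the same random statements.

```sas
proc glimmix data=work.B16 nobound;
  class Irr_Trt N_Trt Sample_Date Rep;
  /* NO3cov*Irr_Trt and NO3cov*N_Trt let the covariate slope differ
     among irrigation and nitrogen treatments. */
  model Concentration_NO3 = Irr_Trt|N_Trt|Sample_Date
        NO3cov NO3cov*Sample_Date
        NO3cov*Irr_Trt NO3cov*N_Trt /
        dist=lognormal ddfm=kenwardroger2;
  random intercept Irr_Trt N_Trt*Irr_Trt / subject=Rep;
  random Sample_Date / subject=N_Trt*Irr_Trt*Rep type=ar(1) residual;
run;
```

If the added slope interactions contribute little, a common slope across treatments may be adequate. Checking linearity would still need a separate look, for example residuals plotted against NO3cov.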

Stepping back further, I would ask: What is your research question about sampling dates? 21 or 25 levels is a lot to sort out in an ANOVA context. Why did you make all these observations? Are you actually interested in comparing means among all 21 or 25? Would it make sense to extract some agronomically meaningful statistic from all these measurements to use as a response, or to regress on sampling date? Is there a statistician at your institution that you could work with, or a statistics prof, or a statistical consulting center? (I know, there may not be anyone, or at least anyone that knows more than you.) Lots of things to ponder, and a good stat colleague would be a more useful thing than this forum.
