08-05-2017 03:49 PM - edited 08-05-2017 09:43 PM
Greetings all. I'm having some issues analyzing data from an agricultural study we are conducting. The study is a split-plot design in which the whole-plot is irrigation and the split is graft- we were looking at the effect of grafting and irrigation on tomato fruit production. We harvested a total of 16 times in 2016 and 12 times in 2017. I'm more familiar with analyses in PROC MIXED but am trying to move over to GLIMMIX. Were I to analyze this study in MIXED it would look something like this:
class Harvest Plot Irrigation Graft Block Year;
model awtrun = Graft|Irrigation|Harvest/ ddfm=kr residual;
random Year Block(Year) Irrigation*Block(Year);
repeated/ type = AR(1) subject= plot;
Plot is simply the unique plot ID, it would be the same as Graft*Irrigation*Block*year
I've seen other posts answering how to do a repeated measures analysis with glimmix, however I am unsure how to do it with years combined.
Any help would be greatly appreciated.
08-05-2017 08:15 PM - edited 08-05-2017 08:18 PM
Let me see if I understand your experimental design. I am making parts of this up, so correct my description as need be.
You have two years, and you are thinking of year as a random effects factor.
You have some (unspecified) number of "blocks" within each year, and you are using different blocks in different years.
Within each block, you have one "superplot" (the whole plot unit, and a random effects factor) randomly assigned to each level of the fixed effects factor irrigation (the whole plot treatment). (This is a split-plot with whole plots in blocks, as opposed to a split-plot with whole plots in a completely randomized design.)
You have "plots" within each whole plot unit, one plot randomly assigned to each level of the fixed effects factor graft.
You measure tomato fruit production multiple times in each plot (16 times in one year, 12 times in the other).
1. When I'm sussing out a mixed model, I find it extremely helpful to distinguish between random effects factors (i.e., experimental units) and their associated fixed effects factors (i.e., experimental treatments). For example, the whole plot unit is "superplot" and the whole plot treatment is "irrigation". I deliberately give different names to the unit and the treatment because they are not the same thing. Just saying "whole plot" does not provide sufficient information.
2. With only two levels, year does not make a very good random effects factor. The model attempts to estimate a variance among years based on the most minimal of data (n=2). For another thing, if year is random and if blocks are nested within years (i.e., different blocks in different years), then one can argue that the inference space for your study is defined by years--year becomes the fundamental replicating factor. With only two years, this is clearly problematic. I'd ponder incorporating year as a fixed effects factor, (randomly?) assigned to blocks. Iit's not always a black-and-white decision.
3. With 16 levels of harvest in one year and 12 levels of harvest in the second year, and if harvest is a classification fixed effects factor, then you have a problem, regardless of whether you use MIXED or GLIMMIX. The irrigation x graft x harvest factorial is incomplete: you do not have data for all combinations of these factors. Consequently, if you include interactions with harvest in your model, some lsmeans will be reported as "non-est" (not estimable); and main effect means for irrigation and graft will be biased regardless of whether interactions are included. You could extract a subset of harvest dates that "match" between the two years (although this exercise may be too subjective to justify adequately). Or you could incorporate harvest as a continuous factor, and regress awtrun on harvest level (i.e., date), which poses additional analysis considerations (e.g., linearity? random slopes?). Or you could combine data over harvests, for example total production for the season and then drop harvest as a factor in the model. Or you could analyze each year separately. Or some other approach that makes sense to you.
4. If response data are normally distributed (conditional on the predictors), then MIXED and GLIMMIX will produce the same results. Syntax-wise, the REPEATED statement in MIXED is replaced by the RANDOM / RESIDUAL statement in GLIMMIX. If you are switching to GLIMMIX to deal with non-normal data, then some modifications to the random parts of the model may be necessary, for example to accommodate overdispersion.
08-06-2017 09:35 AM
Thank you for your input, sld.
In regards to treating year as a fixed or random effect: I do realize that two years is not a large sample for estimating the variance between them. When we conduct these studies we repeat in time as a means to determine the stability of our treatment effects (in this case, graft and irrigation). Furthermore, I have conducted a prior analysis with year as a fixed effect - none of the interactions with year (graft x year, irrigation x year, graft x irrigation x year) nor the main effect of year were significant. Thus, instead of keeping it as a fixed effect, we make it random, thus broadening our inferences on both graft and irrigation outside of 2016 and 2017.
We do an analysis looking at the total yield at the end of the study. The repeated measures analysis is helpful, especially for a grower, as it can show how plants yield throughout the growing and marketing season. When we did the repeated measures analysis last year, we found that graft X Harvest was significant. From there we modeling this interaction by modeling harvest as a continuous factor.
Thanks again for your help.
08-06-2017 12:39 PM
Thank you for your response. Clearly you've already been thinking about the issues I wrote about.
I'll elaborate a bit about why I suggest flexibility in dealing with field studies that span few years. We would like to think of a random year factor as providing us with a basis for inference to years in general. But in the absence of a time machine, we cannot obtain a random sample of years from the statistical population to which we want to make inference--so, in practice, we don't really have a basis for temporal inference, on top of which we typically have very few years of data. Field experiments definitely need multiple years because we have scant control over environmental conditions and we know that environmental conditions matter. Philosophically I think of multiple years as exercises in "repeatability" rather than "replication", which segues into metareplication sensu Johnson https://dl.sciencesocieties.org/publications/cs/abstracts/46/6/2486 .
This paper http://onlinelibrary.wiley.com/doi/10.2307/1941729/epdf provides a practical discussion of whether year is fixed or random, and this paper touches on that topic and its inferential implications https://www.jstor.org/stable/3546293
In your initial post, you were seeking input on GLIMMIX code. If you are still interested, post the code that you have so far and the community can go from there.
Good luck, and have fun!