Analyzing experimental repeats using mixed models

Most researchers struggle to combine dissimilar experiments. On the other hand, combining results of similar studies for the sake of statistical analysis can be simple. Mixed models can be a great option to combine similar data sets (experimental repeats) effectively.

Mixed models are ideal to combine experiments with blocking factors. For example, our experiment could have blocks in space or in time. Blocks are sources of known or likely variation, where that variation is not of primary interest to the study.
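As a rough sketch in generic notation (the symbols below are illustrative and not taken from any particular study), a blocked experiment can be written as a fixed treatment effect plus a random block effect plus residual error, where i indexes treatments, j indexes blocks, and k indexes replicates within a treatment and block:

```latex
y_{ijk} = \mu + \tau_i + b_j + \varepsilon_{ijk},
\qquad b_j \sim N(0, \sigma_b^2),
\quad \varepsilon_{ijk} \sim N(0, \sigma^2)
```

The treatment effects \tau_i are what we want to test. The block effects b_j are only there to absorb block-to-block variation, which is why they are modeled as draws from a distribution rather than estimated as effects of interest.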

For instance, let's compare a mixed model versus a traditional ANOVA for a greenhouse study. We will use a dataset generously opened to the scientific community. Its manuscript describes a test of Calluna (heath) plants' response to Nitrogen and Drought tolerance. The authors ran the study twice, over a two-year period.

Traditional ANOVA

Based on the experiment's design, we want to test the effects of Drought, Nitrogen, and Heathland, and how they interact with each other. But are we interested in the year effect (year 1 versus year 2)? If we run PROC ANOVA, we treat all variables as fixed effects:

proc ANOVA data=Heath.data;
/* Heath.data was previously uploaded as an Excel file and imported to a SAS-readable format. */
class Year Heathland Nitrogen Drought Replicate;
model 'dry weight above (g)'n=
Drought
'Year'n
Nitrogen
'Year'n*nitrogen
Heathland
'Year'n*Heathland
Heathland*Nitrogen
'Year'n*Heathland*Nitrogen
Drought*'Year'n
Drought*Nitrogen
Drought*'Year'n*nitrogen
Drought*Heathland
Drought*'Year'n*Heathland
Drought*Heathland*Nitrogen
Drought*'Year'n*Heathland*Nitrogen;
/* "model =" specifies [fixed] effects. Because of the factorial design, each factor is named along with all of the factorial interactions (joined by asterisks, *).
A quoted string followed by "n" is a SAS name literal: 'Year'n marks Year as a variable name present in the source data table rather than a SAS keyword, and 'dry weight above (g)'n lets the response variable's name, which contains spaces and parentheses, appear in the model statement. */
RUN;
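As an aside, the fifteen-term MODEL statement above can likely be written more compactly with the bar operator, which SAS expands into every main effect and every interaction among the listed factors. This is a minimal sketch assuming the same data set and variable names as above; the set of effects should be the same, though the output may list them in a different order.

```sas
proc anova data=Heath.data;
    class Year Heathland Nitrogen Drought Replicate;
    /* A|B|C|D expands to all main effects plus all two-, three-,
       and four-way interactions among A, B, C, and D. */
    model 'dry weight above (g)'n = Drought|'Year'n|Nitrogen|Heathland;
run;
quit;
```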

The output table shows that Year, along with several other factors, is highly statistically significant. This means the two experiments (years) produced different results. Moreover, based on Year's significant interactions, we conclude that the effects of nitrogen, drought, and the combination of the two differed between years.

Source	DF	Anova SS	Mean Square	F Value	Pr > F
Drought	1	3.001736	3.001736	16.93	<.0001
Year	1	1424.813673	1424.813673	8037.40	<.0001
Nitrogen	1	144.159422	144.159422	813.21	<.0001
Year*Nitrogen	1	115.528100	115.528100	651.70	<.0001
Heathland	1	0.571965	0.571965	3.23	0.0746
Year*Heathland	1	0.000000	0.000000	0.00	1.0000
Heathland*Nitrogen	1	1.483411	1.483411	8.37	0.0044
Year*Heathla*Nitroge	1	0.000000	0.000000	0.00	1.0000
Year*Drought	1	1.169362	1.169362	6.60	0.0112
Nitrogen*Drought	1	12.734403	12.734403	71.84	<.0001
Year*Nitroge*Drought	1	15.215862	15.215862	85.83	<.0001
Heathland*Drought	1	0.363152	0.363152	2.05	0.1545
Year*Heathla*Drought	1	0.914895	0.914895	5.16	0.0246
Heathl*Nitrog*Drough	1	0.000000	0.000000	0.00	1.0000
Year*Heat*Nitr*Droug	1	0.237733	0.237733	1.34	0.2488

Some would advocate separating the two experiments and analyzing them independently. This is a reasonable recommendation, given the major differences in outcomes between years (all those highly significant Year* interaction results). Researchers and audiences who want year-specific detail will benefit from that approach.
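If you do go the separate-analysis route, one way to sketch it is with BY-group processing, which runs the same ANOVA once per year. This assumes the same data set and variable names as above; work.heath_sorted is a hypothetical name for the sorted copy, and Year drops out of the MODEL statement because each analysis now covers a single year.

```sas
/* BY-group processing expects the data to be ordered by the BY variable. */
proc sort data=Heath.data out=work.heath_sorted;
    by 'Year'n;
run;

proc anova data=work.heath_sorted;
    by 'Year'n;
    class Heathland Nitrogen Drought;
    /* The bar operator expands to all main effects and interactions. */
    model 'dry weight above (g)'n = Drought|Nitrogen|Heathland;
run;
quit;
```

The output should contain one ANOVA table per year, which can then be compared side by side.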

Mixed Model

On the other hand, what if we aren't interested in the effect of year, and are only interested in the effects of drought, nitrogen and heathland? Let's combine the experiments by treating year as a random effect in PROC MIXED:

proc mixed data=Heath.data;
/* Everything in this code is the same, except that "ANOVA" is changed to "Mixed", and Year is taken out of the MODEL statement and placed in a new RANDOM statement. */
class Year Heathland Nitrogen Drought Replicate;
model 'dry weight above (g)'n=
Drought
Nitrogen
Drought*nitrogen
Heathland
Heathland*Drought
Heathland*Nitrogen
Heathland*Drought*Nitrogen;
random 'Year'n;
/* The RANDOM statement specifies the blocking factors, in this case the year. */
RUN;

The output culminates in Type 3 tests of fixed effects, which we interpret like the PROC ANOVA results.

Effect	Num DF	Den DF	F Value	Pr > F
Drought	1	132	3.41	0.0669
Nitrogen	1	132	127.89	<.0001
Nitrogen*Drought	1	132	13.71	0.0003
Heathland	1	132	0.29	0.5932
Heathland*Drought	1	132	0.60	0.4411
Heathland*Nitrogen	1	132	0.72	0.3961
Heathl*Nitrog*Drough	1	132	0.00	0.953

PROC MIXED benefited us because it allowed us to generalize across the two experiments. Even though the experiments differ substantially, we are still able to draw broad conclusions about the factors we care about. This comes with risks: we know the year had an effect, and that effect may hold the key to valuable information. Additionally, we took the naughty tack of not testing ANOVA assumptions before sallying toward interpretation.
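To make amends on the assumptions front, here is a minimal sketch of one way to inspect the residuals from the mixed model. It assumes the same data set and variable names as above; work.mixed_resid is a hypothetical name for the output data set. The OUTP= option saves predictions that include the random year effect, along with the corresponding residuals in a variable named Resid.

```sas
ods graphics on;

/* Refit the mixed model, request a residual diagnostics panel,
   and save the residuals for further checks. */
proc mixed data=Heath.data plots=residualpanel;
    class Year Heathland Nitrogen Drought;
    model 'dry weight above (g)'n = Drought|Nitrogen|Heathland
          / outp=work.mixed_resid;   /* hypothetical data set name */
    random 'Year'n;
run;

/* Normality tests and a Q-Q plot of the saved residuals. */
proc univariate data=work.mixed_resid normal;
    var Resid;
    qqplot Resid / normal(mu=est sigma=est);
run;
```

Strong curvature in the Q-Q plot, or a funnel shape in the residual-versus-predicted panel, would suggest considering a transformation of the response before trusting the F tests.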

We've only scratched the surface of PROC MIXED. SAS shines impressively in the diversity and versatility of its mixed model procedures. See how your colleagues apply SAS mixed models in their research by searching your favorite manuscript database for "SAS," "random factor" and a relevant keyword (try "maize," or even "heath").