Most researchers struggle to combine dissimilar experiments. On the other hand, combining results of similar studies for the sake of statistical analysis can be simple. Mixed models can be a great option to combine similar data sets (experimental repeats) effectively.
Mixed models are ideal for combining experiments that have blocking factors. For example, our experiment could have blocks in space or in time. Blocks are sources of known or likely variation, where that variation is not of primary interest to the study.
For instance, let's compare a mixed model with a traditional ANOVA for a greenhouse study. We will use a dataset generously made available to the scientific community. Its manuscript describes a test of how Calluna (heath) plants respond to nitrogen and tolerate drought. The authors ran the study twice over a two-year period.
Based on the experiment's design, we want to test the effects of Drought, Nitrogen, and Heathland, and how they interact with each other. But are we interested in the year effect (year 1 versus year 2)? If we run PROC ANOVA, we treat all variables as fixed effects:
proc ANOVA data=Heath.data;
/* Heath.data was previously uploaded as an Excel file and imported to a SAS-readable format. */
class Year Heathland Nitrogen Drought Replicate;
model 'dry weight above (g)'n=
Drought
'Year'n
Nitrogen
'Year'n*nitrogen
Heathland
'Year'n*Heathland
Heathland*Nitrogen
'Year'n*Heathland*Nitrogen
Drought*'Year'n
Drought*Nitrogen
Drought*'Year'n*nitrogen
Drought*Heathland
Drought*'Year'n*Heathland
Drought*Heathland*Nitrogen
Drought*'Year'n*Heathland*Nitrogen;
/* The MODEL statement specifies the [fixed] effects. Because the design is factorial, each factor is named, along with all factorial interactions (joined by asterisks, *).
A name in single quotes followed by "n" is a SAS name literal. It tells SAS to treat the quoted text as a variable name from the source data table rather than as programming syntax. This applies to 'Year'n and to 'dry weight above (g)'n in the MODEL statement. */
RUN;
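The fifteen terms in the MODEL statement above are the full factorial of four factors. As a minimal sketch, assuming the same Heath.data table and variable names used above, SAS bar notation should request the same set of effects in one line. The order in which effects are listed in the output may differ from the hand-written version.

```sas
/* Sketch only: assumes the Heath.data table and variable names shown above. */
proc anova data=Heath.data;
  class Year Heathland Nitrogen Drought Replicate;
  /* A|B|C|D expands to every main effect and every interaction of the four factors. */
  model 'dry weight above (g)'n = Drought|'Year'n|Nitrogen|Heathland;
run;
quit;
```

Writing the terms out by hand, as in the original code, makes each tested effect explicit. The bar form is shorter and harder to mistype.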
The PROC ANOVA output table shows Year, along with several other factors, to be highly statistically significant. This means the two experiments (years) differed in their overall results. In addition, Year interacts significantly with Nitrogen, with Drought, and with the Nitrogen-by-Drought combination, so we conclude that the effects of nitrogen, drought, and the combination of the two differed between years.
| Source | DF | Anova SS | Mean Square | F Value | Pr > F |
| --- | --- | --- | --- | --- | --- |
| Drought | 1 | 3.001736 | 3.001736 | 16.93 | <.0001 |
| Year | 1 | 1424.813673 | 1424.813673 | 8037.40 | <.0001 |
| Nitrogen | 1 | 144.159422 | 144.159422 | 813.21 | <.0001 |
| Year*Nitrogen | 1 | 115.528100 | 115.528100 | 651.70 | <.0001 |
| Heathland | 1 | 0.571965 | 0.571965 | 3.23 | 0.0746 |
| Year*Heathland | 1 | 0.000000 | 0.000000 | 0.00 | 1.0000 |
| Heathland*Nitrogen | 1 | 1.483411 | 1.483411 | 8.37 | 0.0044 |
| Year*Heathla*Nitroge | 1 | 0.000000 | 0.000000 | 0.00 | 1.0000 |
| Year*Drought | 1 | 1.169362 | 1.169362 | 6.60 | 0.0112 |
| Nitrogen*Drought | 1 | 12.734403 | 12.734403 | 71.84 | <.0001 |
| Year*Nitroge*Drought | 1 | 15.215862 | 15.215862 | 85.83 | <.0001 |
| Heathland*Drought | 1 | 0.363152 | 0.363152 | 2.05 | 0.1545 |
| Year*Heathla*Drought | 1 | 0.914895 | 0.914895 | 5.16 | 0.0246 |
| Heathl*Nitrog*Drough | 1 | 0.000000 | 0.000000 | 0.00 | 1.0000 |
| Year*Heat*Nitr*Droug | 1 | 0.237733 | 0.237733 | 1.34 | 0.2488 |
Some would advocate separating the two experiments and analyzing them independently. This rational recommendation stems from the major differences in outcomes between years (all those highly significant Year* interaction results). Researchers and audiences who want year-by-year detail will benefit from that approach.
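For readers who prefer the separate-analysis route, here is a minimal sketch using BY-group processing. It assumes the same Heath.data table and variable names as above; the output table name work.heath_sorted is a hypothetical name chosen for illustration. PROC ANOVA is designed for balanced data, so this also assumes each year's experiment is balanced on its own.

```sas
/* Sketch only: work.heath_sorted is a hypothetical name for the sorted copy. */
proc sort data=Heath.data out=work.heath_sorted;
  by Year;  /* BY-group processing requires the data to be sorted by Year */
run;

proc anova data=work.heath_sorted;
  by Year;  /* fits one separate ANOVA per year */
  class Heathland Nitrogen Drought;
  /* Full factorial of the three treatment factors, with Year no longer in the model */
  model 'dry weight above (g)'n = Drought|Nitrogen|Heathland;
run;
quit;
```

This produces one ANOVA table per year, which can then be compared side by side.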
On the other hand, what if we aren’t interested in the effect of year, and are only interested in the effects of drought, nitrogen and heathland? Let’s combine the experiments, treating year as a random effect in PROC MIXED:
proc mixed data=Heath.data;
/* Everything in this code is the same, except that "ANOVA" is changed to "MIXED", and Year (with all of its interactions) is taken out of the MODEL statement and placed in a new RANDOM statement. */
class Year Heathland Nitrogen Drought Replicate;
model 'dry weight above (g)'n=
Drought
Nitrogen
Drought*nitrogen
Heathland
Heathland*Drought
Heathland*Nitrogen
Heathland*Drought*Nitrogen;
random 'Year'n;
/* The RANDOM statement specifies the blocking factors, in this case the year. */
RUN;
The output culminates in Type 3 tests of fixed effects, which we interpret like the PROC ANOVA results.
| Effect | Num DF | Den DF | F Value | Pr > F |
| --- | --- | --- | --- | --- |
| Drought | 1 | 132 | 3.41 | 0.0669 |
| Nitrogen | 1 | 132 | 127.89 | <.0001 |
| Nitrogen*Drought | 1 | 132 | 13.71 | 0.0003 |
| Heathland | 1 | 132 | 0.29 | 0.5932 |
| Heathland*Drought | 1 | 132 | 0.60 | 0.4411 |
| Heathland*Nitrogen | 1 | 132 | 0.72 | 0.3961 |
| Heathl*Nitrog*Drough | 1 | 132 | 0.00 | 0.953 |
PROC MIXED benefited us because it allowed us to generalize across the two experiments. Even though the experiments differ substantially, we are still able to draw broad conclusions about the effects we care about. This comes with risks: we know the year had an effect, and that effect may hold the key to valuable information. Additionally, we took the naughty tack of not testing ANOVA assumptions before sallying toward interpretation.
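One way to make up for the skipped assumption checks is to ask PROC MIXED for residual diagnostics. The following is a minimal sketch, assuming ODS Graphics is available in your SAS session and using the same Heath.data table and variable names as above. The COVTEST option additionally prints the estimated Year variance with its standard error. With only two years, that estimate rests on very little information and should be read with caution.

```sas
/* Sketch only: same mixed model as above, with diagnostics requested. */
ods graphics on;

proc mixed data=Heath.data covtest plots=residualpanel;
  class Year Heathland Nitrogen Drought Replicate;
  /* Bar notation expands to the same seven fixed effects listed in the model above. */
  model 'dry weight above (g)'n = Drought|Nitrogen|Heathland;
  random 'Year'n;  /* Year remains the random blocking factor */
run;

ods graphics off;
```

The residual panel gives a quick visual check of normality and constant variance before interpreting the Type 3 tests.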
We've only scratched the surface of PROC MIXED. SAS shines impressively in the diversity and versatility of its mixed model procedures. See how your colleagues apply SAS mixed models in their research by searching your favorite manuscript database for "SAS," "random factor" and a relevant keyword (try "maize," or even "heath").