The gold standard for field studies is the randomized complete block design (RCBD). However, blocking is not always the best choice, and smaller trials in particular can benefit from the alternative, the completely randomized design (CRD).
How would you code a multi-site CRD trial?
Our hypothetical dataset is a multi-site corn trial that tested the effect of a pre-plant nitrogen stabilizer on yield. There were eight sites across US corn growing regions, and each site had four replicates of the treatments. All plots were managed identically, except that one group received the N stabilizer treatment and the other did not.
We’ve uploaded and imported our dataset into the work library under the name CRD. We are interested in partitioning the effects of Nitrogen Stabilizer and Location. The location effect will almost certainly be larger, given the huge variation in corn yields across the country. We also place a vertical bar (|) between the two factors in the MODEL statement to test for their factorial interaction.
Note that we are using PROC MIXED. We chose the eight sites because we were able to run field tests at those locations, not because we have a primary interest in those particular fields, so we treat location as a random factor.
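The import step mentioned above is not shown in the post. A minimal sketch of what it might look like follows, assuming the data were uploaded as a CSV file whose header row contains Location, N_Stabilizer and Yield_(tonnes/ha). The file path is hypothetical and the file format is an assumption.

```sas
/* Allow column names with special characters, e.g. 'Yield_(tonnes/ha)'n */
options validvarname=any;

/* Hypothetical path and file type: replace with your own uploaded file */
proc import datafile="/home/your_user_id/crd.csv"
    out=work.crd
    dbms=csv
    replace;
  getnames=yes;
  guessingrows=max;  /* scan all rows when inferring column types */
run;

/* Confirm variable names and types before modeling */
proc contents data=work.crd;
run;
```

With VALIDVARNAME=ANY in effect, a column name containing parentheses or a slash has to be written as a name literal ('...'n) in later steps, which is why the model code quotes the yield variable that way.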
proc mixed data=work.crd cl plots=all;
  class 'Location'n N_Stabilizer;
  model 'Yield_(tonnes/ha)'n = 'Location'n|N_Stabilizer / ddfm=kr2;
  random int / subject='Location'n;
run;
We run the code and find not only that the result is not statistically significant (p>0.05; Type 3 fixed effects), but also that the residuals do not follow a normal distribution. We could either rethink our model type (see this discussion, for example) or identify what is undermining normality.
It turns out that two sites were associated with a massive amount of residual variance. We learned that by turning on the plots=all option. Maybe they were hit by natural disasters, the fertilizer calculations were off, the plot combines malfunctioned, or the data are riddled with entry errors. In any case, they violate the assumptions of ANOVA (normally distributed residuals and homoscedasticity).
Let’s use a DATA step to create a new data set that excludes the Brenham, TX and Springfield, MO sites, using the ne (not equal) operator on location.
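The post reads the problem sites off the plots=all output. As a complementary check, one could save the residuals and summarize them by site. The sketch below assumes the same model as above; the output data set name work.crd_resid is made up for illustration, and Resid is the residual variable that the OUTP= option writes.

```sas
/* Refit the full model and save predicted values and residuals */
proc mixed data=work.crd;
  class 'Location'n N_Stabilizer;
  model 'Yield_(tonnes/ha)'n = 'Location'n|N_Stabilizer
        / ddfm=kr2 outp=work.crd_resid;  /* hypothetical output name */
  random int / subject='Location'n;
run;

/* Spread of the residuals within each site */
proc means data=work.crd_resid n mean std min max;
  class 'Location'n;
  var Resid;
run;

/* Side-by-side box plots make unusually variable sites easy to spot */
proc sgplot data=work.crd_resid;
  vbox Resid / category='Location'n;
  refline 0 / axis=y;
run;
```

Sites whose residual standard deviation or box height is far larger than the rest are candidates for a closer look at the raw records.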
data work.crd2;
  set work.crd;
  if 'Location'n ne 'Brenham_TX';
  if 'Location'n ne 'Springfield_MO';
run;
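An equivalent way to write the same subset, offered as an alternative rather than the post's method, is a single WHERE statement with NOT IN. A quick PROC FREQ afterwards can confirm that six sites remain; if the data match the description above, each site should show four replicates per treatment.

```sas
/* Same subset as the two subsetting IF statements, in one condition */
data work.crd2;
  set work.crd;
  where 'Location'n not in ('Brenham_TX', 'Springfield_MO');
run;

/* Check which sites remain and how many plots each treatment has */
proc freq data=work.crd2;
  tables 'Location'n * N_Stabilizer / norow nocol nopercent;
run;
```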
Then, rerun the model:
proc mixed data=work.crd2 cl plots=all;
  class 'Location'n N_Stabilizer;
  model 'Yield_(tonnes/ha)'n = 'Location'n|N_Stabilizer / ddfm=kr2;
  random int / subject='Location'n;
run;
The residual plots now look much better (the quantile plot shows a better scatter). We see no significant effect of Nitrogen Stabilizer and no significant Location by Nitrogen Stabilizer interaction.
If interaction terms are not statistically significant, they may be dropped from the model. If we do that (replace the vertical bar with a space) and rerun the model, the Type 3 tests of fixed effects show stronger support for an effect of the nitrogen stabilizer, but it is still only marginal and not significant at the 0.05 level (p=0.142).
How might our design have affected this result? While we did a nice job of distributing sites across the corn belt, the root cause of the high yield variance at the two sites should be investigated. In our design, we might have added more replicates or more sites, or done a better job of accounting for field variation through blocking.
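For reference, the reduced model described above would look like the following. The only change from the previous call is that the vertical bar in the MODEL statement is replaced by a space, so only the two main effects are fitted.

```sas
/* Main-effects model on the six retained sites (no interaction term) */
proc mixed data=work.crd2 cl plots=all;
  class 'Location'n N_Stabilizer;
  model 'Yield_(tonnes/ha)'n = 'Location'n N_Stabilizer / ddfm=kr2;
  random int / subject='Location'n;
run;
```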
Learn more about PROC MIXED
Consider taking this excellent Mixed Model eLearning Course