Hello Everyone,
I have been trying to do some statistical analysis of soil data from 4 different locations.
The experimental design is split-plot design and has three(3) factors; tillage (2 levels)d as main plot factor, and crop residue (2 levels)d and nitrogen dose (3 levels) as sub plot factor randomized within the main plots. The whole plot is replicated 4 times.
Now I have some soil parameters , e.g., soil organic carbon (SOC) and this SOC is determined for 2 different soil depths (0-20cm and 20-40cm).
The dataframe that I made for this approach has columns or headers as : replication, depth, tillage, CR, N, SOC.
SO far I understand that I need to use proc mixed model and I used the following codes to to do it:
proc mixed;
class tillage CR N replication ;
model SOC= tillage |CR |N ;
random replication replication (tillage);
lsmeans tillage|CR |N/ alpha=0.01 diff adjust=simulate ;
run;
But I am struggling to fit the co variate "depth(2 levels)" in the model as well as to make a multi group comparison with letters.
I would be grateful if anyone could take a time out of his/her busy schedule and suggest me the best solution.
Thanks in advance.
Your model in your second post specifies a 3-way factorial in a randomized block design, but in your first post you described your design as a split-plot with (I think, I'm not entirely sure) whole plots in blocks. So this second model specification does not appear to match the design. Your model in your first post corresponds (I think) to your actual design.
You can do a lot of exploration and testing of lsmeans within the LSMEANS statement in either the MIXED or GLIMMIX procedure. I've illustrated some of these options in my code below; refer to the documentation for more details about what these options do. Using the DIFF option on the LSMEANS statement does not get you very far in interpretation of interactions; as I noted before, a thoughtful, context-specific approach is better. I've used GLIMMIX rather than MIXED to get interaction plots and to use SLICEDIFF, but otherwise the LSMEANS options I've used are available in MIXED.
data test;
input Rep Tillage$ Crop_Residue$ N_Dose$ SOC;
datalines;
1 Contour with N0PK 30256.7
2 Contour with N0PK 16870.24
3 Contour with N0PK 26706.44
4 Contour with N0PK 23508.73
1 Contour with N120PK 24466.53
2 Contour with N120PK 24032.51
3 Contour with N120PK 26145.86
4 Contour with N120PK 23643.35
1 Contour with N60PK 27335.54
2 Contour with N60PK 23438.42
3 Contour with N60PK 27678.23
4 Contour with N60PK 26230.67
1 Contour without N0PK 23386.71
2 Contour without N0PK 18918.63
3 Contour without N0PK 27250.83
4 Contour without N0PK 23500.25
1 Contour without N120PK 24153.19
2 Contour without N120PK 23274.78
3 Contour without N120PK 23882.67
4 Contour without N120PK 21778.33
1 Contour without N60PK 27734.6
2 Contour without N60PK 19916.57
3 Contour without N60PK 25009.3
4 Contour without N60PK 22351.28
1 Reduced with N0PK 23766.65
2 Reduced with N0PK 30368.58
3 Reduced with N0PK 29149.02
4 Reduced with N0PK 27259.67
1 Reduced with N120PK 28545.35
2 Reduced with N120PK 28205.5
3 Reduced with N120PK 29953.75
4 Reduced with N120PK 28287.69
1 Reduced with N60PK 32671.92
2 Reduced with N60PK 25906.4
3 Reduced with N60PK 25047.6
4 Reduced with N60PK 27056.77
1 Reduced without N0PK 25677.25
2 Reduced without N0PK 23565.46
3 Reduced without N0PK 26469.71
4 Reduced without N0PK 28219.43
1 Reduced without N120PK 29238.49
2 Reduced without N120PK 24391.22
3 Reduced without N120PK 28182.27
4 Reduced without N120PK 27081.03
1 Reduced without N60PK 28364.79
2 Reduced without N60PK 27046.47
3 Reduced without N60PK 25500.1
4 Reduced without N60PK 24558.67
run;
/* Rename n_dose levels to be quantitative */
data test;
set test;
if n_dose = "N0PK" then n_level = 0;
else if n_dose = "N120PK" then n_level = 120;
else if n_dose = "N60PK" then n_level = 60;
run;
/* Plot observed data, for visual assessment of various things */
proc sort data=test out=test_out;
by n_level;
proc sgpanel data=test_out;
panelby tillage crop_residue / columns=2;
series x=n_level y=soc / group=rep markers;
run;
/* ASSUMING a 3-way full factorial treatment structure
in a split-plot design with whole plots in blocks */
proc glimmix data=test
plots=(studentpanel boxplot(fixed));
class rep tillage crop_residue n_dose;
model soc = tillage | crop_residue | n_dose;
random rep /* block variance */
rep*tillage ; /* whole plot variance */
/* residual is subplot variance */
lsmeans tillage | crop_residue | n_dose;
/* PRETEND that tillage*n_dose interaction is significant */
lsmeans tillage*n_dose / plot=meanplot(sliceby=tillage join cl) /* interaction plot, not available in Proc MIXED */
slice=(tillage n_dose) /* simple effects tests */
slicediff=tillage adjust=tukey; /* pairwise comparisons among n_dose means adjusted for Type I error within each level of tillage */
run;
I hope this is of help.
I would think of each subplot being divided into two sub-subplots, with each level of depth being "randomly" assigned to a sub-subplot. The model now is a split-split plot design:
proc mixed;
class tillage CR N replication depth ;
model SOC= tillage | CR | N | depth ;
random replication(tillage) cr*n*replication(tillage);
lsmeans tillage CR N / alpha=0.01 diff adjust=simulate ;
lsmeans tillage | cr | n | depth;
run;
Note that I removed "replication" from RANDOM because based on your description, the whole plots are not clustered in blocks. "replication(tillage)" is the whole plot variance. "cr*n*replication(tillage)" is the subplot variance. The sub-subplot variance is residual.
I do not recommend applying Type I error adjustment (like the simulate method) to interaction means. (I am fine with it for main effects.) The adjustment will account for ALL pairwise comparisons among the interaction means, most of which you have no interest in. Adjusting for Type I error increases Type II error (and so, reduces power) and to control for more comparisons than necessary is an unnecessary reduction in power--unless you care only about the risk of Type I error and you care nothing about the risk of Type II error. For most of us, I seriously doubt that is the case.
In addition, interaction generally is more complicated than mere pairwise comparisons. I think you have to take a more thoughtful and context-specific approach to interpretation of interaction. My usual approach is to identify post-hoc contrasts that are pertinent in the context of the study and apply Type I error control to those. But that's my opinion, and yours may differ.
Hope this helps.
Edit: It could be that "location" is a block, and that "location" is the same as "replication", in which would require a different RANDOM statement. The design is not clear from your description.
Hi sld,
Firstly, I would like to express my sincere gratitude for your kind response. I just used your suggested codes and I realized that the interactions are getting quite complicated and even more tough while interpreting the result. I, therefore, decided to opt out the "depth" variable and consider the value for the total depth. Then I thought how I could fit the "random" effect in this case. For clarification, I hereby attach part of my dataframe.
Rep | Tillage | Crop_Residue | N_Dose | SOC |
1 | Contour | with | N0PK | 30256.7 |
2 | Contour | with | N0PK | 16870.24 |
3 | Contour | with | N0PK | 26706.44 |
4 | Contour | with | N0PK | 23508.73 |
1 | Contour | with | N120PK | 24466.53 |
2 | Contour | with | N120PK | 24032.51 |
3 | Contour | with | N120PK | 26145.86 |
4 | Contour | with | N120PK | 23643.35 |
1 | Contour | with | N60PK | 27335.54 |
2 | Contour | with | N60PK | 23438.42 |
3 | Contour | with | N60PK | 27678.23 |
4 | Contour | with | N60PK | 26230.67 |
1 | Contour | without | N0PK | 23386.71 |
2 | Contour | without | N0PK | 18918.63 |
3 | Contour | without | N0PK | 27250.83 |
4 | Contour | without | N0PK | 23500.25 |
1 | Contour | without | N120PK | 24153.19 |
2 | Contour | without | N120PK | 23274.78 |
3 | Contour | without | N120PK | 23882.67 |
4 | Contour | without | N120PK | 21778.33 |
1 | Contour | without | N60PK | 27734.6 |
2 | Contour | without | N60PK | 19916.57 |
3 | Contour | without | N60PK | 25009.3 |
4 | Contour | without | N60PK | 22351.28 |
1 | Reduced | with | N0PK | 23766.65 |
2 | Reduced | with | N0PK | 30368.58 |
3 | Reduced | with | N0PK | 29149.02 |
4 | Reduced | with | N0PK | 27259.67 |
1 | Reduced | with | N120PK | 28545.35 |
2 | Reduced | with | N120PK | 28205.5 |
3 | Reduced | with | N120PK | 29953.75 |
4 | Reduced | with | N120PK | 28287.69 |
1 | Reduced | with | N60PK | 32671.92 |
2 | Reduced | with | N60PK | 25906.4 |
3 | Reduced | with | N60PK | 25047.6 |
4 | Reduced | with | N60PK | 27056.77 |
1 | Reduced | without | N0PK | 25677.25 |
2 | Reduced | without | N0PK | 23565.46 |
3 | Reduced | without | N0PK | 26469.71 |
4 | Reduced | without | N0PK | 28219.43 |
1 | Reduced | without | N120PK | 29238.49 |
2 | Reduced | without | N120PK | 24391.22 |
3 | Reduced | without | N120PK | 28182.27 |
4 | Reduced | without | N120PK | 27081.03 |
1 | Reduced | without | N60PK | 28364.79 |
2 | Reduced | without | N60PK | 27046.47 |
3 | Reduced | without | N60PK | 25500.1 |
4 | Reduced | without | N60PK | 24558.67 |
I used the following codes to make the analysis. I request you kindly to have a look at my approach. I look forward to getting your valuable suggestion.
PROC MIXED;
CLASS Tillage Crop_Residue N_Dose Rep;
MODEL SOC = Tillage Crop_Residue N_Dose
Tillage*Crop_Residue Tillage*N_Dose Crop_Residue*N_Dose Tillage*Crop_Residue*N_Dose / DDFM=SATTERTH;
RANDOM Rep;
LSMEANS Tillage Crop_Residue N_Dose Tillage*Crop_Residue Tillage*N_Dose Crop_Residue*N_Dose Tillage*Crop_Residue*N_Dose /DIFF;
PARMS / NOBOUND;
RUN;
Your model in your second post specifies a 3-way factorial in a randomized block design, but in your first post you described your design as a split-plot with (I think, I'm not entirely sure) whole plots in blocks. So this second model specification does not appear to match the design. Your model in your first post corresponds (I think) to your actual design.
You can do a lot of exploration and testing of lsmeans within the LSMEANS statement in either the MIXED or GLIMMIX procedure. I've illustrated some of these options in my code below; refer to the documentation for more details about what these options do. Using the DIFF option on the LSMEANS statement does not get you very far in interpretation of interactions; as I noted before, a thoughtful, context-specific approach is better. I've used GLIMMIX rather than MIXED to get interaction plots and to use SLICEDIFF, but otherwise the LSMEANS options I've used are available in MIXED.
data test;
input Rep Tillage$ Crop_Residue$ N_Dose$ SOC;
datalines;
1 Contour with N0PK 30256.7
2 Contour with N0PK 16870.24
3 Contour with N0PK 26706.44
4 Contour with N0PK 23508.73
1 Contour with N120PK 24466.53
2 Contour with N120PK 24032.51
3 Contour with N120PK 26145.86
4 Contour with N120PK 23643.35
1 Contour with N60PK 27335.54
2 Contour with N60PK 23438.42
3 Contour with N60PK 27678.23
4 Contour with N60PK 26230.67
1 Contour without N0PK 23386.71
2 Contour without N0PK 18918.63
3 Contour without N0PK 27250.83
4 Contour without N0PK 23500.25
1 Contour without N120PK 24153.19
2 Contour without N120PK 23274.78
3 Contour without N120PK 23882.67
4 Contour without N120PK 21778.33
1 Contour without N60PK 27734.6
2 Contour without N60PK 19916.57
3 Contour without N60PK 25009.3
4 Contour without N60PK 22351.28
1 Reduced with N0PK 23766.65
2 Reduced with N0PK 30368.58
3 Reduced with N0PK 29149.02
4 Reduced with N0PK 27259.67
1 Reduced with N120PK 28545.35
2 Reduced with N120PK 28205.5
3 Reduced with N120PK 29953.75
4 Reduced with N120PK 28287.69
1 Reduced with N60PK 32671.92
2 Reduced with N60PK 25906.4
3 Reduced with N60PK 25047.6
4 Reduced with N60PK 27056.77
1 Reduced without N0PK 25677.25
2 Reduced without N0PK 23565.46
3 Reduced without N0PK 26469.71
4 Reduced without N0PK 28219.43
1 Reduced without N120PK 29238.49
2 Reduced without N120PK 24391.22
3 Reduced without N120PK 28182.27
4 Reduced without N120PK 27081.03
1 Reduced without N60PK 28364.79
2 Reduced without N60PK 27046.47
3 Reduced without N60PK 25500.1
4 Reduced without N60PK 24558.67
run;
/* Rename n_dose levels to be quantitative */
data test;
set test;
if n_dose = "N0PK" then n_level = 0;
else if n_dose = "N120PK" then n_level = 120;
else if n_dose = "N60PK" then n_level = 60;
run;
/* Plot observed data, for visual assessment of various things */
proc sort data=test out=test_out;
by n_level;
proc sgpanel data=test_out;
panelby tillage crop_residue / columns=2;
series x=n_level y=soc / group=rep markers;
run;
/* ASSUMING a 3-way full factorial treatment structure
in a split-plot design with whole plots in blocks */
proc glimmix data=test
plots=(studentpanel boxplot(fixed));
class rep tillage crop_residue n_dose;
model soc = tillage | crop_residue | n_dose;
random rep /* block variance */
rep*tillage ; /* whole plot variance */
/* residual is subplot variance */
lsmeans tillage | crop_residue | n_dose;
/* PRETEND that tillage*n_dose interaction is significant */
lsmeans tillage*n_dose / plot=meanplot(sliceby=tillage join cl) /* interaction plot, not available in Proc MIXED */
slice=(tillage n_dose) /* simple effects tests */
slicediff=tillage adjust=tukey; /* pairwise comparisons among n_dose means adjusted for Type I error within each level of tillage */
run;
I hope this is of help.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.