Solved: Proc Mixed model and multigroup comparison

eshad002 · Posted 12-06-2017 07:08 AM

Hello Everyone,

I have been trying to do some statistical analysis of soil data from 4 different locations.

The experimental design is split-plot design and has three(3) factors; tillage (2 levels)d as main plot factor, and crop residue (2 levels)d and nitrogen dose (3 levels) as sub plot factor randomized within the main plots. The whole plot is replicated 4 times.

Now I have some soil parameters , e.g., soil organic carbon (SOC) and this SOC is determined for 2 different soil depths (0-20cm and 20-40cm).

The dataframe that I made for this approach has columns or headers as : replication, depth, tillage, CR, N, SOC.

SO far I understand that I need to use proc mixed model and I used the following codes to to do it:

proc mixed;
class  tillage CR N replication ;
model SOC=  tillage |CR |N ;
random  replication replication (tillage);
lsmeans tillage|CR |N/ alpha=0.01 diff adjust=simulate ;
run;

But I am struggling to fit the co variate "depth(2 levels)" in the model as well as to make a multi group comparison with letters.

I would be grateful if anyone could take a time out of his/her busy schedule and suggest me the best solution.

Thanks in advance.

sld · Posted 12-15-2017 12:49 AM

Your model in your second post specifies a 3-way factorial in a randomized block design, but in your first post you described your design as a split-plot with (I think, I'm not entirely sure) whole plots in blocks. So this second model specification does not appear to match the design. Your model in your first post corresponds (I think) to your actual design.

You can do a lot of exploration and testing of lsmeans within the LSMEANS statement in either the MIXED or GLIMMIX procedure. I've illustrated some of these options in my code below; refer to the documentation for more details about what these options do. Using the DIFF option on the LSMEANS statement does not get you very far in interpretation of interactions; as I noted before, a thoughtful, context-specific approach is better. I've used GLIMMIX rather than MIXED to get interaction plots and to use SLICEDIFF, but otherwise the LSMEANS options I've used are available in MIXED.

data test;
    input Rep Tillage$ Crop_Residue$ N_Dose$ SOC;
    datalines;
1	Contour	with	N0PK	30256.7
2	Contour	with	N0PK	16870.24
3	Contour	with	N0PK	26706.44
4	Contour	with	N0PK	23508.73
1	Contour	with	N120PK	24466.53
2	Contour	with	N120PK	24032.51
3	Contour	with	N120PK	26145.86
4	Contour	with	N120PK	23643.35
1	Contour	with	N60PK	27335.54
2	Contour	with	N60PK	23438.42
3	Contour	with	N60PK	27678.23
4	Contour	with	N60PK	26230.67
1	Contour	without	N0PK	23386.71
2	Contour	without	N0PK	18918.63
3	Contour	without	N0PK	27250.83
4	Contour	without	N0PK	23500.25
1	Contour	without	N120PK	24153.19
2	Contour	without	N120PK	23274.78
3	Contour	without	N120PK	23882.67
4	Contour	without	N120PK	21778.33
1	Contour	without	N60PK	27734.6
2	Contour	without	N60PK	19916.57
3	Contour	without	N60PK	25009.3
4	Contour	without	N60PK	22351.28
1	Reduced	with	N0PK	23766.65
2	Reduced	with	N0PK	30368.58
3	Reduced	with	N0PK	29149.02
4	Reduced	with	N0PK	27259.67
1	Reduced	with	N120PK	28545.35
2	Reduced	with	N120PK	28205.5
3	Reduced	with	N120PK	29953.75
4	Reduced	with	N120PK	28287.69
1	Reduced	with	N60PK	32671.92
2	Reduced	with	N60PK	25906.4
3	Reduced	with	N60PK	25047.6
4	Reduced	with	N60PK	27056.77
1	Reduced	without	N0PK	25677.25
2	Reduced	without	N0PK	23565.46
3	Reduced	without	N0PK	26469.71
4	Reduced	without	N0PK	28219.43
1	Reduced	without	N120PK	29238.49
2	Reduced	without	N120PK	24391.22
3	Reduced	without	N120PK	28182.27
4	Reduced	without	N120PK	27081.03
1	Reduced	without	N60PK	28364.79
2	Reduced	without	N60PK	27046.47
3	Reduced	without	N60PK	25500.1
4	Reduced	without	N60PK	24558.67
run;

/*  Rename n_dose levels to be quantitative */
data test;
    set test;
    if n_dose = "N0PK" then n_level = 0;
    else if n_dose = "N120PK" then n_level = 120;
    else if n_dose = "N60PK" then n_level = 60;
    run;

/* Plot observed data, for visual assessment of various things */
proc sort data=test out=test_out;
    by n_level;
proc sgpanel data=test_out;
    panelby tillage crop_residue / columns=2;
    series x=n_level y=soc / group=rep markers;
    run;

/*  ASSUMING a 3-way full factorial treatment structure
    in a split-plot design with whole plots in blocks */
proc glimmix data=test
    plots=(studentpanel boxplot(fixed));
    class rep tillage crop_residue n_dose;
    model soc = tillage | crop_residue | n_dose;
    random  rep             /* block variance */
            rep*tillage ;   /* whole plot variance */
                            /* residual is subplot variance */
    lsmeans tillage | crop_residue | n_dose;
    /*  PRETEND that tillage*n_dose interaction is significant */
    lsmeans tillage*n_dose / plot=meanplot(sliceby=tillage join cl) /* interaction plot, not available in Proc MIXED */
        slice=(tillage n_dose) /* simple effects tests */
        slicediff=tillage adjust=tukey; /* pairwise comparisons among n_dose means adjusted for Type I error within each level of tillage */
    run;

I hope this is of help.

View solution in original post

sld · Posted 12-06-2017 11:06 PM

I would think of each subplot being divided into two sub-subplots, with each level of depth being "randomly" assigned to a sub-subplot. The model now is a split-split plot design:

proc mixed;
  class  tillage CR N replication depth ;
  model SOC=  tillage | CR | N | depth ;
  random  replication(tillage) cr*n*replication(tillage);
  lsmeans tillage CR  N / alpha=0.01 diff adjust=simulate ;
  lsmeans tillage | cr | n | depth;
  run;

Note that I removed "replication" from RANDOM because based on your description, the whole plots are not clustered in blocks. "replication(tillage)" is the whole plot variance. "cr*n*replication(tillage)" is the subplot variance. The sub-subplot variance is residual.

I do not recommend applying Type I error adjustment (like the simulate method) to interaction means. (I am fine with it for main effects.) The adjustment will account for ALL pairwise comparisons among the interaction means, most of which you have no interest in. Adjusting for Type I error increases Type II error (and so, reduces power) and to control for more comparisons than necessary is an unnecessary reduction in power--unless you care only about the risk of Type I error and you care nothing about the risk of Type II error. For most of us, I seriously doubt that is the case.

In addition, interaction generally is more complicated than mere pairwise comparisons. I think you have to take a more thoughtful and context-specific approach to interpretation of interaction. My usual approach is to identify post-hoc contrasts that are pertinent in the context of the study and apply Type I error control to those. But that's my opinion, and yours may differ.

Hope this helps.

Edit: It could be that "location" is a block, and that "location" is the same as "replication", in which would require a different RANDOM statement. The design is not clear from your description.

eshad002 · Posted 12-14-2017 06:29 PM

Hi sld,

Firstly, I would like to express my sincere gratitude for your kind response. I just used your suggested codes and I realized that the interactions are getting quite complicated and even more tough while interpreting the result. I, therefore, decided to opt out the "depth" variable and consider the value for the total depth. Then I thought how I could fit the "random" effect in this case. For clarification, I hereby attach part of my dataframe.

Rep	Tillage	Crop_Residue	N_Dose	SOC
1	Contour	with	N0PK	30256.7
2	Contour	with	N0PK	16870.24
3	Contour	with	N0PK	26706.44
4	Contour	with	N0PK	23508.73
1	Contour	with	N120PK	24466.53
2	Contour	with	N120PK	24032.51
3	Contour	with	N120PK	26145.86
4	Contour	with	N120PK	23643.35
1	Contour	with	N60PK	27335.54
2	Contour	with	N60PK	23438.42
3	Contour	with	N60PK	27678.23
4	Contour	with	N60PK	26230.67
1	Contour	without	N0PK	23386.71
2	Contour	without	N0PK	18918.63
3	Contour	without	N0PK	27250.83
4	Contour	without	N0PK	23500.25
1	Contour	without	N120PK	24153.19
2	Contour	without	N120PK	23274.78
3	Contour	without	N120PK	23882.67
4	Contour	without	N120PK	21778.33
1	Contour	without	N60PK	27734.6
2	Contour	without	N60PK	19916.57
3	Contour	without	N60PK	25009.3
4	Contour	without	N60PK	22351.28
1	Reduced	with	N0PK	23766.65
2	Reduced	with	N0PK	30368.58
3	Reduced	with	N0PK	29149.02
4	Reduced	with	N0PK	27259.67
1	Reduced	with	N120PK	28545.35
2	Reduced	with	N120PK	28205.5
3	Reduced	with	N120PK	29953.75
4	Reduced	with	N120PK	28287.69
1	Reduced	with	N60PK	32671.92
2	Reduced	with	N60PK	25906.4
3	Reduced	with	N60PK	25047.6
4	Reduced	with	N60PK	27056.77
1	Reduced	without	N0PK	25677.25
2	Reduced	without	N0PK	23565.46
3	Reduced	without	N0PK	26469.71
4	Reduced	without	N0PK	28219.43
1	Reduced	without	N120PK	29238.49
2	Reduced	without	N120PK	24391.22
3	Reduced	without	N120PK	28182.27
4	Reduced	without	N120PK	27081.03
1	Reduced	without	N60PK	28364.79
2	Reduced	without	N60PK	27046.47
3	Reduced	without	N60PK	25500.1
4	Reduced	without	N60PK	24558.67

I used the following codes to make the analysis. I request you kindly to have a look at my approach. I look forward to getting your valuable suggestion.

PROC MIXED;
CLASS   Tillage Crop_Residue N_Dose Rep;
MODEL   SOC = Tillage Crop_Residue N_Dose
 	Tillage*Crop_Residue  Tillage*N_Dose  Crop_Residue*N_Dose Tillage*Crop_Residue*N_Dose   / DDFM=SATTERTH;
RANDOM  Rep;
LSMEANS    Tillage  Crop_Residue  N_Dose Tillage*Crop_Residue  Tillage*N_Dose  Crop_Residue*N_Dose Tillage*Crop_Residue*N_Dose /DIFF;
PARMS / NOBOUND;
RUN;

sld · Posted 12-15-2017 12:49 AM

Your model in your second post specifies a 3-way factorial in a randomized block design, but in your first post you described your design as a split-plot with (I think, I'm not entirely sure) whole plots in blocks. So this second model specification does not appear to match the design. Your model in your first post corresponds (I think) to your actual design.

You can do a lot of exploration and testing of lsmeans within the LSMEANS statement in either the MIXED or GLIMMIX procedure. I've illustrated some of these options in my code below; refer to the documentation for more details about what these options do. Using the DIFF option on the LSMEANS statement does not get you very far in interpretation of interactions; as I noted before, a thoughtful, context-specific approach is better. I've used GLIMMIX rather than MIXED to get interaction plots and to use SLICEDIFF, but otherwise the LSMEANS options I've used are available in MIXED.

data test;
    input Rep Tillage$ Crop_Residue$ N_Dose$ SOC;
    datalines;
1	Contour	with	N0PK	30256.7
2	Contour	with	N0PK	16870.24
3	Contour	with	N0PK	26706.44
4	Contour	with	N0PK	23508.73
1	Contour	with	N120PK	24466.53
2	Contour	with	N120PK	24032.51
3	Contour	with	N120PK	26145.86
4	Contour	with	N120PK	23643.35
1	Contour	with	N60PK	27335.54
2	Contour	with	N60PK	23438.42
3	Contour	with	N60PK	27678.23
4	Contour	with	N60PK	26230.67
1	Contour	without	N0PK	23386.71
2	Contour	without	N0PK	18918.63
3	Contour	without	N0PK	27250.83
4	Contour	without	N0PK	23500.25
1	Contour	without	N120PK	24153.19
2	Contour	without	N120PK	23274.78
3	Contour	without	N120PK	23882.67
4	Contour	without	N120PK	21778.33
1	Contour	without	N60PK	27734.6
2	Contour	without	N60PK	19916.57
3	Contour	without	N60PK	25009.3
4	Contour	without	N60PK	22351.28
1	Reduced	with	N0PK	23766.65
2	Reduced	with	N0PK	30368.58
3	Reduced	with	N0PK	29149.02
4	Reduced	with	N0PK	27259.67
1	Reduced	with	N120PK	28545.35
2	Reduced	with	N120PK	28205.5
3	Reduced	with	N120PK	29953.75
4	Reduced	with	N120PK	28287.69
1	Reduced	with	N60PK	32671.92
2	Reduced	with	N60PK	25906.4
3	Reduced	with	N60PK	25047.6
4	Reduced	with	N60PK	27056.77
1	Reduced	without	N0PK	25677.25
2	Reduced	without	N0PK	23565.46
3	Reduced	without	N0PK	26469.71
4	Reduced	without	N0PK	28219.43
1	Reduced	without	N120PK	29238.49
2	Reduced	without	N120PK	24391.22
3	Reduced	without	N120PK	28182.27
4	Reduced	without	N120PK	27081.03
1	Reduced	without	N60PK	28364.79
2	Reduced	without	N60PK	27046.47
3	Reduced	without	N60PK	25500.1
4	Reduced	without	N60PK	24558.67
run;

/*  Rename n_dose levels to be quantitative */
data test;
    set test;
    if n_dose = "N0PK" then n_level = 0;
    else if n_dose = "N120PK" then n_level = 120;
    else if n_dose = "N60PK" then n_level = 60;
    run;

/* Plot observed data, for visual assessment of various things */
proc sort data=test out=test_out;
    by n_level;
proc sgpanel data=test_out;
    panelby tillage crop_residue / columns=2;
    series x=n_level y=soc / group=rep markers;
    run;

/*  ASSUMING a 3-way full factorial treatment structure
    in a split-plot design with whole plots in blocks */
proc glimmix data=test
    plots=(studentpanel boxplot(fixed));
    class rep tillage crop_residue n_dose;
    model soc = tillage | crop_residue | n_dose;
    random  rep             /* block variance */
            rep*tillage ;   /* whole plot variance */
                            /* residual is subplot variance */
    lsmeans tillage | crop_residue | n_dose;
    /*  PRETEND that tillage*n_dose interaction is significant */
    lsmeans tillage*n_dose / plot=meanplot(sliceby=tillage join cl) /* interaction plot, not available in Proc MIXED */
        slice=(tillage n_dose) /* simple effects tests */
        slicediff=tillage adjust=tukey; /* pairwise comparisons among n_dose means adjusted for Type I error within each level of tillage */
    run;

I hope this is of help.

Proc Mixed model and multigroup comparison

Re: Proc Mixed model and multigroup comparison

Re: Proc Mixed model and multigroup comparison

Re: Proc Mixed model and multigroup comparison

Re: Proc Mixed model and multigroup comparison

Proc Mixed model and multigroup comparison

Re: Proc Mixed model and multigroup comparison

Re: Proc Mixed model and multigroup comparison

Re: Proc Mixed model and multigroup comparison

Re: Proc Mixed model and multigroup comparison

SAS Innovate 2025: Save the Date