Solved: Re: Split-plot design at random multiple locations

xuelin0820 · Posted 01-03-2019 04:42 PM

Dear all,

I have a general question about using PROC MIXED for a split-plot design at multiple random locations. Suppose A is the main-plot factor, B is the sub-plot factor, Rep denotes replications, and Loc denotes locations. If Loc is fixed, I know the code should be

proc mixed;
  class Loc Rep A B;
  model y = Loc|A|B;
  random Rep(Loc) A*Rep(Loc);
run;

However, if Loc is a random effect, how should I revise the code? Should I also include Loc*A, Loc*B, and Loc*A*B in the RANDOM statement? What I have in mind is

proc mixed;
  class Loc Rep A B;
  model y = A B A*B;
  random Loc Rep(Loc) A*Rep(Loc);
run;

Please correct me, thanks!!

Furthermore, I know that mathematically Loc + Rep(Loc) is equivalent to Loc + Rep + Loc*Rep in SAS, but which one is the correct way to understand the logic of the experimental design? Suppose Loc is crossed with Rep; should A*Rep(Loc) then be changed to A*Rep + A*Loc + A*Rep*Loc? Are they equivalent?

Thanks for any input!!

sld · Posted 01-03-2019 07:08 PM

I am assuming here that Locs are random, that Reps are random and nested within Locs, that WholePlots are random and nested within Reps, and that SubPlots are random and nested within WholePlots. The experimental unit for fixed effects factor A is WholePlot, and the experimental unit for fixed effects factor B is SubPlot.

If Loc is random, then generally you would think of the spatial inference space as being defined by Locs. Consequently, Reps within Locs are subsamples. I would consider something like:

proc mixed;
  class loc rep a b;
  model y = a b a*b;
  random loc
    a*loc
    b*loc
    a*b*loc;
  random rep(loc)
    a*rep(loc)
    b*rep(loc);
run;

This leaves a*b*rep(loc) to be residual variance.
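
For reference, the model fitted by the code above can be written out term by term. The notation is assumed for illustration and does not come from the thread: i indexes levels of A, j levels of B, k locations, and l reps within a location.

```latex
y_{ijkl} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij}
         + L_k + (\alpha L)_{ik} + (\beta L)_{jk} + (\alpha\beta L)_{ijk}
         + R_{l(k)} + (\alpha R)_{il(k)} + (\beta R)_{jl(k)}
         + e_{ijkl}
```

The first line holds the fixed effects from the MODEL statement, the second line the terms in the first RANDOM statement, and the third line the terms in the second RANDOM statement. Each random term is taken to be independent, normal, with mean zero and its own variance component. The residual e_{ijkl} is the a*b*rep(loc) term.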

You could combine b*loc + a*b*loc by replacing the first RANDOM statement with

  random loc
    a*loc
    a*b*loc;

If the second RANDOM statement generated estimation problems, you could omit it, then residual variance would be rep(loc) + a*rep(loc) + b*rep(loc) + a*b*rep(loc).
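
A minimal sketch of that reduced model, assuming the data live in a data set called splitplot (a placeholder name, not from the thread):

```sas
/* Sketch only: "splitplot" is a hypothetical data set name. */
proc mixed data=splitplot;
  class loc rep a b;
  model y = a b a*b;
  /* Location-level random effects only. */
  random loc a*loc b*loc a*b*loc;
  /* No rep(loc) terms: the residual now pools rep(loc),  */
  /* a*rep(loc), b*rep(loc), and a*b*rep(loc).            */
run;
```

The pooled first RANDOM statement (loc, a*loc, a*b*loc) could be substituted here in the same way.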

If the experimental design is like I described above, then I think Rep(Loc) or Rep*Loc makes more sense than Rep + Loc*Rep. If each Rep is not uniquely identified across all Locs, then you must specify either Rep(Loc) or Rep*Loc. I prefer Rep(Loc) because it explicitly implies nesting of Rep units within Loc units.
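
If you would rather give every rep a unique label than rely on the nested notation, one possible approach is sketched below. The names splitplot, splitplot_id, and rep_id are placeholders introduced for illustration.

```sas
/* Sketch only: data set and variable names are hypothetical. */
data splitplot_id;
  set splitplot;
  length rep_id $40;
  /* Build a label that is unique across locations, e.g. loc_rep. */
  rep_id = catx('_', loc, rep);
run;

proc mixed data=splitplot_id;
  class loc rep_id a b;
  model y = a b a*b;
  random loc a*loc a*b*loc;
  /* rep_id plays the role of rep(loc); a*rep_id of a*rep(loc). */
  random rep_id a*rep_id;
run;
```

Because rep_id already identifies a rep within a location, this should fit the same model as the rep(loc) version.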

I hope this helps.

xuelin0820 · Posted 01-04-2019 08:45 AM

Hi sld,

Thanks very much for your well-organized reply! However, based on your first block of code, I couldn't see the difference between A and B; it looks like a strip-plot design. I also hesitate to agree that Reps within Locs are subsamples, because different reps have different experimental units. Within each location, Reps are independent of each other, just as subplots within each main plot are independent. Otherwise, it makes no sense to test the A*B interaction.

Thank you for the explanation about Rep(Loc); I agree with you that Rep(Loc) makes more sense. I would also like to know how to modify the RANDOM statement when using Rep*Loc.

sld · Posted 01-04-2019 01:02 PM

I'm not sure what you mean by "I couldn't see the difference between A and B". Could it be something to do with including b*loc in the first RANDOM statement? I usually use the second version that pools b*loc with a*b*loc.

Regarding whether Reps are subsamples within Loc or not: See Section 6.6 (A Multilocation Example) in Littell et al., SAS for Mixed Models, 2nd ed. This example lays out the choices that must be made, with some suggested guidelines. Their general recommendation is that the location x treatment term should be retained in the model if you cannot comfortably assume that treatment effects are the same at all locations. At the time of publication (2006), they note that some assumptions depend on the specifics of the study and "are considered controversial by many statisticians". So there possibly is room for different approaches, depending on study context.

I don't know whether this information is in the newly-released 3rd ed: SAS® for Mixed Models: Introduction and Basic Applications.

Syntax-wise, Rep*Loc expands like Rep(Loc): each covers Loc, Rep, and Rep*Loc, not including components that are otherwise specified in the model. So the two forms are interchangeable in my experience. This documentation link refers to the design matrix for nested effects in the MODEL statement, but construction of the design matrix for the RANDOM statement is largely identical.
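
A minimal sketch of the two notations side by side, assuming a data set named splitplot (a placeholder name) and using the pooled first RANDOM statement:

```sas
/* Sketch only: "splitplot" is a hypothetical data set name. */

/* Nested notation. */
proc mixed data=splitplot covtest;
  class loc rep a b;
  model y = a b a*b;
  random loc a*loc a*b*loc;
  random rep(loc) a*rep(loc);
run;

/* Crossed notation: only the second RANDOM statement changes. */
proc mixed data=splitplot covtest;
  class loc rep a b;
  model y = a b a*b;
  random loc a*loc a*b*loc;
  random rep*loc a*rep*loc;
run;
```

If the two forms really are interchangeable for your data, the covariance parameter estimates and the fit statistics from the two runs should agree, which makes this an easy check to run.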

xuelin0820 · Posted 01-04-2019 01:44 PM

Thank you sld for the useful references, they are much appreciated, and I will read them carefully.

Sorry for the confusion caused by my words "I couldn't see the difference between A and B". What I mean is that the main-plot factor A and the sub-plot factor B are parallel, or symmetric, in your code; I couldn't tell which one is the main plot and which one is the sub-plot from the two RANDOM statements. I think the second RANDOM statement should be

random rep(loc)  a*rep(loc);

Suppose it is a split-plot RCB; then b*rep(loc) should not be there, because a split-plot design assumes that Reps are independent of the sub-plot factor. If my understanding is wrong, please correct me. Thank you!

sld · Posted 01-04-2019 02:53 PM

I agree that the random statements could be

  random loc
    a*loc
    a*b*loc;
  random rep(loc)
    a*rep(loc);

and are generally/probably what I would use unless, as you note, I have a strip-plot element in the design or perhaps a lack of random assignment of treatment to experimental units. Still, most of the designs I work with are small samples, and estimating fewer variance components is nearly always the better route!
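
Put together as a complete step, that might look like the sketch below. The data set name splitplot is a placeholder, and the ddfm=kr option (Kenward-Roger degrees of freedom) is an addition that was not discussed in this thread; drop it if you prefer the default.

```sas
/* Sketch only: "splitplot" is a hypothetical data set name. */
proc mixed data=splitplot;
  class loc rep a b;
  /* ddfm=kr is an assumption, often used with small samples. */
  model y = a b a*b / ddfm=kr;
  /* Location-level terms; b*loc is pooled into a*b*loc. */
  random loc a*loc a*b*loc;
  /* Rep and whole-plot terms; a*rep(loc) is the whole-plot error. */
  random rep(loc) a*rep(loc);
run;
```

The residual then corresponds to the sub-plot error.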

xuelin0820 · Posted 01-04-2019 03:28 PM

I work in agricultural science, and my samples are usually small too. I couldn't agree more that "estimating fewer variance components is nearly always the better route!" I usually don't include a*b*loc. Thank you for all of your help!

sld · Posted 01-04-2019 04:10 PM

You're welcome.

This is a nice paper, too, if you haven't run across it yet: On recognizing the proper experimental unit in animal studies in the dairy sciences.
