xuelin0820
Fluorite | Level 6

Dear all,

 

I have a general question about using PROC MIXED for a split-plot design at multiple random locations. Suppose A is the main-plot factor, B is the sub-plot factor, Rep is the replications, and Loc is the locations. If Loc is fixed, I know the code should be

proc mixed;
class Loc Rep A B;
model y =Loc|A|B;
random Rep(Loc)  A*Rep(Loc);
run;

However, if Loc is a random effect, how should I revise the code? Should I also include Loc*A, Loc*B, and Loc*A*B in the RANDOM statement? What I think is

proc mixed;
class Loc Rep A B;
model y = A B A*B;
random Loc Rep(Loc) A*Rep(Loc);
run;

Please correct me, thanks!!

Furthermore, I know that mathematically Loc + Rep(Loc) is equivalent to Loc + Rep + Loc*Rep in SAS, but which one is the correct way to understand the logic of the experimental design? Suppose Loc is crossed with Rep; should A*Rep(Loc) then be changed to A*Rep + A*Loc + A*Rep*Loc? Are they equivalent?

Thanks for any input!!

sld
Rhodochrosite | Level 12

I am assuming here that Locs are random, that Reps are random and nested within Locs, that WholePlots are random and nested within Reps, and that SubPlots are random and nested within WholePlots. The experimental unit for fixed effects factor A is WholePlot, and the experimental unit for fixed effects factor B is SubPlot.

 

If Loc is random, then generally you would think of the spatial inference space as being defined by Locs. Consequently, Reps within Locs are subsamples. I would consider something like:

 

 

proc mixed;
  class loc rep a b;
  model y = a b a*b;
  random loc
    a*loc
    b*loc
    a*b*loc;
  random rep(loc)
    a*rep(loc)
    b*rep(loc);
run;

 

This leaves a*b*rep(loc) as the residual variance.

 

You could combine b*loc + a*b*loc by replacing the first RANDOM statement with

 

 

  random loc
    a*loc
    a*b*loc;

 

If the second RANDOM statement generated estimation problems, you could omit it. The residual variance would then be rep(loc) + a*rep(loc) + b*rep(loc) + a*b*rep(loc).
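As a rough sketch of that fallback, the reduced model keeps only the location-level RANDOM statement, so all rep-level variation is pooled into the residual. The dataset name trial is a placeholder that does not appear in the original posts; y, loc, rep, a, and b are the variables used above.

```sas
/* Sketch only: 'trial' is a hypothetical dataset name. */
proc mixed data=trial;
  class loc rep a b;
  model y = a b a*b;
  /* Location-level variance components only; rep(loc), a*rep(loc), */
  /* b*rep(loc), and a*b*rep(loc) are all absorbed by the residual. */
  random loc
    a*loc
    a*b*loc;
run;
```

Whether pooling everything below the location level is acceptable depends on the design and on the inference you want, so treat this as a starting point rather than a recommendation.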

 

If the experimental design is like I described above, then I think Rep(Loc) or Rep*Loc makes more sense than Rep + Loc*Rep. If each Rep is not uniquely identified across all Locs, then you must specify either Rep(Loc) or Rep*Loc. I prefer Rep(Loc) because it explicitly implies nesting of Rep units within Loc units.
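One way to see why the nesting matters: if rep is coded 1, 2, 3 at every location, then rep alone does not identify a block. The sketch below is an alternative not proposed in the posts above. It builds a label that is unique across locations and then uses it in place of rep(loc). The names trial, trial_id, and rep_id are hypothetical.

```sas
/* Sketch only: 'trial', 'trial_id', and 'rep_id' are hypothetical names. */
data trial_id;
  set trial;
  length rep_id $ 40;
  /* Combine location and rep labels so that each rep is unique overall. */
  rep_id = catx('_', loc, rep);
run;

proc mixed data=trial_id;
  class loc rep_id a b;
  model y = a b a*b;
  random loc
    a*loc
    a*b*loc;
  /* rep_id plays the role of rep(loc); a*rep_id plays a*rep(loc). */
  random rep_id
    a*rep_id;
run;
```

Under these assumptions the fit should match the rep(loc) version, but writing rep(loc) directly keeps the nesting visible in the code.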

 

I hope this helps.

 

xuelin0820
Fluorite | Level 6

Hi sld,

 

Thanks very much for your well-organized reply! However, based on your first block of code, I couldn't see the difference between A and B; it looks like a strip-plot design. I also hesitate to agree that Reps within Locs are subsamples, because different reps have different experimental units. Within each location, Reps are independent of each other, just as subplots within each main plot are independent. Otherwise, it makes no sense to test the A*B interaction.

 

Thank you for the explanation about Rep(Loc). I agree with you that Rep(Loc) makes more sense. I also want to know how to modify the RANDOM statement when using Rep*Loc.

sld
Rhodochrosite | Level 12

I'm not sure what you mean by "I couldn't see the difference between A and B". Could it have something to do with including b*loc in the first RANDOM statement? I usually use the second version, which pools b*loc with a*b*loc.

 

Regarding whether Reps are subsamples within Loc or not: see Section 6.6 (A Multilocation Example) in Littell et al., SAS for Mixed Models, 2nd ed. This example lays out the choices that must be made, with some suggested guidelines. Their general recommendation is that the location x treatment term should be retained in the model if you cannot comfortably assume that treatment effects are the same at all locations. At the time of publication (2006), they note that some assumptions depend on the specifics of the study and "are considered controversial by many statisticians". So there is possibly room for different approaches, depending on study context.

 

I don't know whether this information is in the newly-released 3rd ed: SAS® for Mixed Models: Introduction and Basic Applications

 

Syntax-wise, Rep*Loc expands like Rep(Loc): each covers Loc, Rep, and Rep*Loc, not including components that are otherwise specified in the model. So the two forms are interchangeable in my experience. This documentation link refers to the design matrix for nested effects in the MODEL statement, but construction of the design matrix for the RANDOM statement is largely identical.
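As a sketch of what that interchangeability looks like in practice, the rep-level RANDOM statement can be written with the crossed notation in place of the nested one. The dataset name trial is a placeholder, and the location-level terms are the pooled version discussed earlier in the thread.

```sas
/* Sketch only: 'trial' is a hypothetical dataset name. */
proc mixed data=trial;
  class loc rep a b;
  model y = a b a*b;
  random loc
    a*loc
    a*b*loc;
  /* Crossed notation; intended to match rep(loc) and a*rep(loc). */
  random rep*loc
    a*rep*loc;
run;
```

Running both versions on the same data and comparing the covariance parameter estimates is a simple check that the two notations give the same fit.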

xuelin0820
Fluorite | Level 6

Thank you, sld, for the useful references. They are much appreciated, and I will read them carefully.

 

Sorry for the confusion caused by my words "I couldn't see the difference between A and B". What I mean is that the main-plot factor A and the sub-plot factor B are parallel, or symmetric, in your code. Based on the two RANDOM statements, I couldn't see which one is the main plot and which one is the sub-plot. I think the second RANDOM statement should be

random rep(loc)  a*rep(loc);

Suppose it is a split-plot RCB. Then b*rep(loc) should not be there, because a split-plot design assumes that Reps are independent of the sub-plot factor. If my understanding is wrong, please correct me. Thank you!

sld (accepted solution)
Rhodochrosite | Level 12

I agree that the random statements could be

 

  random loc
    a*loc
    a*b*loc;
  random rep(loc)
    a*rep(loc);

and are generally/probably what I would use unless, as you note, I have a strip-plot element in the design or perhaps a lack of random assignment of treatment to experimental units. Still, most of the designs I work with are small samples, and estimating fewer variance components is nearly always the better route!
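For reference, the fragments agreed on above can be assembled into one complete step. This is only a sketch: the dataset name trial is a placeholder, and no options beyond those shown in the thread are included.

```sas
/* Sketch only: 'trial' is a hypothetical dataset name. */
proc mixed data=trial;
  class loc rep a b;
  /* Fixed effects: whole-plot factor a, sub-plot factor b, interaction. */
  model y = a b a*b;
  /* Location-level terms, with b*loc pooled into a*b*loc. */
  random loc
    a*loc
    a*b*loc;
  /* Blocks within locations, and whole plots within blocks. */
  random rep(loc)
    a*rep(loc);
run;
```

Here a*rep(loc) is the whole-plot error term, and the residual serves as the sub-plot error, which is what distinguishes A from B in the model.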

 

 

xuelin0820
Fluorite | Level 6
I work in agricultural science, and my samples are usually small too. I couldn't agree more that "estimating fewer variance components is nearly always the better route!" I usually don't include a*b*loc. Thank you for all of your help!
sld
Rhodochrosite | Level 12

You're welcome.

 

This is a nice paper, too, if you haven't run across it yet: "On recognizing the proper experimental unit in animal studies in the dairy sciences".
