BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
fridge_wpg
Obsidian | Level 7

Hi,

I am using a hand-held moisture meter to monitor loss of moisture during different stages of growth in corn cobs that have a gradient of moisture. There is data of two years related to eight treatments (Trt), four replications taken weekly for nine weeks.

Currently, I just need to test whether the meter detects the moisture differences in treatments? Can someone, please review the attached program and suggest if my approach is correct?

Lastly, I have used proc Glimmix to get the lsmeans of Trt*time and have requested a default plot. Is there any way to customize the graph for space between lines (trt) and to select types of lines, please? I will appreciate help with codes, please. Thank you.

 

Fridge

1 ACCEPTED SOLUTION

Accepted Solutions
sld
Rhodochrosite | Level 12 sld
Rhodochrosite | Level 12

I may (or may not) have steered you to the correct model, but rather than using "personal communication" as justification, I recommend studying the resources I suggested to obtain an adequate understanding of the statistical model for yourself. If you don't understand it, you'll have trouble defending it if called upon to do so.

 

And then you'll be in a better position to design and analyze your next study!

 

 

View solution in original post

17 REPLIES 17
sld
Rhodochrosite | Level 12 sld
Rhodochrosite | Level 12

If the response is normally distributed, then it is possible to specify the same model in MIXED and GLIMMIX. 

 

In the code you provided, you specify two different statistical models: the structures of the fixed effects components are the same, but the structures of the random effects components are different. And "subject=trt" is wrong. These issues suggest that you may not have an adequate understanding of mixed models, your experimental design, and how to specify it as a mixed model. I recommend studying SAS® for Mixed Models, Second Edition, and then following up with any questions.

fridge_wpg
Obsidian | Level 7

Hi,

 

Thank you very much for your input on the question and I agree with your assessment that I do not have an adequate understanding of the subject in question. I have the book now " SAS for Mixed Models" but would like to pick your brain if I may please.  In the statistical model given below, you mentioned in your post that the subject=Trt is wrong! (but it gives me the expected results) Can you please explain that a little? In the experiment, the treatment means difference over time is being investigated. Now if I change the subject=Trt (time) then I get a non-sig result for trt*time that is against my observation and I have trouble understanding at the moment. 

 

(The data with the program code is attached)

 

proc mixed data=drydown_weekly_avg;
class year rep trt time;
model Moist = trt|time / ddfm=kr;
random year (rep);
repeated /subject= Trt type=AR (1);
lsmeans Trt*Time/slice=time;
run;
Quit;

 

Thank you very much for your time on this

 

Fridge

sld
Rhodochrosite | Level 12 sld
Rhodochrosite | Level 12

The "subject" is the design unit on which repeated measurements are made. It should be a random effects factor, which TRT is not (TRT is a fixed effects factor). 

 

I don't have a clear sense of your experimental design, so I'm not going to speculate what the appropriate subject might be. The appropriate subject will involve REP, but YEAR might play a role as well, and you have not clearly identified the role of YEAR in your study. Is YEAR a fixed effects (and so affects the mean of the response) or is it a random effect (and so affects the variance of the response)?

 

The appropriate subject specification also depends upon how factors--notably REP, in your case--are coded. Keep in mind that you are providing information to the PROC: you are identifying a random effects factor and its levels. When reps are coded the same for all levels of TRT (as in your data set), you need to specify more than just REP to tell the procedure that you have different REPs for different levels of TRT: think REP(TRT). 

 

fridge_wpg
Obsidian | Level 7

Hi,

 

I cannot thank you enough for your input on my posts. It has helped me understand many issues of repeated measure analysis to which I was blindfolded. Although, I read the relevant part of the book "SAS for Mixed Models" that you recommend, however, your post was more beneficial. 

 

My experimental design was RCBD containing 4 blocks/rep, 8 treatments, time (repeated measurements on treatments weekly for 9 weeks). The experiment was done for two years.

 

The objective of the experiment was to test whether the device used to take the moisture measurements on the 8 treatments can detect the differences in treatments and across time?

 

Generally, in agricultural experiments, the rep and year differences are of not much interest and therefore, considered random effects.

 

Considering your suggestion when I modified the program code as below:

proc mixed data=weekly_moisture;

class year rep trt time;

model Moist = trt|time|year/ ddfm=kr;

random rep; *Note I have not nested the rep in year as it gives infinite tvalue in the LSMEAN tables;

repeated /subject = Rep (Trt) type= AR (1);

lsmeans Trt*Time/slice=time;

run;

Quit;

 

Changing the subj=Rep (Trt) and adding year as the fixed effect decreases the fit statistics (AIC, AICC, and BIC) substantially.

I will appreciate any comment you may have. Thank you again.

sld
Rhodochrosite | Level 12 sld
Rhodochrosite | Level 12

You said you were measuring corn cobs. But to set up a correct model, I need to know what comprises a block, and what comprises a rep. I also need to know whether you have different blocks and reps in the two years: was the same rep followed across two years, or did you have new reps in the second year?

 

In other words, I need a more complete description of your experimental protocol.

 

I'd say you are understating the importance of random effects. Replications are random effects, and they are critically important in defining the inference space of your study. Year can be random or fixed; in my opinion, with a small number of years (e.g., 2) in field experiments you are better off thinking of year as fixed, not random. It's an arguable point, though.

 

fridge_wpg
Obsidian | Level 7

Design

Treatment

Replication

# of years the exp. was repeated

# of weekly measurements

RCBD

8

4

2

9

What was measured in treatments?

I measured the moisture content of 10 corn cobs in each treatment with a handheld electronic device. If I find a significant difference in treatment means across time, I will interpret that the meter is effective in differentiating the moisture content.

Replications (Rep)

Replication is a mere repetition of the experiment, done four-time comprising of the same all treatments in every rep but obviously randomized within the rep. I referred reps as blocks previously, which is actually not correct. So no blocks, only reps.

Years

Followed the same experimental protocol and same exact order of replications and treatments every year.

 

That is how the model looks like now after input from SID (thank you)

proc mixed data _;

class year rep trt time;

model Moist = trt|time|year / ddfm=kr;

random year(rep);

repeated /subject= Rep (Trt) type= AR (1);

lsmeans Trt*Time/slice=time;

run;Quit;

 

 

sld
Rhodochrosite | Level 12 sld
Rhodochrosite | Level 12

The information I'm seeking is what you would provide in a Methods section of a manuscript. (You have to write it eventually, you might as well do it now.) My rule of thumb is that the Methods description should allow a reader to duplicate your study. What you have provided is not yet enough, so more questions :

 

This a field study or a lab study?

What are your TRTs?

What is the experimental unit to which a level of TRT is randomly assigned?

What is a REP? I want to know how a REP relates to a corn cob, or clusters of corn cobs. Is a REP the experimental unit to which a level of TRT is assigned? Or is it a block of experimental units?

Is each value of MOIST measured on one corn cob, or is it the mean of 10 corn cobs? I'm trying to figure out where "10 corn cobs" works in here. 

Is MOIST a percentage?

What is TIME (e.g., days?), and what are the levels of TIME? Are the levels equally spaced?

Do you measure MOIST on the same corn cob nine times, or do you use a different corn cob each time?

Do you use different corn cobs in different years? Do you use different REPs (whatever they are) in different years? Do you use different experimental units for TRTs in different years?

Within each year, are all four REPs run simultaneously or do you run one REP to completion, then the second, then third, then fourth?

Within each REP, are all eight TRTs run simultaneously or sequentially?

 

Your MOIST values are very similar among REPs (i.e., there is very little variability), apart from some notably odd values which might be typos or outliers. In my experience, agricultural data are typically quite noisy, so your data are uncommon in this regard.

 

fridge_wpg
Obsidian | Level 7

Hi Sid,

 

Sorry for a delayed response. I couldn't work in the whole previous week; was sick. I am eager to see how the information provided would affect the model developed:

 

Q: This a field study or a lab study?

A: It was a field study.

Q: What are your TRTs?

A: The treatments are four different corn hybrids belonging to distinct maturity groups. The same set of four hybrids were seeded at two different times (4x2=8 treatments, 1 to 4 (early seeding & 5 to 8 (late seeding). The identity of Treatment 1 to 8 remains unchanged throughout the experiment.

Q: What is the experimental unit to which a level of TRT is randomly assigned?

A: Corn cob (s) in a treatment is the experimental unit which was measured for its moisture content repeatedly.

Q: What is a REP? I want to know how a REP relates to a corn cob or clusters of corn cobs. Is a REP the experimental unit to which a level of TRT is assigned? Or is it a block of experimental units?

A replication contains a complete set of all levels of treatments (1 to 8). Replication represents one complete experiment and it was replicated four times in both years. So replication is not a block in my experiment.

Q: Is each value of MOIST measured on one corn cob, or is it the mean of 10 corn cobs? I'm trying to figure out where "10 corn cobs" works in here. 

A: Yes, Moist, as given in the attached text file, is an average of 10 readings of a treatment.

Q: Is MOIST a percentage?

A: Yes, it is.

Q: What is TIME (e.g., days?), and what are the levels of TIME? Are the levels equally spaced?

A: Time is a week and equally spaced after two weeks of silking.

Q: Do you measure MOIST on the same corn cob nine times, or do you use a different corn cob each time?

A: Yes, 10 cobs were randomly selected and then marked.  Every week/time the same tagged cob was measured for moisture content.

Q: Do you use different corn cobs in different years? Do you use different REPs (whatever they are) in different years? Do you use different experimental units for TRTs in different years?

A: Yes, different cobs in both years. Because you have to seed a new crop but the same genetics (hybrids) and the procedure as of the previous year were followed. The identity of treatments and replications remained unchanged. Yes, different experimental units (cobs) for treatments in both years.

Q: Within each year, are all four REPs run simultaneously or do you run one REP to completion, then the second, then third, then fourth?

A: Within each year all four reps run simultaneously. Every week all reps data was collected within a day (three hours).

Q: Within each REP, are all eight TRTs run simultaneously or sequentially?

A: Within Reps, all treatments were random, not sequential.  

Your MOIST values are very similar among REPs (i.e., there is very little variability), apart from some notably odd values which might be typos or outliers. In my experience, agricultural data are typically quite noisy, so your data are uncommon in this regard.

Your observation is correct and it is because of the device used to measure the moisture content. A care was taken to make the measurement at a time of the day that is not affecting the measurement. As the device measures the wettest spot (fog and dew would influence the measurement).

 

The model for analysis is:

 

proc mixed data=weekly_moisture;
class year rep trt time;
model Moist = trt|time|year/ ddfm=kr;
random rep;*Note I have not nested the rep in year as it gives infinite tvalue in the LSMEAN tables;
repeated /subject = Rep (Trt) type= AR (1);
lsmeans Trt*Time/slice=time;
run;
Quit;

 

Thank you very much for your input

 

fridge

sld
Rhodochrosite | Level 12 sld
Rhodochrosite | Level 12

Thank you for the additional information. I hope you are feeling better!

 

Because this is a field experiment and because the treatment is a combination of corn hybrid and seeding date, the experimental unit almost surely is not a corn cob; rather it is probably a plot or a site or a field--in other words, some spatial unit that was planted with a particular hybrid at a particular time: each experimental unit is a set of corn plants associated with one level of treatment. How are these experimental units (plots? sites? fields?) arranged in space? Are they clustered into sets (i.e., blocks) or is each EU independent of every other EU? You seem to have an inadequate understanding of the principles of experimental design. My favorite reference on this topic is  Statistical Principles for the Design of Experiments. Yes, it's an entire book, but experimental design is a big topic.

 

Please describe the physical layout of your experiment in space with respect to treatments in each year.

 

I do not think your model is correct yet.

 

 

fridge_wpg
Obsidian | Level 7

Hi SID,

 

I do feel a lot better, thank you.

 

I swayed back and forth on corn cobs or the treatments to be the experimental unit (EU) of the experiment but never thought about the sites to be the EU. Thank you for correcting me and will explore the reference you provided.

 

The site or a plot that contained four rows of each treatment was immediately (76 cm) followed by another plot containing one of the other eight treatments (early, early or late, early etc seeding pattern, randomized).

 

As mentioned earlier, the treatment identity representing hybrid and seeding time remained unchanged in both years (T1 to T4, early seeded coupled with the hybrid types and T5 to T8, Late seeding). The site that contained all the plots (treatments) has earlier been referred as replication, although it served as a block to contain soil heterogeneity if there was any.

 

All four replications were spatially separated from the adjacent replication by one meter (standard field protocol). 

 

The site of the experiment varied by 200 meters in both years.

   

Yes, the model perhaps would look like as given below when the treatments are separated into seeding time and hybrids:

 

proc mixed data=xy;

class year rep seeding hybrid time;

model Moist = year|seeding|hybrid|time/ ddfm=kr;

random rep;

repeated /subject= Rep (hybrid) type= AR (1);

lsmeans hybrid|seeding|time/pdiff adjust=tukey;

 

(I am not exploring the year’s effect)

 

Thank you very much for your time

 

Fridge

 

sld
Rhodochrosite | Level 12 sld
Rhodochrosite | Level 12

As you learn more about the basics of experimental design, keep in mind that a fixed effect (like TRT) cannot also be an experimental unit. EUs are random effects factors. Levels of treatment are randomly assigned to experimental units; to have replication, you must have more experimental units (i.e., more levels of the random effects factor) than treatment levels. In my experience with clients, I've found that this distinction is critical to understanding design.

 

Other excellent resources are

SAS for Mixed Models, 2nd ed

Generalized Linear Mixed Models: Modern Concepts, Methods and Applications particularly Ch 2, 7, and 8

 

If I understand your design correctly, this is how I envision it:

 

REP is the random experimental unit for the fixed effect of YEAR. There are 4 REPs in 2015 and 4 different REPs in 2016.

Plots within REPs are random experimental units that are  randomly assigned to 8 treatments, which are comprised of a 4 x 2 factorial (hybrid x seeding).

10 corncobs are randomly chosen from each plot. The same 10 corncobs are measured at each of nine times. The values for the 10 corncobs are averaged for each TIME.

 

I would consider this analysis:

 

/*  Check pattern of observations */
proc tabulate data=weekly_moisture;
    class year rep trt time;
    table trt*time, year*rep;
    run;

/*  Visual assessment of raw data */
proc sort data=weekly_moisture;
    by year rep trt time;
proc sgpanel data=weekly_moisture;
    panelby year trt / columns=4;
    series x=time y=moist / group=rep markers;
    run;
proc sgpanel data=weekly_moisture;
    panelby year rep / columns=4;
    series x=time y=moist / group=trt markers;
    run;

/*  Add factors to dataset */
data weekly_moisture;
    set weekly_moisture;
    if trt= 1 then do; hybrid= "H1"; seeding= "Early"; end; 
    else if trt= 2 then do; hybrid= "H2"; seeding= "Early"; end;
    else if trt= 3 then do; hybrid= "H3"; seeding= "Early"; end; 
    else if trt= 4 then do; hybrid= "H4"; seeding= "Early"; end; 
    else if trt= 5 then do; hybrid= "H1"; seeding= "Late"; end; 
    else if trt= 6 then do; hybrid= "H2"; seeding= "Late"; end; 
    else if trt= 7 then do; hybrid= "H3"; seeding= "Late"; end; 
    else if trt= 8 then do; hybrid= "H4"; seeding= "Late"; end; 
    run;

/*  On original percentage scale */
proc glimmix data=weekly_moisture(where=(moist < 100)) /* typo? */
    plots=(studentpanel boxplot(fixed student));
    nloptions tech= nrridg;
    class rep year hybrid seeding time;
    model moist = year | hybrid | seeding | time;
    random intercept / subject=rep(year);
    *random intercept / subject=seeding*hybrid*rep(year); /* this statement is optional for ar(1), should be omitted for most other types */
    random time / subject= seeding*hybrid*rep(year) type= ar(1) residual;
    lsmeans hybrid*seeding*time / plot=meanplot(join cl plotby=seeding sliceby=hybrid);
    run;

/*  logit transformation of percentage */
proc glimmix data=weekly_moisture(where=(moist < 100)) /* typo? */
    plots=(studentpanel boxplot(fixed student));
    nloptions tech= nrridg;
    class rep year hybrid seeding time;
    moist_logit = log(moist/(100-moist)); 
    model moist_logit = year | hybrid | seeding | time;
    random intercept / subject=rep(year);
    *random intercept / subject=seeding*hybrid*rep(year); /* this statement is optional for ar(1), should be omitted for most other types */
    random time / subject= seeding*hybrid*rep(year) type= ar(1) residual;
    lsmeans hybrid*seeding*time / plot=meanplot(join cl plotby=seeding sliceby=hybrid);
    run;

Because the response is a percentage, you could consider a beta distribution (which is a generalized linear model); however the beta distribution does not work well when modeling different covariance structures for repeated measures. I've included a logit transformation in the code as an alternative, but it makes little difference with these data.

 

You could think about regressing on TIME.

 

You could think about other covariance structure types. You may run into convergence issues because of the minimal variability of the response.

 

Be sure to check weird data (e.g., a response value greater than 100).

 

You have so little variance in your data that you'll get a lot of significant effects. As you interpret your results, keep in mind that this is statistical significance, and not practical significance. 

 

fridge_wpg
Obsidian | Level 7

Hi SID,

 

Thank you very much for your input on my question. I have learned a lot by interacting with you and will utilize the resources you indicated for further enhancement of my understanding of the topic.

 

fridge

sld
Rhodochrosite | Level 12 sld
Rhodochrosite | Level 12

You're welcome. You can always post follow-up questions in the future!

 

fridge_wpg
Obsidian | Level 7

Hi SID,

 

Would you mind, if I cite your help in building the statistical model? If not, kindly provide the info. Thank you,

 

fridge

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 17 replies
  • 3058 views
  • 10 likes
  • 3 in conversation