palolix
Obsidian | Level 7

Dear all,

I am working with an unbalanced split-split plot design (date as a split-block factor) with 4 factors: "nem" (3 levels), "trt" (2 levels), "cultivar" (3 levels) and "date" (4 levels).  The following treatment combinations of the nem*cultivar interaction are missing (marked "."):

                 nem
cultivar     1     2     3
Nord         x     x     .
Sang         x     .     x
Beretta      .     x     x

Everything looks OK when I get the results of my estimates; nevertheless, only a few of them have significant p-values, and I know from my data that I should expect more significant results. Could the problem be due to the split-plot model, where the different factors do not share the same error term as they would in a CRD or RCBD?  Should I determine an appropriate error term for each particular estimate?


Or are the many missing observations that I have leading to invalid results? In my estimates I want to compare the first date vs the last date. I have 4 dates in total, but for my first dependent variable all observations are missing for date 2, so in the nonpositional syntax I replace date 4 with date 3 when comparing the first and the last date, and in the positional syntax I omit date 2 when calculating the coefficients.  Both syntaxes yield the same results for all estimates, so this should be right.


Here is my code, with only some of the estimates:

Proc mixed data=one;
class blk nem trt cultivar date;
model lgsprCysts100=
nem
trt
trt*nem
cultivar
cultivar*nem
cultivar*trt
nem*trt*cultivar
date
nem*date
trt*date
nem*trt*date
cultivar*date
nem*cultivar*date
trt*cultivar*date
nem*trt*cultivar*date;
random blk*nem blk*nem*trt blk*nem*trt*cultivar blk*date blk*nem*date blk*trt*date blk*nem*trt*date blk*cultivar*date blk*nem*cultivar*date blk*trt*cultivar*date;

lsmestimate nem*trt*cultivar*date 'Positional dateHafer2010 vs. dateSG2012 when nem=2 trt=1 & cultivar=Beretta' 0 0 0 0 0 0 0 0 0 0 0 0 1 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0,
                                  'Nonpositional NTCD2111 - NTCD2114'    [1, 2 1 1 1][-1, 2 1 1 3],
                                  'Positional dateHafer2010 vs. dateSG2012 when nem=2 trt=2 & cultivar=Beretta' 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0,
                                  'Nonpositional NTCD2211 - NTCD2214'    [1, 2 2 1 1][-1, 2 2 1 3],
                                   'Positional dateHafer2010 vs. dateSG2012 when nem=2 trt=1 & cultivar=Nord' 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0,
                                  'Nonpositional NTCD2121 - NTCD2124'    [1, 2 1 2 1][-1, 2 1 2 3],
                                   'Positional dateHafer2010 vs. dateSG2012 when nem=2 trt=2 & cultivar=Nord' 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 -1 0 0 0 0 0 0 0 0 0 0 0 0,
                                   'Nonpositional NTCD2221 - NTCD2224'    [1, 2 2 2 1][-1, 2 2 2 3];

run;

I would greatly appreciate your help!!!!

Thank you very much!!

Caroline

13 REPLIES
SteveDenham
Jade | Level 19

Hi again Caroline,

The code looks fine, and should return everything with the proper error terms.  Take a look at the lsmeans and standard errors--do they look appropriate?  Finding fewer significant results than you expected is likely due to the size of the standard errors, and to the degrees of freedom associated with each estimate.  If those are incorrect, you may need to apply one of the ddfm= options.  By default it should be using DDFM=CONTAIN, but a hand calculation should be applied to check this.

The other option, if possible, would be to move 'date' to a repeated statement, but only if the subplots are measured at different time points.  This may drastically change both the standard error estimates and the degrees of freedom.

Steve Denham

palolix
Obsidian | Level 7

Thank you so much Steve, I am so glad that you can help me with this situation!!!

The estimates and standard errors look odd, and DF=4 for all the estimates.  For example, if I look at all the estimates when nem=2, half of them are negative (which makes sense to me, because I know that in the last year there were many more disease eggs than in the first year), but the other half are positive, which cannot be right (because the disease egg counts are higher in the last year).  With respect to the standard errors, I only get 2 different standard errors across all estimates!  For example, when nem=2, the standard errors are the same (0.3188) for the different cultivar and treatment levels, and the same again when nem=3 (also 0.3188)!!  You are right, the degrees of freedom method is CONTAINMENT.

What about the E=effect option at the end of the estimates?  I think this is necessary for finding the appropriate error terms when the estimates involve main effects, but not when they involve the highest-order interaction.  Am I right?

I should not use "date" as a repeated measure because the subplots were measured once a year, but not under the same conditions: every year a different cereal cultivar was cropped, and different cereals have a different effect on nematode reproduction.

Thank you Steve!!

Caroline

SteveDenham
Jade | Level 19

You have the greatest designs ;)

The E option on the ESTIMATE and LSMESTIMATE statements in PROC MIXED displays the coefficients of the estimable functions, L.  So you don't specify the error term like you do in GLM.

I am going to worry more about the estimates.  It's time to get the LSMEANS for nem*trt*cultivar*date, and check those values, and try calculating (by hand!) the LSMESTIMATEs you have.  If you are getting values with the wrong sign, you may need to rethink the coefficients in your statements.
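
A minimal sketch of that check, assuming the dataset `one` and the variable names from the code posted earlier in the thread. The level indices in the nonpositional contrast are only illustrative and have to be matched against the Class Level Information table; the bar operator is used as shorthand for the same full factorial model.

```sas
/* Sketch only: dataset ONE and all variable names come from the code posted above. */
proc mixed data=one;
  class blk nem trt cultivar date;
  /* The bar operator expands to the same 15 effects listed in the original MODEL statement. */
  model lgsprCysts100 = nem|trt|cultivar|date;
  random blk*nem blk*nem*trt blk*nem*trt*cultivar
         blk*date blk*nem*date blk*trt*date blk*nem*trt*date
         blk*cultivar*date blk*nem*cultivar*date blk*trt*cultivar*date;
  /* Cell means: subtract the two relevant rows by hand. */
  lsmeans nem*trt*cultivar*date;
  /* E prints the L coefficients that SAS actually applied to this estimate. */
  lsmestimate nem*trt*cultivar*date
      'nem=2 trt=1 cultivar level 1: date level 1 - date level 3'
      [1, 2 1 1 1] [-1, 2 1 1 3] / e;
run;
```

If the hand-computed difference of the two LSMEANS rows does not match the LSMESTIMATE value, the coefficients are probably pointing at different cells than intended.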

Steve Denham

palolix
Obsidian | Level 7

Ohh Steve, now I got it!!  I did it by hand and realized that the calculated estimates were not the same as those that I got from SAS.  At first I did not know why, but then I realized that I was making a very stupid mistake when writing my coefficients: I wanted to compare the first vs the last date, but when I went to the class level table I noticed that my last date was not the last date for SAS, because of the alphabetical order.  I was doing it right for the other factors but not for date.  I corrected the coefficients and now I get the same estimates as those that I calculated myself!!!  What a stupid mistake!!  I am so glad you suggested that I calculate it by hand!!
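
For anyone hitting the same problem, here is a rough sketch of how the level ordering can be inspected before writing coefficients. It assumes the dataset `one` and the variables from the first post; the random effects are left out because only the level table is of interest here.

```sas
/* Sketch only: show just the Class Level Information table.            */
/* CLASS levels are listed in the order SAS uses for the coefficients,   */
/* which by default is the sorted (alphabetical) order of the values.    */
ods select ClassLevels;
proc mixed data=one;
  class blk nem trt cultivar date;
  model lgsprCysts100 = nem|trt|cultivar|date;
run;
```

The position of each date value in that table is the index to use in the nonpositional syntax, and it fixes the column order for the positional syntax as well.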

Thank you very much Steve!!!!

Caroline

SteveDenham
Jade | Level 19

I think that is why I "indexed" the dates when we were working on a previous analysis.  Anyway, I am glad things are working out.

Steve Denham

palolix
Obsidian | Level 7

Now do you think it would be a good idea to add ddfm=kr or ddfm=satterthwaite to my model?

Thank you,

Caroline

SteveDenham
Jade | Level 19

I'd go with the Kenward-Roger adjustment first, as it applies the Harville and Jeske adjustment to the variance-covariance matrix and then calculates the Satterthwaite df, which could be important with this pattern of missing data.  However, don't be surprised if all of a sudden the denominator degrees of freedom drop to 1.  In that case the adjustment is too conservative, and I would drop back to ddfm=satterthwaite.
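
As a sketch of where the option goes, assuming the same dataset `one` and model as in the first post (the bar operator is shorthand for the full factorial listed there):

```sas
/* Sketch only: DDFM= is an option on the MODEL statement, after the slash. */
proc mixed data=one;
  class blk nem trt cultivar date;
  model lgsprCysts100 = nem|trt|cultivar|date / ddfm=kr;
  /* If the denominator df collapse, swap in: ddfm=satterthwaite */
  random blk*nem blk*nem*trt blk*nem*trt*cultivar
         blk*date blk*nem*date blk*trt*date blk*nem*trt*date
         blk*cultivar*date blk*nem*cultivar*date blk*trt*cultivar*date;
run;
```

Comparing the DF column of the Type 3 tests and of the estimates between the two runs shows whether the choice matters for this data set.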

Steve Denham

palolix
Obsidian | Level 7

Mmmm, I tried KR and Satterthwaite and with both I get the same results: <.0001 for all the estimates and DF=31.83 or 28.17 instead of 4.  This looks very odd to me, so I would rather omit the ddfm= option in my model.  Do you agree with me Steve?

By the way, I am so happy with the results from my estimates!!!  Now they look like they should!

Thank you Steve!!

Caroline

SteveDenham
Jade | Level 19

No.  I would definitely stay with the adjustment; the DF reflect the "true" degrees of freedom associated with the error terms appropriate to the pattern of missingness.

Steve Denham

palolix
Obsidian | Level 7

Ok.  Do you think Satterthwaite would be better? I am surprised that the DF goes from 4 to 31 or 45!  What about adding the adjustment=simulate?

Thank you,

Caroline

SteveDenham
Jade | Level 19

DDFM=Satterthwaite may or may not give exactly the same df as KR--it all depends on the Harville-Jeske adjustment, which is critical with missing data or repeated measures.

As far as adjusting, I would do that, but that is a personal preference.  There is some philosophy to consider, such as whether the study is exploratory or confirmatory in nature, and the relative cost of making Type I and Type II errors.  But I assume this is confirmatory, and controlling Type I error is more important.  Given all of that--separate each of the estimates with a comma (it looks like that is the case already) and add after the last estimate

/adjust=simulate(seed=1) ;/* specify the seed for the simulation.  If you don't, you could get different adjusted p-values when you rerun the analysis.  I use seed=1 only for convenience.*/
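
Put together, the statement might look like the sketch below. It assumes the dataset `one` and model from the first post, and the level indices in the two contrasts are illustrative only; they have to match the Class Level Information table.

```sas
/* Sketch only: multiplicity adjustment across all estimates in one LSMESTIMATE statement. */
proc mixed data=one;
  class blk nem trt cultivar date;
  model lgsprCysts100 = nem|trt|cultivar|date / ddfm=kr;
  random blk*nem blk*nem*trt blk*nem*trt*cultivar
         blk*date blk*nem*date blk*trt*date blk*nem*trt*date
         blk*cultivar*date blk*nem*cultivar*date blk*trt*cultivar*date;
  /* Estimates are separated by commas; options follow the slash after the last one. */
  lsmestimate nem*trt*cultivar*date
      'nem=2 trt=1 cultivar level 1: date level 1 - date level 3'
      [1, 2 1 1 1] [-1, 2 1 1 3],
      'nem=2 trt=2 cultivar level 1: date level 1 - date level 3'
      [1, 2 2 1 1] [-1, 2 2 1 3]
      / adjust=simulate(seed=1);
run;
```

The adjustment is applied jointly to every estimate in the statement, so estimates that belong to the same family of comparisons should be kept in one LSMESTIMATE statement.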

Steve Denham

palolix
Obsidian | Level 7

Thank you very much Steve for your good suggestions and great support!!

All the best,

Caroline

palolix
Obsidian | Level 7

Hello again Steve,

When analyzing some dependent variables I only need 2 levels of the factor 'cultivar' (which has 3 levels: Nordstern, Sang and Beretta), so SAS uses 27 of the 40 observations. In that case I only get the LSMEANS but not the estimates.  The problem is that in my Type 3 Tests of Fixed Effects table I only get the 'nem' and 'cultivar' main effects but not the nem*cultivar interaction (which I suppose is not estimable), so I think that is why I am not getting the estimates.  When writing the estimate statements I think I should only write the main factors but not the interaction: lsmestimate nem cultivar 'label' [1, 1 2][-1, 2 2], instead of nem*cultivar 'label'. But if I just write nem cultivar 'label', SAS tells me that it is expecting a * (nem*cultivar), so that does not work either.

Here is my code:

proc mixed data=one;
where cultivar="Nordstern" or cultivar="Sang";
class blk nem cultivar;
model ahreave=
nem
cultivar
nem*cultivar/ddfm=kr;
random blk blk*nem;
lsmeans nem*cultivar;
lsmestimate nem*cultivar  'nem1 vs nem2 | cultivar=Nordstern' [1, 1 2][-1, 2 2],
                          'nem1 vs nem3 | cultivar=Sang' [1, 1 3][-1, 3 3],
                          'cultivarNord vs cultivarSang | nem=1' [1, 1 2][-1, 1 3];
run;

Is there a way to write the lsmestimate without the nem*cultivar interaction (since this is not estimable)?  Do you think this is the reason why I am not getting the estimate results?

I would greatly appreciate it if you could shed some light on this.

Thank you very much Steve!

Caroline
