BookmarkSubscribeRSS Feed
Kojema
Fluorite | Level 6

Hello, I am performing an analysis that is kind of new to me, and is more complicated than any examples I can find online. I am a little bit over my head, and am experiencing convergence issues when I include certain interaction terms... I was wondering if anyone could answer some questions...

DESCRIPTION

-Goal is to test the difference in nitrogen export ("TN_Exp") from watersheds with differect agricultural intensities ("AgInt", 3 levels)

-measured monthy (March through Oct) exports of nitrogen in 19 watersheds, for 12 years

-Also wanted to test the effect of monthly precipitation ("RunnoffDepth", covariate) on export

-And ecological area ("EcoA", 4 levels, but not fully crossed with AgInt, so I have nested EcoA within AgInt: "EcoA(AgInt)")

-And whether there was a linear trend in the 12 years ("Year", covariate)

-And whether there was a seasonal effect ("Month", 8 levels)

-As this is an RM design, I Included an r-side random  month term with Stream(AgInt) as a subject. I didn't necessarily expect temporal dependance between years, aside from any linear trend, however, see questions...

-TN_Exp is a right skewed continuos variable, thus I used a gamma distribution, with a log link function

-Stream(AgInt) was included as a random effect, as I did not care about specific differences between streams, but wanted to control for unmeasured characterisitcs at the watershed level

CODE

Proc glimmix data= ExpCoeff;

Class AgInt Month Stream EcoA;

Model TN_Exp =   Month  AgInt  EcoA(AgInt)

   RunoffDepth  Year

  AgInt*Month

  RunoffDepth*AgInt  RunoffDepth*Month  Year*AgInt  Year*Month

  RunoffDepth*Year

/ dist=gamma link=log solution ddfm=kenwardroger;

Random  Stream(AgInt) / solution;

Random  Month /  Subject= Stream(AgInt) Type= cs residual ;

covtest 'indep' indep;

nloptions maxiter=2000 tech= newrap;

Run;

QUESTIONS

1) In the "solution for random effects", all of the p-values for the estimated coefficients of each stream are >0.05 (in fact, the lowest is 0.41) Does this mean that the random effect of stream can be removed?

2) Is it necessary to nest Stream within Agint? Each stream has a different name...or perhaps I should be doing more complicated nesting since there are mulitple categorial variables????

3) For the repeated statement, should month be nested within year...I don't think so as I am assuming the correlations between months are similar from year to year...

4) For the Random Month effect, I would expect an Ante(1) or TOEP variance-covariance matirx would be better than CS, but get the following error when I use anything more complicated than cs:

WARNING: Obtaining minimum variance quadratic unbiased estimates as starting values for the covariance parameters failed. How might I be able to fix this?

5) If I wanted to consider temporal autocorrelation between years, would I include an an ADDITIONAL random statement?? i.e. Random Year / Subject= Stream(AgInt)

5) When I include interaction terms with EcoA(AgInt), the model does not converge... I am celarly not understanding something... can anyone explain this or enlighten me? Perhaps this is becasue I have interaction terms with AgInt, within which EcoA is nested?

7) When I include three-way interaction terms (i.e. RunoffDepth*AgInt*Year RunoffDepth*Month*Year Year*AgInt*Month) the model does not converge. Should I  allow more iterations, or is the model perhaps simply too complicated. Is it acceptable to not include three-way interaction terms? It would be onerous to interpret such a complicated model anyways.... Is it acceptable to simply choose interaction terms you have reasons to believe would exist/ you are interested in?

😎 If you happen to notice an potential errors, I would love to know!!!

THANK YOU SO MUCH TO ANYONE WHO HAS ANY INSIGHT!!!!

Madison

6 REPLIES 6
SteveDenham
Jade | Level 19

Madison,

I would try to keep all the design factors in the model.  One of the things that might be occurring is that for generalized models, many times the use of R-side covariance structures leads to problems.  Search the Statistical Procedures Forum for some posts on this, especially those from and .  I would suggest trying to fit your model with only the following random statements:

random intercept /subject=stream(AgInt);

random month/subject=stream(AgInt) type=ar(1);

You may want to try other covariance structures, based on your knowledge of the biology.  In addition, since the data are being processed by subjects, and with a log link, the means and variances are not independent, you may want to try either the Laplace or adaptive quadrature method of fitting.

Steve Denham

Kojema
Fluorite | Level 6

Thanks Steve!

You were right, things ran a lot more smoothly using G-side specifications, and I was able to try out other covariance structures.

On a related note, the test of covariance parameters ended up being non signifcant. This was a different result from when I used R-side covariance structures, but when I removed the Random Month statement (leaving a significant "random intercept /subject=stream(AgInt)" effect), it didn't change the interpretations of the fixed effects, so I am presuming I shouldn't really worry about this discrepancy.

I also took your recommendation re: fitting, and used the Laplace method (quad gave the same results). Would you be able to recommend a denominator df adjustment? I used the "betwithin" option but can't say that I would have any good reason for choosing it over another method!!

Thanks

Madison

SteveDenham
Jade | Level 19

Last question first--I would use the ddfm=bw (between-within) in this case, so I agree there.  None of the other adjusted df methods (satterthwaite or kenward-rogers) are applicable under the Laplace and Quad methods.  You might explore one of the sandwich estimators, just to see if they will run under Laplace (I haven't done this) to handle the correlated errors somewhat more efficiently.

Interesting observation about removing the Month random effect.  What happened to any tests between means, and standard errors of the means and differences?  Even though it may not be "significant", I think it is probably a real design element that should be accommodated.  This isn't a case where parsimony of the model is worth ignoring, in my opinion.

Steve Denham

Kojema
Fluorite | Level 6

Hi, just thought I'd update the progress of this analysis, and add on a discussion topic

I ended up not using month as a variable, and used "season" instead. It was too onerous to interpret differences between individaul months, and seasonal differences are what I was really interested in on the time side of things. As a nice side effect, this made my model run much more smoothly. This also allowed me to take out period of sampling as a repeated measure, as I figured seasonal differences would be great enough that temporal autocrrelation between seasons wouldn't be a major issue.

I also made another decision that could be controversial....if anyone could lend me some advice that would be excellent....:

     I originally had "stream" as a random block, as multiple measurements were taken per stream and I wanted to account for unexplained within-catchment variability. However, none of the estimated coefficients for individual streams were significantly different than zero. And the "covtest", or test of covariance parmaters, had a p-value of 1, indicating no significant difference from independence. Thus I decided to remove stream as a random block. In the analysis of total nitrogen, this gave equivalent AIC values, an identical model (nonsignificant terms were removed sequentially), and almost identical LSMEANS etc. For the analysis of total phosphorous, however, removing "stream" as a random block resulted in a much smaller AIC  (160 units lower!), indicating a more parsimonious model, but the final model ended up being different, as were estimates of coefficients. Importantly, without the random block, there were significant differences between watersheds with different agriculutral intensities, and with the random block, there were not. Not that I am trying to bias my analysis toward the results I want, but I am concerned that leaving in a variable that is apparently nonsignificant is clouding my ability to detect the differences I am interested in. I am thus somewhat torn about whether to remove the random block or not., as I've always been tought to remove non-significant variables, but stream was a design factor (Steve I know you said to keep the design factors!). However, maybe this is telling me that when things like agricultural intesity, ecoregion, runoff, season, etc. are known, nutrient export variation between watersheds is pretty well acounted for...

Pre-thanks for any more comments, and sorry for rambling!

Madison

SteveDenham
Jade | Level 19

Random effects: when to keep them and when to let them go, and what they really mean (at least to me).

I think this gets into the inference space realm.  By deleting the random effect, you are essentially testing whether the BLUPs are different.  In other words, if you repeated the experiment over and over again on the same "blocks", what differences could be termed repeatable.  By keeping the random effect in the model--significant or not--you are looking at a broader inference space, of "all possible blocks."  Thus, it isn't surprising that the results differ.  I think what may be going on is that there just isn't enough data to estimate the standard errors of the covariance parameters with a great deal of precision, so they come up nonsignificant.

This one will come down to your ultimate goal with the analysis: to describe what happened in your instance, or to generalize to other instances.

Steve Denham

Kojema
Fluorite | Level 6

Thanks for this Steve,

Good food for thought.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 2598 views
  • 6 likes
  • 2 in conversation