Yes, right now I am using random _residual_ in the model, but it still seems rather overdispersed. X2/df goes from 90 to 32 when that term is included, but it only gets close to 1 when I square-root transform the data. I've tried a negbin model, but have convergence issues.
For various problems, including non-convergence, read "Tips and Strategies for Mixed Modeling with SAS/STAT® Procedures" and "Advanced Techniques for Fitting Mixed Models Using SAS/STAT® Software". But first, you'll want to be sure you are using the correct syntax; there is scant point in worrying about non-convergence in a wrong model.
30 new leaves are sampled from plants every interval. Site=plant in my case because I only have one study plant per site. I was taking leaves from three separate sections of the canopy, but we decided to eliminate that from the model due to a list of issues, including sections not being true replicates.
I would think that you could include section as a fixed effects factor, although it is true that levels of section cannot be randomly assigned to the 3 plant "divisions". (We generally don't randomly assign gender to animals either, and yet we still use gender as a predictor variable.) But perhaps you are thinking of section as a random effects factor, in which case section is a subsample, not a true replicate, as are individual leaves.
Do you have 10 leaves from each of the 3 sections? If not, and if section matters, then you could have a bias problem.
2) I've never heard of a structural equation model, so I will do some research into it.
For a biology-oriented introduction, see Ch. 8 by Grace, Scheiner, and Schoolmaster in Ecological Statistics: Contemporary Theory and Application.
3) Yes I will remove subject=climate. I originally had it as "random interval / subject=site residual" The intent was to treat interval as a repeated measure of site. I think that way was correct?
Yes, interval is a factor associated with repeated measurements on sites. That would be closer to correct, but you are not using the residual option correctly. Only RANDOM statements that specify elements of the R matrix should include the residual option; RANDOM statements that specify elements of the G matrix should not. Site and repeated measures on sites are specified in G; leaves are specified in R.
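As a minimal sketch of that distinction (not a recommended final model), the statements below put site and interval-within-site on the G side and leave only the leaf-level overdispersion on the R side. The Poisson distribution is an assumption made here for illustration, and climate is omitted to keep the example short.

```sas
proc glimmix data=final;
   class site interval;
   /* dist=poisson is assumed here only for illustration */
   model counts = interval / dist=poisson;
   /* G-side: no RESIDUAL option on these statements */
   random intercept / subject=site;
   random interval / subject=site;
   /* R-side: multiplicative overdispersion among leaves */
   random _residual_;
run;
```

Note that the RESIDUAL option (or the _RESIDUAL_ keyword) appears only on the statement that describes the R matrix.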
4) There is a lot more going on here with seasons that I didn't really get into. I showed that climate types vary by temperature and so do intervals, so if I see seasonal/climate differences they can in part be attributed to temperature differences. I can't directly model temperature because temperature is confounded with site: a single temperature point corresponds to each site. Intervals are assigned to seasons based on mean monthly temperatures (0-15 degrees, 15.01-20, and 20.01-25). Assigning intervals to arbitrary seasons such as Winter, Fall, Spring, and Summer doesn't help us interpret seasonal effects in this case, and southern CA doesn't have traditional seasons.
If temperature is a site-level fixed effects factor, you can model it (that's what mixed models are good at) as a regression. If climate is just a categorized version of temperature, then I would consider using temperature instead. But climate as a factor may be a more complex concept, involving something more than just temperature.
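A rough sketch of what regressing on a site-level temperature could look like follows. The variable name temperature is hypothetical (one value per site is assumed), and the data set name final_t is introduced only for this example.

```sas
/* Hypothetical: final contains a numeric site-level variable named temperature */
data final_t;
   set final;
   ln_leafarea = log(leafarea);   /* offset on the log scale */
run;

proc glimmix data=final_t;
   class site interval;
   /* temperature is not in CLASS, so it enters as a regression slope */
   model counts = temperature interval
         / offset=ln_leafarea dist=negbin solution;
   /* random site intercepts absorb site-to-site variation not explained by temperature */
   random intercept / subject=site;
run;
```

Because temperature varies only among sites, its slope is effectively tested against site-to-site variability, so the number of sites limits how well it can be estimated.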
I do not have a full understanding of how and why you are defining "season"; it still strikes me as an arbitrary categorization. But moving forward, consider a model with only interval (omitting season) for the moment because it is simpler than a model with intervals nested within seasons:
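proc glimmix data=final;
   class site climate interval;
   /* offset: log leaf area, so counts are modeled per unit leaf area */
   ln_leafarea = log(leafarea);
   model counts = climate interval climate*interval
         / offset=ln_leafarea dist=negbin;
   /* G-side random effects: sites within climate, and intervals within site */
   random intercept / subject=site(climate);
   random interval / subject=site(climate);
run;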
You could explore different forms of temporal autocorrelation among the repeated measurements by replacing the two RANDOM statements above with something like
random interval / subject=site(climate) type=<whatever>;
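For instance, a first-order autoregressive structure could be tried as sketched below. AR(1) is only one candidate and assumes the intervals are equally spaced and sorted in time order within site; other structures such as type=cs or type=toep could be swapped in the same way and compared.

```sas
proc glimmix data=final;
   class site climate interval;
   ln_leafarea = log(leafarea);
   model counts = climate interval climate*interval
         / offset=ln_leafarea dist=negbin;
   /* AR(1) chosen for illustration: correlation decays with lag between intervals */
   random interval / subject=site(climate) type=ar(1);
run;
```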
As you note, interpretation of a factor with 26 levels is a challenge. Rather than incorporate interval as a CLASS factor, you might regress on interval, possibly with a curvilinear model, possibly with a spline model (see https://blogs.sas.com/content/iml/2017/04/19/restricted-cubic-splines-sas.html) or other smoother (see https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_glimmix_sect018.htm).
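One way the spline idea might be sketched is with the EFFECT statement, in the style of the blog post linked above. This assumes interval is coded numerically (for example 1 to 26); the names t, spl, and final_s are hypothetical, and the choice of 5 knots at percentiles is arbitrary.

```sas
/* Hypothetical: keep interval for CLASS use and add a numeric copy t for the spline */
data final_s;
   set final;
   t = interval;
   ln_leafarea = log(leafarea);
run;

proc glimmix data=final_s;
   class site climate interval;
   /* restricted (natural) cubic spline in t with 5 knots placed at percentiles */
   effect spl = spline(t / naturalcubic basis=tpf(noint)
                           knotmethod=percentiles(5));
   model counts = climate spl climate*spl
         / offset=ln_leafarea dist=negbin;
   random intercept / subject=site(climate);
   random interval / subject=site(climate);
run;
```

The fixed-effects part then describes a smooth trend over time for each climate instead of 26 separate interval means, while interval stays a CLASS variable in the RANDOM statements.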
There are lots of different ways to envision the analysis. The challenge is knowing what possibilities exist, identifying the ones that validly represent the structure of the experiment, and then selecting a "best" among those while maintaining objectivity. (Frank Harrell has written, "Using the data to guide the data analysis is almost as dangerous as not doing so.")
Is previous_para the value for counts in the previous interval? Or does counts measure one insect species and previous_para a different insect species?
I hope this helps. If you have access to a statistician at your institution, take advantage of that opportunity.