We did an experiment where we grew mushrooms on two species of tree logs across three different farms. The mushrooms were harvested and weighed with logs inoculated the first year harvested twice and logs inoculated the second year harvested once. Code and output below. Why is fixed effect estimate zero and it cannot estimate lsmeans for oak and tallow, even thought it does for oak and tallow by inoculation and harvest year?
ods graphics on;
proc sort data= mushroom.data; by TreeSp innocyear harvestyear;
proc glimmix data=mushroom.data2;
class Farm TreeSp TreeID innocyear harvestyear;
model Weight_Cum= TreeSp innocyear harvestyear innocyear*harvestyear*TreeSp /ddfm=kr2 solution;
random intercept /subject =Farm;
random TreeID(TreeSp) /group=farm;
lsmeans TreeSp innocyear*harvestyear*TreeSp/ilink lines /*pdiff=all CL*/;
Non-estimable main effect lsmeans in situations like this are usually due to "missing" data/cells. Run a PROC FREQ or PROC MEANS to see which combinations of innocyear*harvestyear*TreeSp are totally missing observations.
How to get around this issue? I come from the Milliken and Johnson school of Analysis of Messy Data, and what I learned is to fit a one-way model, with only the highest order interaction in the MODEL statement. ESTIMATE statements or even better LSMESTIMATE statements can be used to get marginal means (equivalent to lsmeans) for the effects currently listed as non-estimable.
On to the question about the zeroes. That is a result of the use of non-full rank (GLM) parameterization by PROC GLIMMIX. It is also why the solution vector is NOT what you want to examine for significance testing. Instead, the Type 3 F tests (not shown) and differences between LSMEANS/LSMESTIMATES provide that information.
Last thing, be sure you want to use the shrinkage estimates obtained through the Kenward-Rogers ddfm option. Looking at the degrees of freedom in the fixed effect solution vector, I believe you should be in good shape with its use.
Yes, that was the problem- We had inoculated logs in 2015 that we harvested in 2015 and those same logs in 2016 and then we inoculated new logs in 2016 that we only harvested in 2016 (so there isn't a combination of HarvestYear=2016 *InoculationYear=1 and that's were the zeros were coming from).
My solution was to create three cohorts (A=inoculated and harvested in 2015, B= inoculated in 2015 and harvested in 2016, and C=inoculated and harvested in 2016). However, now I am getting a negative lsmeans estimate for Tallow B, which doesn't make any sense.
Non-est lsmeans are often caused by data with missing cells. If you do a proc freq -
do you see 0 count for one or more cells? If so, that explains it.
You either combine some of the categories so no missing cells are present, or change your model, for example, taking out this three-way interaction term. You do not have two-way interactions in the model? -- something to think about adding....
Yes, thank you. That was the problem. The logs that were inoculated in 2016 were only harvested once, so there weren't two levels of harvest for the 2016 inoculation. I created a Cohort variable to capture this information but now I am getting a negative estimate that shouldn't be negative (Tallow B, which were inoculated in 2015 and harvested in 2016).
So what does your current GLIMMIX code look like? Is there anything unusual showing up in the output, such as a message in the output that the G matrix is not positive definite.
One way of avoiding a negative value, provided it is not being caused by negative numbers, would be to use a log link in the MODEL statement:
model Weight_Cum= TreeSp cohort cohort*TreeSp /ddfm=kr2 solution link=log;
The ilink option in the LSMEANS statement will then give you the lsmean on the original scale, but will be guaranteed to be greater than zero.
How many iterations does it run? GLIMMIX defaults to 20, and then says no convergence. You may want to add an NLOPTIONS statement to increase the maximum number of iterations - such as NLOPTIONS maxiter=1000:
Not sure if that is the issue, but it would be the first thing I would try.
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.