BookmarkSubscribeRSS Feed
bowerske
Calcite | Level 5

 We did an experiment where we grew mushrooms on two species of tree logs across three different farms. The mushrooms were harvested and weighed with logs inoculated the first year harvested twice and logs inoculated the second year harvested once.  Code and output below. Why is fixed effect estimate zero and it cannot estimate lsmeans for oak and tallow, even thought it does for oak and tallow by inoculation and harvest year?

 

ods graphics on;
proc sort data= mushroom.data; by TreeSp innocyear harvestyear;
proc glimmix data=mushroom.data2;
class Farm TreeSp TreeID innocyear harvestyear;
model Weight_Cum= TreeSp innocyear harvestyear innocyear*harvestyear*TreeSp /ddfm=kr2 solution;

random intercept /subject =Farm;
random TreeID(TreeSp) /group=farm;

lsmeans TreeSp innocyear*harvestyear*TreeSp/ilink lines /*pdiff=all CL*/;

run;
quit;

bowerske_0-1700176013163.pngbowerske_1-1700176083768.png

 

9 REPLIES 9
Ksharp
Super User
I am guessing you put too many effect/variables in MODEL.
Try to remove effect "innocyear*harvestyear*TreeSp" and only fit main effect .
SteveDenham
Jade | Level 19

Non-estimable main effect lsmeans in situations like this are usually due to "missing" data/cells. Run a PROC FREQ or PROC MEANS to see which combinations of innocyear*harvestyear*TreeSp are totally missing observations.

 

How to get around this issue? I come from the Milliken and Johnson school of Analysis of Messy Data, and what I learned is to fit a one-way model, with only the highest order interaction in the MODEL statement. ESTIMATE statements or even better LSMESTIMATE statements can be used to get marginal means (equivalent to lsmeans) for the effects currently listed as non-estimable.

 

On to the question about the zeroes. That is a result of the use of non-full rank (GLM) parameterization by PROC GLIMMIX. It is also why the solution vector is NOT what you want to examine for significance testing. Instead, the Type 3 F tests (not shown) and differences between LSMEANS/LSMESTIMATES provide that information.

 

Last thing, be sure you want to use the shrinkage estimates obtained through the Kenward-Rogers ddfm option. Looking at the degrees of freedom in the fixed effect solution vector, I believe you should be in good shape with its use.

 

SteveDenham

bowerske
Calcite | Level 5

Yes, that was the problem- We had inoculated logs in 2015 that we harvested in 2015 and those same logs in 2016 and then we inoculated new logs in 2016 that we only harvested in 2016 (so there isn't a combination of HarvestYear=2016 *InoculationYear=1 and that's were the zeros were coming from).

 

My solution was to create three cohorts (A=inoculated and harvested in 2015, B= inoculated in 2015 and harvested in 2016, and C=inoculated and harvested in 2016). However, now I am getting a negative lsmeans estimate for Tallow B, which doesn't make any sense. 

 

 

bowerske_0-1700263076074.png

 

 

jiltao
SAS Super FREQ

Non-est lsmeans are often caused by data with missing cells. If you do a proc freq -

proc freq;

tables innocyear*harvestyear*TreeSp;

run;

do you see 0 count for one or more cells? If so, that explains it.

You either combine some of the categories so no missing cells are present, or change your model, for example, taking out this three-way interaction term. You do not have two-way interactions in the model? -- something to think about adding....

Thanks,

Jill

bowerske
Calcite | Level 5

Yes, thank you. That was the problem. The logs that were inoculated in 2016 were only harvested once, so there weren't two levels of harvest for the 2016 inoculation.  I created a Cohort variable to capture this information  but now I am getting a negative estimate that shouldn't be negative (Tallow B, which were inoculated in 2015 and harvested in 2016). 

bowerske_0-1700263413065.png

 

SteveDenham
Jade | Level 19

So what does your current GLIMMIX code look like?  Is there anything unusual showing up in the output, such as a message in the output that the G matrix is not positive definite. 

 

One way of avoiding a negative value, provided it is not being caused by negative numbers, would be to use a log link in the MODEL statement:   

model Weight_Cum= TreeSp cohort cohort*TreeSp /ddfm=kr2 solution link=log;

The ilink option in the LSMEANS statement will then give you the lsmean on the original scale, but will be guaranteed to be greater than zero.

 

SteveDenham

 

 

bowerske
Calcite | Level 5

Hi Steve,

 

I added the ilink to the model statement and now the model doesn't converge : )

 

Kristen

SteveDenham
Jade | Level 19

How many iterations does it run? GLIMMIX defaults to 20, and then says no convergence. You may want to add an NLOPTIONS statement to increase the maximum number of iterations - such as NLOPTIONS maxiter=1000:

 

Not sure if that is the issue, but it would be the first thing I would try.

 

SteveDenham

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 1279 views
  • 2 likes
  • 4 in conversation