DIHS
Calcite | Level 5

Say you measure the size of individuals from a number of families. Within each family, some individuals are reared on high food and some on low food. I am interested in whether the families respond differently to the food treatment (i.e., the family-by-food interaction). The code is:

proc mixed data=mylib.size covtest;
   class family food;
   model size = food;
   random family family*food;
run;

I find that the covariance parameter estimate for family is 0 but family*food is significant. I also get the message "Estimated G matrix is not positive definite". If I remove family and keep family*food, family*food is still significant and the message goes away.

My question is: Can I have family*food in the random statement without family?

Many thanks.

1 ACCEPTED SOLUTION

Rick_SAS
SAS Super FREQ

The question of whether you can include an interaction term in a model without including the main effects has a long and stormy history. Can you include the interaction without the main effect? Yes, but many (most?) statisticians believe that you should include the main effects. It makes interpretation easier. For a long discussion of this subject, see this CrossValidated discussion.

I'd vote "don't do it."

Incidentally, your example is essentially the Getting Started example for PROC MIXED, which uses data that all of us can run.

SteveDenham
Jade | Level 19

It's not often that I come down on the opposite side of the fence from Rick, but this is going to be one of those times.

I think there is a real difference between including main fixed effects in the face of interaction fixed effects and the same thing on the random side.

One can always "construct" fixed main effects from the interaction solution (Google "means model" Milliken, for example). However, for random effects, I am not as sure as I was when I started this post. I guess the key here would be to look at the degrees of freedom for the F tests of the fixed effects under the two specifications. Which one reflects the "skeleton ANOVA," to steal a phrase from Walt Stroup? That is the parameterization I would trust first, and that is the basis of Rick's "Don't do it," I suspect.

However, as models become more complex, having a G matrix that isn't positive definite can run the risk of failure to converge, especially if you are working in PROC GLIMMIX with a conditional model due to distributional assumptions. In this case, I would carefully consider removing variance components that go to zero.
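The comparison of denominator degrees of freedom under the two specifications could be set up roughly as follows. This is only a sketch that reuses the data set and variable names from the original post; the ODS SELECT statements are an addition here, used to limit the output to the covariance parameter estimates (CovParms) and the Type 3 tests of fixed effects (Tests3).

```sas
/* Specification 1: family and family*food both random (as in the original post) */
ods select CovParms Tests3;
proc mixed data=mylib.size covtest;
   class family food;
   model size = food;
   random family family*food;
run;

/* Specification 2: family*food only */
ods select CovParms Tests3;
proc mixed data=mylib.size covtest;
   class family food;
   model size = food;
   random family*food;
run;
```

Compare the Den DF column for food in the two Tests3 tables against what the design's skeleton ANOVA would give.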

For the example in the original post, the family*food variance component dominates the algorithm such that family has a zero estimate under REML. See what happens if you change the estimation method to maximum likelihood (or use the NOBOUND option).
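A minimal sketch of those two variations, assuming the same data set and model as in the original post. METHOD=ML and NOBOUND are both options on the PROC MIXED statement; everything else is unchanged.

```sas
/* Same model, fit by maximum likelihood instead of the default REML */
proc mixed data=mylib.size covtest method=ml;
   class family food;
   model size = food;
   random family family*food;
run;

/* Default REML, but with the lower bound of 0 on the variance components removed */
proc mixed data=mylib.size covtest nobound;
   class family food;
   model size = food;
   random family family*food;
run;
```

With NOBOUND, check whether the family estimate moves from 0 to a negative value.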

Steve Denham

lvm
Rhodochrosite | Level 12

Agree with Steve, overall. For the variance components model, it shouldn't make a difference (for most purposes) whether you take out or leave in the main effect random term (assuming you don't use the NOBOUND option). As the model gets more complicated, a nonpositive definite G matrix can be quite problematic. Then you can take out the main effect for the random term. Model fitting can be quite a bit easier when 0 variance terms are not included. If you are uncertain about denominator degrees of freedom for the different models, use the ddfm=KR option on the model statement. This will typically make everything work out.
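As a sketch, the reduced model with the Kenward-Roger adjustment might look like this (same data set and variables as in the original post; only the DDFM= option and the shorter RANDOM statement differ):

```sas
/* Main-effect random term removed; Kenward-Roger denominator degrees of freedom */
proc mixed data=mylib.size covtest;
   class family food;
   model size = food / ddfm=kr;
   random family*food;
run;
```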

With NOBOUND, things are different. The 0 variance for the main effect may end up as a negative "variance". So, all the random terms are needed. This negative variance is nonsensical for a conditional interpretation of a model, but works fine for the marginal interpretation (i.e., as long as the TOTAL variance is positive).
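To see why a negative estimate can still make sense marginally, it may help to write the model out. The notation below is assumed for illustration and does not come from the thread: $y_{ijk}$ is the size of individual $k$ from family $i$ on food level $j$, and the three random terms are taken to be independent.

```latex
\begin{align*}
y_{ijk} &= \mu + \alpha_j + f_i + (f\alpha)_{ij} + e_{ijk},\\
f_i &\sim N(0,\sigma^2_f),\quad
(f\alpha)_{ij} \sim N(0,\sigma^2_{f\alpha}),\quad
e_{ijk} \sim N(0,\sigma^2_e),\\[4pt]
\operatorname{Var}(y_{ijk}) &= \sigma^2_f + \sigma^2_{f\alpha} + \sigma^2_e,\\
\operatorname{Cov}(y_{ijk},\, y_{ijk'}) &= \sigma^2_f + \sigma^2_{f\alpha}
  \quad (k \neq k',\ \text{same family, same food}),\\
\operatorname{Cov}(y_{ijk},\, y_{ij'k'}) &= \sigma^2_f
  \quad (j \neq j',\ \text{same family, different food}).
\end{align*}
```

Read this way, $\sigma^2_f$ is the covariance between members of the same family reared on different food levels. A negative value cannot be the variance of a family effect, but it is a perfectly legitimate covariance, provided the implied covariance matrix of the observations is still positive definite.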
