skr01
Calcite | Level 5

Hello,

I am currently working on a model that I posted a different sort of question about in a previous post. My current question is: What is the relationship between the repeated effect that is indicated in the "random" line of code (random repeated_effect / subject=object_repeatedly_measured;), and the fixed effects in the model statement?  Must the repeated effect be a fixed effect? If so, must it appear in the model statement? 

A snippet of the model I am starting from is below. Measures were taken in each season between 2007 and 2012. This model results in a non-significant test for season (but a significant test for seasonyear), and most lsmeans for environment and season are non-estimable. I presume that is because of confounding between season and seasonyear.

proc glimmix...

model Paeru= environment season seasonyear / ddfm=residual;

random house house*environment;

random seasonyr / subject= realloc*house residual; ...
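
One way to check the presumed confounding is to cross-tabulate the two factors. This is a minimal sketch, assuming the data sit in a hypothetical data set named work.samples. If every seasonyear level has counts in only one season row, season is fully determined by seasonyear, and a season main effect cannot be separated from seasonyear when both are in the MODEL statement.

```sas
/* Sketch only: work.samples is a hypothetical data set name. */
proc freq data=work.samples;
  /* One row per season, one column per seasonyear level */
  tables season*seasonyear / norow nocol nopercent;
run;
```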

If I drop seasonyear from the fixed effects and leave the rest of the model alone, I get a significant effect of season and estimable lsmeans.

proc glimmix...

model Paeru= environment season / ddfm=residual;

random house house*environment;

random seasonyr / subject= realloc*house residual; ...

(1) Is this a legitimate thing to do? Specifically, can I include a repeated effect factor that is not in the model statement?

Upon thinking about it more, I concluded that actually, year is an effect that is missing from this model, and that seasonyear could be thought of as season*year. My first step toward this path, because it seemed most like the way I have seen models put together in other settings, was to put year in as a fixed factor. The following model also yields non-estimable lsmeans (and identifies significant differences among seasons and years but not season*year).

proc glimmix...

model Paeru= environment season year season*year / ddfm=residual;

random house house*environment;

random season*year / subject= realloc*house residual; ...

(2) Is that a legitimate thing to do? If so, is it because I now have season*year both in the model and as the repeated effect in the second random statement? If so, is there something I can do about the inability to get lsmeans?

But really, I am not interested in the specific years in the study, and perhaps there are enough of them that I can deal with them as the random factor that I think they really are (the study ran from 2007 into 2012). So now I have the model below.

proc glimmix...

model Paeru= environment season / ddfm=residual;

random house house*environment year;

random season*year / subject= realloc*house residual; ...

(3) Is that a legitimate thing to do? If not, which of the many changes from my starting point is/are the problem?

Thanks for your insights!

SteveDenham
Jade | Level 19

I would suggest a read of Milliken and Johnson's Analysis of Messy Data.  In it, you will find a useful approach they term a "means model".  Essentially, this means expressing the model statement in an interaction form, so that it is a one-way model.  Then, the repeated term is pulled out to the random statement (and is in both), and main effects and interactions of interest are constructed using CONTRAST statements (in their text) or by using LSMESTIMATE statements in more recent versions of SAS/STAT.

It would start with:

proc glimmix...

model Paeru= environment*season*year  / ddfm=residual;

random house house*environment;

random season*year / subject= realloc*house residual; ...
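
To make the LSMESTIMATE step concrete, here is a minimal sketch for a deliberately tiny, hypothetical design with 2 environments, 2 seasons and 2 years (8 cells), not the layout in this thread. The data set name work.samples is an assumption, and the distribution and link options that the thread elides are left out. The coefficient order assumes the CLASS statement lists environment, season and year in that order, so that year varies fastest; the E option prints the coefficients so the order can be verified.

```sas
/* Sketch only: work.samples and the 2 x 2 x 2 layout are hypothetical. */
proc glimmix data=work.samples;
  class house realloc environment season year;
  /* Means model: a single fixed effect with one level per cell */
  model Paeru = environment*season*year / ddfm=residual;
  random house house*environment;
  random season*year / subject=realloc*house residual;
  /* Cells are ordered e1s1y1 e1s1y2 e1s2y1 e1s2y2 e2s1y1 ... e2s2y2.
     Season 1 minus season 2, averaged over environments and years. */
  lsmestimate environment*season*year
      'season 1 vs season 2' 1 1 -1 -1 1 1 -1 -1 divisor=4 / e;
run;
```

Each season mean is the average of its four cells, which is why the divisor is 4. With many cells the coefficient lists get long, so checking them against the E output is worth the extra step.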

Steve Denham

skr01
Calcite | Level 5

What a great book! It took me a while to track down a copy, but I'm so glad for the recommendation! I have been playing with a means model approach, but with 8 environments and 19 season/year combinations, it becomes extremely unwieldy. What do you think of a mixed approach, like this (I replaced season*year with a number variable seasonyearnum that simply numbers them for simplicity and so that the AR(1) covariance structure makes sense)?

proc glimmix...

model Paeru= environment seasonyearnum  / ddfm=residual;

random house house*environment house*seasonyearnum;

random seasonyearnum / subject= realloc*house type=AR(1) residual; ...

The data are too imbalanced to look at much in the way of interactions involving environments anyway, so from the perspective of giving up the ability to test such questions I'm OK with it -- I'm just not sure about whether the syntax as written is OK (I really still don't understand why the repeated effect must appear in the model statement -- I'm still trying to get my head around that).

If this approach is correct, I understand that I can test for differences among seasons (a fixed effect of interest, unlike year) using lsmestimate statements, and estimate probabilities of recovery for the seasons using lsmestimate statements as well. Looking at the data using this approach, it appears that in some of my models there is also important house*seasonyearnum variability. Is there a way for me to determine how much of that reflects seasonal patterns that differ across houses, rather than more general temporal variation?
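
As a rough illustration of the LSMESTIMATE idea in this paragraph, the sketch below assumes a hypothetical numbering in which seasonyearnum 1 to 4 are two seasons observed in each of two years (1 = season A year 1, 2 = season B year 1, 3 = season A year 2, 4 = season B year 2). The data set name work.samples and dist=binary are also assumptions; the thread does not state the distribution. Real coefficients would have to follow the actual 19 levels.

```sas
/* Sketch only: data set name, level numbering and dist=binary are assumptions. */
proc glimmix data=work.samples;
  class house realloc environment seasonyearnum;
  model Paeru = environment seasonyearnum / dist=binary ddfm=residual;
  random house house*environment house*seasonyearnum;
  random seasonyearnum / subject=realloc*house type=ar(1) residual;
  /* Difference between the two seasons on the link (logit) scale */
  lsmestimate seasonyearnum
      'season A vs season B' 1 -1 1 -1 divisor=2;
  /* Season means; ILINK back-transforms each estimate to the probability scale */
  lsmestimate seasonyearnum
      'season A' 1 0 1 0 divisor=2,
      'season B' 0 1 0 1 divisor=2 / ilink cl;
run;
```

Note that the back-transformed value is the inverse link of the averaged logits, which is not the same as the average of the per-period probabilities.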

Susi


SteveDenham
Jade | Level 19

This looks good.  I hope it starts getting you what you need.

So, on your last paragraph: how to see whether there is seasonal variability. If season is still on your dataset, then you could try:

proc glimmix...

class season /* Add it in as another class variable */...

model Paeru= environment seasonyearnum  / ddfm=residual;

random house house*environment;

random house*seasonyearnum / group=season; /* A second random statement that fits a separate variance component for each season */

random seasonyearnum / subject= realloc*house type=AR(1) residual; ...

Now, since the fixed effects do not change, you can look at the information criteria to find out if modeling separate variances of house*seasonyearnum enables you to recover more of the information (AIC or AICc for this, smaller is better).
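
The comparison described above could be set up along these lines. This is a sketch under assumptions: work.samples is a hypothetical data set name, the distribution and link options elided in the thread are omitted, and the information criteria are only comparable when the estimation method produces a true likelihood rather than a pseudo-likelihood.

```sas
/* Sketch only: work.samples is a hypothetical data set name. */

/* Model 1: one common variance for house*seasonyearnum */
ods output FitStatistics=work.fit_common;
proc glimmix data=work.samples;
  class house realloc environment season seasonyearnum;
  model Paeru = environment seasonyearnum / ddfm=residual;
  random house house*environment house*seasonyearnum;
  random seasonyearnum / subject=realloc*house type=ar(1) residual;
run;

/* Model 2: a separate house*seasonyearnum variance for each season */
ods output FitStatistics=work.fit_byseason;
proc glimmix data=work.samples;
  class house realloc environment season seasonyearnum;
  model Paeru = environment seasonyearnum / ddfm=residual;
  random house house*environment;
  random house*seasonyearnum / group=season;
  random seasonyearnum / subject=realloc*house type=ar(1) residual;
run;

/* Print both fit tables; the model with the smaller AIC or AICc is preferred */
proc print data=work.fit_common; run;
proc print data=work.fit_byseason; run;
```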

Steve Denham

skr01
Calcite | Level 5

Thanks Steve -- my bad for dropping the thread, because I failed to point out that I was using Residual PL with Newton-Raphson with Ridging as the optimization technique, so if I understand correctly, likelihood-based fit statistics are not valid. I would love to use Laplace or Quad (with the added benefit of getting the repeated measures onto the G-side), but those versions give the "not enough memory" message. I spent a chunk of time a while back on that problem, but that's a boring issue because it's clearly just the imbalanced, sparse nature of the data set. But I like the approach, and when I next have a better behaved data set I intend to put it to use!
