Solved: Re: 3-factor repeated measures ANOVA

lotcarrots · Posted 02-04-2019 05:08 AM

Serum samples were split into 4 parts to treat them with a combination of temperature (2 level) and diluent (2 level). The samples were then measured at three time points. Thus, for each subject I have 2x2x3 = 12 measurements. Example:

ID Sample Time Temp Dilut
1 1 1 1 1
1 2 1 2 1
1 3 1 1 2
1 4 1 2 2
1 1 2 1 1
1 2 2 2 1
1 3 2 1 2
1 4 2 2 2
1 1 3 1 1
1 2 3 2 1
1 3 3 1 2
1 4 3 2 2

This leads to an 3-factor mixed ANOVA (3 within-subjects factors). Using proc mixed I specified this code, which works fine for compound symmetry:

PROC MIXED DATA = DATA2 METHOD=REML;
	CLASS Temp Dilut Time ID;
	MODEL y = Temp|Dilut|Time ;
	REPEATED Time / SUBJECT=ID TYPE=CS;
RUN; QUIT;

To see if this is appropriate, I would like to see other types of covariance, e.g. a unstructured covariance. However, I get the following error with this code:

PROC MIXED DATA = DATA2 METHOD=REML;
CLASS Temp Dilut Time ID;
MODEL y = Temp|Dilut|Time;
REPEATED Temp*Dilut*Time / SUBJECT=ID TYPE=UN;
RUN; QUIT;

Unable to make Hessian positive definite.

The list of "Covariance Parameter Values at last iteration" displays values of 2 to 85, never 0. In total, there are 144 observations from 12 subjects. Is that just too few or is there still an error in the code?

StatsMan · Posted 02-07-2019 09:17 AM

If you try to correlate all observations from a single ID with TYPE=UN, ie

repeated temp*dilut*time / subject=id type=un;

then that will result in a 12x12 covariance matrix with 12*11/2=66 covariance parameters. With 144 total observations, that approach will probably not be successful, resulting in the Hessian matrix message you received.

The Kronecker product structures, like TYPE=UN@UN, can be helpful in a situation like this. However, they work best when the repeated measure is done in two dimensions, not three. You can modify the dimensionality here as others have suggested, but that might not give you the results you need.

If you use a repeated approach like

repeated time / subject=id*dilut*temp type=un;

then that correlates the three observations on each of those experimental units with a 3x3 unstructured covariance matrix. As you have noticed,that does not further correlate observations from the same ID. You can handle that by adding a RANDOM statement to the model

random int / subject=id;

That will add a common covariance to all 12 observations from from the same level of ID.

Try that combination of the RANDOM and REPEATED statement.

View solution in original post

xuelin0820 · Posted 02-04-2019 09:34 AM

For my understanding, the 4 splits of each sample are sub-samples, and you have averaged your data on Sample variable to get DATA2.

Here the experimental unit of repeated measure Time is ID*Temp*Dilut, so the subject in REPEATED statement should change to as the following. Then you can try other type covarience matrix structure.

PROC MIXED DATA = DATA2 METHOD=REML;
	CLASS Temp Dilut Time ID;
	MODEL y = Temp|Dilut|Time ;
	REPEATED Time / SUBJECT=ID*Temp*Dilut TYPE=CS;
RUN; QUIT;

lotcarrots · Posted 02-04-2019 10:38 AM

It's true that this code works. But do I really model the covariance correctly with this code? All my observations within a subject are dependent. Not nested within temperature or diluent.

Again for explanation: The 4 splits are used for the 4 different treatment combinations (2x2=4 combinations for temperature and diluent, both two levels). They were not averaged as they are single observations, each for one of the 4 treatments. The repetitions then come from different subjects (n=12). I have another datset in progress, hence the name if you wondered.

lotcarrots · Posted 02-06-2019 10:32 AM

To clarify: I do not expect that the residuals for temperature 1 and 2 are independent of one another neither for diluent.

What do you think of this work-around with recoding the 4 treatments?

data data2;
   set data2;
   Temp_Dilut = catx(',', Temp, Dilut);
run; 

PROC MIXED DATA = data2;
CLASS ID Temp Dilut Time Temp_Dilut;
MODEL y = Temp|Dilut|Time;
REPEATED Temp_Dilut Time / subject = ID TYPE=un@un r;
RUN;

The timepoints are not equally spaced. Measurements were taken after 24h, 72h and 144h.

Any opinion welcome!

xuelin0820 · Posted 02-06-2019 11:38 AM

If they were not averaged, then the experimental unite of Time is ID*Sample or ID*Temp*Dilut (here Sample and Temp*Dilut are one-to-one relationship). You need include Subject in your code as REPEATED Time / SUBJECT=ID*Sample TYPE=CS; To decide which covariance structure is more appropriate, you can use likelihood test.

lotcarrots · Posted 02-06-2019 03:25 PM

I do not quite understand: What do you mean by Subject? I don't have a variable called 'Subject'. My experimental unit is ID with repeated measures for Temp and Dilut on time.

xuelin0820 · Posted 02-06-2019 04:20 PM

sorry, there is a typo. it should be subject=ID*Sample. Your repeated measure were applied on each sample of a particular ID, so the subject of repeated measure is not ID, but ID*Sample.

StatsMan · Posted 02-07-2019 09:17 AM

If you try to correlate all observations from a single ID with TYPE=UN, ie

repeated temp*dilut*time / subject=id type=un;

then that will result in a 12x12 covariance matrix with 12*11/2=66 covariance parameters. With 144 total observations, that approach will probably not be successful, resulting in the Hessian matrix message you received.

The Kronecker product structures, like TYPE=UN@UN, can be helpful in a situation like this. However, they work best when the repeated measure is done in two dimensions, not three. You can modify the dimensionality here as others have suggested, but that might not give you the results you need.

If you use a repeated approach like

repeated time / subject=id*dilut*temp type=un;

then that correlates the three observations on each of those experimental units with a 3x3 unstructured covariance matrix. As you have noticed,that does not further correlate observations from the same ID. You can handle that by adding a RANDOM statement to the model

random int / subject=id;

That will add a common covariance to all 12 observations from from the same level of ID.

Try that combination of the RANDOM and REPEATED statement.

lotcarrots · Posted 02-07-2019 10:41 AM

First: Thank you both, xuelin0820 and StatsMan, for helping me solve my problem. I am learning a lot here.

I tried both your solutions, for a start with type=CS or type=UN:

1) only the repeated (named CS1 and UN1 later)

REPEATED time / SUBJECT= ID*Sample TYPE=UN;

2) repeated + random (named CS2 and UN2 later)

REPEATED time / SUBJECT=ID*Temp*Dilut TYPE=UN;
RANDOM int / SUBJECT=ID;

In general the question: Is it correct that 'method = ML' should be chosen for the comparison of covariance structures (instead of REML)?

Then I get no warning and a table of fit-statistics as

It seems that both random and repeated supply a better fit, with a significant difference in likelihood in favor for UN2 instead of CS2:

For the same model definitions I would look at AR1 and ARH too. Then continue working with the model that offers the best fit in terms of likelihood. Do you agree with that workflow?

StatsMan · Posted 02-07-2019 11:27 AM

You are on the right track, with the exception that REML should be used when comparing covariance structures using the same fixed effect model.

lotcarrots · Posted 02-07-2019 12:28 PM

Perfect, thank you for your advise!

Does that mean I can compare CS1 and UN2 or all other combinations as well with a likelihood ratio test because they have the same fixed effects (even though the random part may differ)? It would be considered "nested". The same as when I drop a fixed effect.

Only for non-nested (different fixed effects) I must stay in the same type of covariance and change over to AIC model selection.

I marked your answer as a solution since it helped a lot and answered my question. Gradually, there are only so many more.

StatsMan · Posted 02-08-2019 08:44 AM

Yes, you can cautiously compare the log-likelihoods under REML when only changing the covariance structure of the model. Just remember to keep the fixed effects the same in each model you are comparing.

lotcarrots · Posted 08-21-2019 05:43 AM

I add my question here, because I have another similar experiment and appreciate your help. Let's say I am also interested in whether the individual animals (ids) differ. Can I simply put the ID to fixed and delete the random statement? The repeated measurements remain over the two other factors (dilut, temp). So instead of

PROC GLIMMIX DATA = dataset;
	CLASS dilut temp time id;
	MODEL y = dilut|temp|time/ DDFM=BETWITHIN;
	RANDOM time / subject=id*dilut*temp type=un;
	RANDOM int / SUBJECT=id solution;
RUN; QUIT;

I would use

PROC GLIMMIX DATA = dataset;
	CLASS dilut temp time id;
	MODEL y = dilut|temp|time id/ DDFM=BETWITHIN;
	RANDOM time / subject=id*dilut*temp type=un;
RUN; QUIT;

Please do not get confused, by virtue of other settings and available plots I switched to GLIMMIX.

StatsMan · Posted 08-21-2019 08:04 AM

It really depends on how you want to treat the id levels. If ID is a sample from a population of potential ID's, then you might want to keep ID as a random effect. If the variance component for ID is significantly large (and you can use a COVTEST statement in GLIMMIX if you want a hypothesis test on the significance of the ID variance), then yes there is variability in the id levels. You can still use CONTRAST or ESTIMATE statements if you want to test for differences in the individual levels.

If the levels of ID you have in your data are the only levels of ID you are interested in, then you can make ID a fixed effect. You can use the LSMEANS statement with the PDIFF option to test for differences in pairs of levels, as well as the CONTRAST, ESTIMATE, and LSMESTIMATE statements.

In practice, you will see some researchers switch ID to a fixed effect even if the ID levels do not have all of your levels of interest. The theory really does not support that approach, however.

RosieSAS · Posted 08-21-2019 09:49 AM

Hi StatsMan, I'm very interested in your answer. I know it is correct, but don't know how to use CONTRAST or ESTIMATE statements to test for differences in the individual levels when ID is a random factor. Can you explain more, or give a short example?

There is a general opinion that when considering if we can combine the data from different IDs, or say Years etc, we need to check if Year interacts with other fixed effects. If there is one interaction significant, then it is better to analyze the data by Year. Or when Year effect is significant, even the interact between Year and other factors are nor significant, we also should analyze the data by Year, not combine the data from multiple years. Of course, to test the interaction, Year has to be treated as a fixed effect in SAS, although as you said it is theoretically not appropriate. What do you think? Thanks!

Ready to join fellow brilliant minds for the SAS Hackathon?