Serum samples were split into 4 parts and treated with a combination of temperature (2 levels) and diluent (2 levels). The samples were then measured at three time points. Thus, for each subject I have 2x2x3 = 12 measurements. Example:
ID  Sample  Time  Temp  Dilut
 1    1      1     1     1
 1    2      1     2     1
 1    3      1     1     2
 1    4      1     2     2
 1    1      2     1     1
 1    2      2     2     1
 1    3      2     1     2
 1    4      2     2     2
 1    1      3     1     1
 1    2      3     2     1
 1    3      3     1     2
 1    4      3     2     2
This leads to a 3-factor mixed ANOVA (three within-subjects factors). Using PROC MIXED I specified this code, which works fine for compound symmetry:
PROC MIXED DATA = DATA2 METHOD=REML;
CLASS Temp Dilut Time ID;
MODEL y = Temp|Dilut|Time ;
REPEATED Time / SUBJECT=ID TYPE=CS;
RUN; QUIT;
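(As a side check, the R and RCORR options on the REPEATED statement will print the estimated covariance matrix and its correlation form for the first subject, so you can inspect the implied compound symmetry; a minimal variant:)

```sas
PROC MIXED DATA = DATA2 METHOD=REML;
CLASS Temp Dilut Time ID;
MODEL y = Temp|Dilut|Time;
/* R and RCORR print the fitted R matrix and its correlations */
REPEATED Time / SUBJECT=ID TYPE=CS R RCORR;
RUN; QUIT;
```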
To see whether this is appropriate, I would like to try other covariance types, e.g. an unstructured covariance. However, I get the following error with this code:
PROC MIXED DATA = DATA2 METHOD=REML;
CLASS Temp Dilut Time ID;
MODEL y = Temp|Dilut|Time;
REPEATED Temp*Dilut*Time / SUBJECT=ID TYPE=UN;
RUN; QUIT;
Unable to make Hessian positive definite.
The list of "Covariance Parameter Values at last iteration" displays values from 2 to 85, never 0. In total, there are 144 observations from 12 subjects. Is that just too few, or is there still an error in the code?
If you try to correlate all observations from a single ID with TYPE=UN, i.e.
repeated temp*dilut*time / subject=id type=un;
then that will result in a 12x12 unstructured covariance matrix with 12*13/2 = 78 parameters (12 variances plus 12*11/2 = 66 covariances). With only 144 total observations, that approach will probably not be successful, resulting in the Hessian matrix message you received.
The Kronecker product structures, like TYPE=UN@UN, can be helpful in a situation like this. However, they work best when the repeated measure is done in two dimensions, not three. You can modify the dimensionality here as others have suggested, but that might not give you the results you need.
If you use a repeated approach like
repeated time / subject=id*dilut*temp type=un;
then that correlates the three observations on each of those experimental units with a 3x3 unstructured covariance matrix. As you have noticed, that does not further correlate observations from the same ID. You can handle that by adding a RANDOM statement to the model:
random int / subject=id;
That will add a common covariance to all 12 observations from the same level of ID.
Try that combination of the RANDOM and REPEATED statement.
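Putting the two statements together, the complete call might look like this (a sketch, using the variable names from the original post):

```sas
PROC MIXED DATA = DATA2 METHOD=REML;
CLASS Temp Dilut Time ID;
MODEL y = Temp|Dilut|Time;
/* 3x3 unstructured covariance among the three time points per split */
REPEATED Time / SUBJECT=ID*Temp*Dilut TYPE=UN;
/* common covariance among all 12 observations from the same subject */
RANDOM int / SUBJECT=ID;
RUN; QUIT;
```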
As I understand it, the 4 splits of each sample are sub-samples, and you have averaged your data over the Sample variable to get DATA2.
Here the experimental unit of the repeated measure Time is ID*Temp*Dilut, so the subject in the REPEATED statement should change as follows. Then you can try other covariance matrix structures.
PROC MIXED DATA = DATA2 METHOD=REML;
CLASS Temp Dilut Time ID;
MODEL y = Temp|Dilut|Time;
REPEATED Time / SUBJECT=ID*Temp*Dilut TYPE=CS;
RUN; QUIT;
It's true that this code works. But do I really model the covariance correctly with this code? All my observations within a subject are dependent, not nested within temperature or diluent.
Again, for explanation: the 4 splits are used for the 4 different treatment combinations (2x2 = 4 combinations of temperature and diluent, both with two levels). They were not averaged, as they are single observations, each for one of the 4 treatments. The repetitions then come from the different subjects (n = 12). I have another dataset in progress, hence the name, if you wondered.
To clarify: I do not expect the residuals for temperature 1 and 2 to be independent of one another, nor those for the two diluents.
What do you think of this work-around with recoding the 4 treatments?
data data2;
set data2;
Temp_Dilut = catx(',', Temp, Dilut);
run;
PROC MIXED DATA = data2;
CLASS ID Temp Dilut Time Temp_Dilut;
MODEL y = Temp|Dilut|Time;
REPEATED Temp_Dilut Time / subject = ID TYPE=un@un r;
RUN;
The timepoints are not equally spaced. Measurements were taken after 24h, 72h and 144h.
Any opinion welcome!
If they were not averaged, then the experimental unit of Time is ID*Sample, or equivalently ID*Temp*Dilut (here Sample and Temp*Dilut have a one-to-one relationship). You need to include the subject in your code as REPEATED Time / SUBJECT=ID*Sample TYPE=CS; To decide which covariance structure is more appropriate, you can use a likelihood ratio test.
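A likelihood ratio test of CS against UN can be done by fitting both models with identical fixed effects and comparing the -2 Res Log Likelihood values from the "Fit Statistics" tables; a sketch:

```sas
/* Model 1: compound symmetry (2 covariance parameters) */
PROC MIXED DATA = DATA2 METHOD=REML;
CLASS Temp Dilut Time ID Sample;
MODEL y = Temp|Dilut|Time;
REPEATED Time / SUBJECT=ID*Sample TYPE=CS;
RUN;

/* Model 2: unstructured 3x3 (3*4/2 = 6 covariance parameters) */
PROC MIXED DATA = DATA2 METHOD=REML;
CLASS Temp Dilut Time ID Sample;
MODEL y = Temp|Dilut|Time;
REPEATED Time / SUBJECT=ID*Sample TYPE=UN;
RUN;

/* The difference in -2 Res Log Likelihood (CS minus UN) is approximately */
/* chi-square with 6 - 2 = 4 degrees of freedom under H0: CS is adequate. */
```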
Sorry, there is a typo; it should be SUBJECT=ID*Sample. Your repeated measures were applied to each sample of a particular ID, so the subject of the repeated measure is not ID, but ID*Sample.
First: Thank you both, xuelin0820 and StatsMan, for helping me solve my problem. I am learning a lot here.
I tried both of your solutions, for a start with TYPE=CS or TYPE=UN:
1) only the repeated (named CS1 and UN1 later)
REPEATED time / SUBJECT= ID*Sample TYPE=UN;
2) repeated + random (named CS2 and UN2 later)
REPEATED time / SUBJECT=ID*Temp*Dilut TYPE=UN;
RANDOM int / SUBJECT=ID;
A general question: is it correct that METHOD=ML should be chosen for the comparison of covariance structures (instead of REML)?
Then I get no warning and a table of fit statistics. It seems that the combination of RANDOM and REPEATED supplies a better fit, with a significant difference in likelihood in favor of UN2 over CS2.
For the same model definitions I would look at AR(1) and ARH(1) too, and then continue working with the model that offers the best fit in terms of likelihood. Do you agree with that workflow?
You are on the right track, with the exception that REML should be used when comparing covariance structures using the same fixed effect model.
Perfect, thank you for your advice!
Does that mean I can compare CS1 and UN2, or any other combination, with a likelihood ratio test because they have the same fixed effects (even though the random part may differ)? That would be considered "nested", the same as when I drop a fixed effect.
Only for non-nested models (different fixed effects) must I stay with the same type of covariance and switch to AIC-based model selection.
I marked your answer as a solution since it helped a lot and answered my question. Little by little, there are not many questions left.
Yes, you can cautiously compare the log-likelihoods under REML when only changing the covariance structure of the model. Just remember to keep the fixed effects the same in each model you are comparing.
I'll add my question here because I have another, similar experiment and would appreciate your help. Let's say I am also interested in whether the individual animals (IDs) differ. Can I simply make ID a fixed effect and delete the RANDOM intercept statement? The repeated measurements remain over the two other factors (Dilut, Temp). So instead of
PROC GLIMMIX DATA = dataset;
CLASS dilut temp time id;
MODEL y = dilut|temp|time/ DDFM=BETWITHIN;
RANDOM time / subject=id*dilut*temp type=un;
RANDOM int / SUBJECT=id solution;
RUN; QUIT;
I would use
PROC GLIMMIX DATA = dataset;
CLASS dilut temp time id;
MODEL y = dilut|temp|time id/ DDFM=BETWITHIN;
RANDOM time / subject=id*dilut*temp type=un;
RUN; QUIT;
Please don't be confused: because of other settings and the available plots, I switched to GLIMMIX.
It really depends on how you want to treat the ID levels. If ID is a sample from a population of potential IDs, then you might want to keep ID as a random effect. If the variance component for ID is significantly large (you can use a COVTEST statement in GLIMMIX if you want a hypothesis test on the significance of the ID variance), then yes, there is variability across the ID levels. You can still use CONTRAST or ESTIMATE statements if you want to test for differences among the individual levels.
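For instance, a COVTEST of the ID variance could be sketched as follows. Note the RESIDUAL option, which is an assumption on my part: it moves the TYPE=UN part to the R side (the GLIMMIX analogue of REPEATED), so that the only G-side parameter is the ID variance and ZEROG tests exactly that:

```sas
PROC GLIMMIX DATA = dataset;
CLASS dilut temp time id;
MODEL y = dilut|temp|time / DDFM=BETWITHIN;
/* R-side unstructured covariance over time, like REPEATED in MIXED */
RANDOM time / RESIDUAL SUBJECT=id*dilut*temp TYPE=UN;
/* G-side ID variance */
RANDOM int / SUBJECT=id;
/* likelihood ratio test that all G-side variances are zero */
COVTEST 'ID variance = 0' ZEROG;
RUN;
```

Because the null value (variance = 0) lies on the boundary of the parameter space, GLIMMIX uses a mixture chi-square distribution for this test's p-value.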
If the levels of ID you have in your data are the only levels of ID you are interested in, then you can make ID a fixed effect. You can use the LSMEANS statement with the PDIFF option to test for differences in pairs of levels, as well as the CONTRAST, ESTIMATE, and LSMESTIMATE statements.
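With ID as a fixed effect, pairwise comparisons could be sketched like this (using the model from the post above):

```sas
PROC GLIMMIX DATA = dataset;
CLASS dilut temp time id;
MODEL y = dilut|temp|time id / DDFM=BETWITHIN;
RANDOM time / SUBJECT=id*dilut*temp TYPE=UN;
/* all pairwise comparisons of ID levels, Tukey-adjusted */
LSMEANS id / PDIFF ADJUST=TUKEY;
RUN; QUIT;
```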
In practice, you will see some researchers switch ID to a fixed effect even if the ID levels do not have all of your levels of interest. The theory really does not support that approach, however.
Hi StatsMan, I'm very interested in your answer. I know it is correct, but don't know how to use CONTRAST or ESTIMATE statements to test for differences in the individual levels when ID is a random factor. Can you explain more, or give a short example?
There is a general opinion that, when considering whether we can combine data from different IDs (or Years, etc.), we need to check whether Year interacts with the other fixed effects. If any interaction is significant, then it is better to analyze the data by Year. And when the Year effect itself is significant, even if the interactions between Year and the other factors are not significant, we should also analyze the data by Year rather than combine data from multiple years. Of course, to test the interaction, Year has to be treated as a fixed effect in SAS, although as you said this is theoretically not appropriate. What do you think? Thanks!