Hello,
I am trying to determine if animal's SNP (genotype) and diet affect quantity of a compound. There were 25 animals in the experiment measured in the course of 3 weeks. After the first measurement, half of the animals were fed with different diet.
Dataset looks like this:
ID week SNP diet compound
1 1 AA a 6.6
1 2 AA a 6.1
1 3 AA a 5.5
2 1 GG b 7.3
2 2 GG b 7.1
2 3 GG b 7.0
...
Q1 - How does the equivalent to lmer look like?
Initally, I used lmer function in R for a model with random slope + random intercept (because I saw very similar example in a lecture and it was pretty straightforward). It is as follows:
model <- lmer(compound ~ diet + snp + diet*snp + week + (week|id), data)
So I wanted to see how the notation would look like in SAS using PROC MIXED. However, despite reading many different articles I am quite lost. I tried this, but the result is way different from lmer:
proc mixed data=have;
class id diet snp week;
model y = diet snp diet*snp week;
random id;
repeated week / subject=id;
run;
So what would be the correct form?
Q2 - Is there a better option?
Feel free to suggest a different model if you think this one is not good enough.
Note
I also tried random intercept only model: lmer(compound ~ diet + snp + diet*snp + week + (1|id), data), which is equivalent to
proc mixed data=have;
class id diet snp week;
model y = diet snp diet*snp week;
repeated week / subject=id;
run;
@SteveDenham suggests that you replace your REPEATED statement with
random intercept / subject=id;
That RANDOM statement fits the equivalent model to what you have with your current REPEATED statement. The RANDOM statement you currently have
random week / subject=id type=cs;
fits a *much* more complicated covariance structure. In general, you cannot interchange the statement names RANDOM and REPEATED and expect to get the same results. The RANDOM statement sets up g-side covariances while the REPEATED statement sets up r-side. The structure imposed by the REPEATED statement is usually easier to understand.
Two things you might try. First, lmer uses maximum likelihood as a default, while MIXED uses restricted maximum likelihood (REML). So you might add method=ml to the PROC MIXED statement. The other is to change the random effect to specifically fit random intercepts:
random intercept/subject=id
That is as much as I really know about the two platforms.
I do have a couple of questions that could affect your analysis. Is there any reason to suspect autocorrelation of the residuals? If so, a type= option to model that might be in order. The second has to do with specification of repeated effects in lmer. It looks to me as if you are modeling week as a random effect, rather than modeling correlation in the residuals. The latter is what the REPEATED statement in PROC MIXED does. I could certainly be wrong on this, but if you want to check this, then change the REPEATED statement to a RANDOM statement, and see if the results are closer to the lmer results.
SteveDenham
Hi Steve, thanks a lot for the insight. I clearly need to dig deeper, because it will not be as easy as I thought. Covarance structure seems to be an issue here.
Also, I made a graph for each animal, where X=week and Y=compound, and it was all over the place so it was suggested to me not to estimate the random slopes and simply consider all the effects as fixed with interactions as follows:
proc glm data=have;
class diet snp;
model compound1-compound3 = diet | snp / nouni ;
repeated week 3;
run; quit;
or equivalently:
proc mixed data=have method=reml;
class id diet snp week ;
model compound = diet | snp | week;
repeated week / subject=id type=cs;
run;
However, I still need to look into it.
As for your last idea, I get this error: "Convergence criteria met but final hessian is not positive definite."
The REPEATED statement in your model does correlate the residuals for each animal using a common correlation. Fitting the 3-way interaction of your fixed effects might be more than your data can support. Try dropping the vertical bars on your MODEL statement and fit only the 2-way interactions of interest to you. A full(er) description of your data might help diagnose issues with the Hessian. How many levels for SNP? There are 2 diets, is that correct? If possible, include the complete data since it is a relatively small data set.
(There are 2 levels of diet and 3 levels of SNP.)
However, I think one of us misunderstands. It is probably me, but just to make sure, I should emphasize that these two procedures seem to be working fine:
proc glm data=have;
class diet snp;
model compound1-compound3 = diet | snp / nouni ;
repeated week 3;
run; quit;
proc mixed data=have;
class id diet snp week ;
model compound = diet | snp | week;
repeated week / subject=id type=cs;
run;
I found similar approach in two guides so I wanted to check if this is okay as well (even though I am aware that I sort of drifted away from the original comparsion with lmer).
"Issues with Hessian" came when I tried what Steve suggested: "...change the REPEATED statement to a RANDOM statement..."
proc mixed data=have;
class id diet snp week ;
model compound = diet | snp | week;
random week / subject=id type=cs;
run;
(Even without so many interaction to problem remains.)
But I think it is not a big issue because I feel am past the original question. Now I would be happy to just do a reasonable analysis in SAS.
(Also sorry that I cannot share the complete data since they are not mine.)
@SteveDenham suggests that you replace your REPEATED statement with
random intercept / subject=id;
That RANDOM statement fits the equivalent model to what you have with your current REPEATED statement. The RANDOM statement you currently have
random week / subject=id type=cs;
fits a *much* more complicated covariance structure. In general, you cannot interchange the statement names RANDOM and REPEATED and expect to get the same results. The RANDOM statement sets up g-side covariances while the REPEATED statement sets up r-side. The structure imposed by the REPEATED statement is usually easier to understand.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.