Hi,
I am working with an observational study with two groups of patients. Patients in the two groups A and B are matched 1:1 on age and sex.
Patients in group A had an event at time 0 and were followed up at time 1 and time 2.
Patients in group B did not have an event at time 0 (so no data were collected then) but had data collected at times 1 and 2.
Is it correct to use the following PROC MIXED to evaluate the differences and change over time of biomarkers with this design?
PROC MIXED DATA=long;
CLASS id group(ref='0') time (REF="1");
MODEL biomarker=time group time*group/ CL DDFM=KR2 VCIRY OUTPM=fitmain RESIDUAL;
REPEATED time / SUBJECT=id(group) TYPE=UN R RCORR;
RUN;
Thank you,
Before I answer whether the model is correct, I want to ask a question. Was the biomarker measured at enrollment for group B? If so, then the model is fine, and should give you what you want. If not, then I would follow the advice in SAS for Mixed Models, and fit what some call a means model. Comparisons of interest could be obtained from LSMESTIMATE statements in this case.
Note that if the biomarker measures at enrollment are clinically different, you may want to consider using that value as a covariate and fitting a repeated measures analysis of covariance.
SteveDenham.
Hi Steve,
Thank you for your reply.
The biomarker was not measured at enrollment for group B. Therefore:
I have 3 measurements of the biomarker in group A: at time 0, 1, and 2.
I have 2 measurements of the biomarker in group B: at time 1 and 2.
Could you clarify what you mean by "fit a means model"? I looked it up in SAS for Mixed Models but was unsure what you meant.
Best,
-Jules
The PROC MIXED code I would use would look something like:
PROC MIXED DATA=long;
CLASS id group time;
MODEL biomarker=time*group/ CL DDFM=KR2 VCIRY OUTPM=fitmain RESIDUAL;
REPEATED time / SUBJECT=id(group) TYPE=UN R RCORR;
LSMESTIMATE time*group 'Comparison at time 1' 0 1 0 -1 0,
'Comparison at time 2' 0 0 1 0 -1; /* Check whether there are any options you want to add after a slash; JOINT would give an F test of the significance of both comparisons. */
RUN;
A means model is appropriate for unbalanced data like yours. You fit only the interaction (no main effects) and construct contrasts/estimates/lsmestimates to get at the questions of interest. It is quite possible that I have misinterpreted what you want to do here. For instance, I might ask what information the biomarker in group A at enrollment provides: is there interest in comparing biomarker levels within group A to the enrollment value? Time 1 to Time 2? If there is, you may want to replace the LSMESTIMATE statement with the following LSMEANS statement:
LSMEANS time*group/diff;
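The positional coefficients in an LSMESTIMATE statement only mean what the label says if they line up with the order in which SAS arranges the time*group cells, so it may be worth checking that order before trusting any comparison. A minimal sketch, assuming the same long data set and variable names as above: the plain LSMEANS statement lists the estimable cells in order, and the ELSM option (as I understand it) displays the coefficients that were applied to the LS-means.

```sas
PROC MIXED DATA=long;
  CLASS id group time;
  MODEL biomarker = time*group / DDFM=KR2;
  REPEATED time / SUBJECT=id(group) TYPE=UN;
  /* Lists the estimable time*group cells in the order SAS uses */
  LSMEANS time*group;
  /* ELSM shows which coefficient was applied to which LS-mean */
  LSMESTIMATE time*group 'Comparison at time 1' 0 1 0 -1 0 / ELSM CL;
RUN;
```

If the displayed coefficients do not sit on the cells named in the label, the coefficients (or the label) need to be rearranged.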
Lastly, how confident are you that the errors are normally distributed? Many biomarkers, especially those in blood, have errors that are proportional to the observed value. If that is the case, you may want to consider switching to PROC GLIMMIX and specifying a log-normal distribution.
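As a rough sketch of what that switch could look like, the means model carries over to PROC GLIMMIX almost unchanged. This assumes a hypothetical variable biomarker_raw holding the untransformed values, since DIST=LOGNORMAL applies the log itself; the other names follow the code above.

```sas
PROC GLIMMIX DATA=long;
  CLASS id group time;
  /* biomarker_raw is a hypothetical name for the untransformed biomarker */
  MODEL biomarker_raw = time*group / DIST=LOGNORMAL DDFM=KR2 CL;
  /* RESIDUAL makes this an R-side (repeated measures) covariance structure */
  RANDOM time / SUBJECT=id(group) TYPE=UN RESIDUAL;
  LSMEANS time*group / DIFF CL;
RUN;
```

The estimates and differences from this fit are on the log scale, so they would still need to be back-transformed for reporting.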
SteveDenham
Thank you once again for the quick reply.
I am interested in both differences between group A and B at time 1 and 2, and the change from time 1 to 2 in each group. I also want to evaluate the change from time 0 to 1 in group A.
I have log-transformed my biomarker data. Is this sufficient, or would you still recommend using PROC GLIMMIX?
I have added some LSMESTIMATE statements which give me the estimates I am interested in (this gives the same estimates as using the LSMEANS time*group/diff statement).
PROC MIXED DATA=long;
CLASS id group time;
MODEL biomarker=time*group/ CL DDFM=KR2 VCIRY OUTPM=fitmain RESIDUAL;
REPEATED time / SUBJECT=id(group) TYPE=UN R RCORR;
LSMESTIMATE time*group "B vs A at time 1" 0 1 0 -1 0 /cl;
LSMESTIMATE time*group "B vs A at time 2" 0 0 1 0 -1 /cl;
LSMESTIMATE time*group "A at time 1 vs time 0" 1 -1 0 0 0 /cl;
LSMESTIMATE time*group "A at time 2 vs time 1" 0 -1 1 0 0 /cl;
LSMESTIMATE time*group "B at time 2 vs time 1" 0 0 0 -1 1 / cl;
RUN;
However, I am still puzzled. I see that for group A, the mean biomarker value decreases over time. However, in the PROC MIXED model, I get an increase in the biomarker estimates between time 1 and 2 in group A. Any idea what could cause this? I have tried to fit different covariance structures, and the AICC was smallest with the unstructured covariance.
Best,
-Jules
Ugh. What you are seeing is the difference between the marginal means (lsmeans) and the raw means. The lsmeans provide the best solution to the data as a whole, and in this case I think it is being driven by the values at enrollment in group A. I don't see a good way out of it when you analyze all of the data. One thing I forgot is the NOINT option on the model statement - it is handy for the means model so that the Solution Table gives the values (in the log space here) rather than the deviations from the reference group.
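To see the two sets of numbers side by side, something like the following may help. It is only a sketch and assumes the long data set and variable names from the code above. PROC MEANS gives the raw cell means and counts, and the means model with NOINT and SOLUTION puts the model-based cell means directly in the Solution for Fixed Effects table.

```sas
/* Raw means and number of non-missing values per group-by-time cell */
PROC MEANS DATA=long N MEAN STD NWAY;
  CLASS group time;
  VAR biomarker;
RUN;

/* Means model: with NOINT, each solution row is a cell mean (log scale here) */
PROC MIXED DATA=long;
  CLASS id group time;
  MODEL biomarker = time*group / NOINT SOLUTION CL DDFM=KR2;
  REPEATED time / SUBJECT=id(group) TYPE=UN;
RUN;
```

If the N column shows fewer values at the later visits, some patients are missing measurements. I would expect that to be the situation in which the model-based means drift away from the raw ones, because the model borrows information from the correlated earlier visits.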
As for GLIMMIX vs. log-transformed data in MIXED, I think what you are doing in MIXED is the equivalent of declaring a lognormal distribution in GLIMMIX, where the error term is integrated out as it is independent of the fixed effects. If you are curious about the comparison (and you aren't under a time crunch), consider comparing the results. Then try running GLIMMIX with the default normal distribution, but with a log link. I'm off to run a quick comparison myself using one of the examples in the GLIMMIX documentation. I'll report back what I find.
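If the log-transformed analysis in MIXED is kept, differences on the log scale can be back-transformed into ratios of geometric means. A sketch, assuming the natural log was used and the same cell ordering as in the LSMESTIMATE statements earlier in the thread; the EXP option asks for the exponentiated estimate and, together with CL, exponentiated confidence limits.

```sas
PROC MIXED DATA=long;
  CLASS id group time;
  MODEL biomarker = time*group / DDFM=KR2;
  REPEATED time / SUBJECT=id(group) TYPE=UN;
  /* exp(difference of log-scale means) = ratio of geometric means */
  LSMESTIMATE time*group "A at time 2 vs time 1" 0 -1 1 0 0 / CL EXP;
RUN;
```

An exponentiated estimate above 1 would then read as a relative increase from time 1 to time 2, and below 1 as a relative decrease.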
SteveDenham
Comparison says GLIMMIX with a lognormal distribution = MIXED with data log transformed. Using a log link on the default normal gives a different fixed effect solution vector.
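For reference, the log-link fit in that comparison could be sketched as below, with a hypothetical biomarker_raw standing in for the untransformed values. The different solution vector is to be expected: the lognormal model describes the mean of log(biomarker), while the normal model with a log link describes the log of the mean of the biomarker, and those are not the same quantity.

```sas
PROC GLIMMIX DATA=long;
  CLASS id group time;
  /* Normal errors on the original scale; only the mean is modeled through a log link */
  /* biomarker_raw is a hypothetical name for the untransformed biomarker */
  MODEL biomarker_raw = time*group / DIST=NORMAL LINK=LOG SOLUTION;
  RANDOM time / SUBJECT=id(group) TYPE=UN RESIDUAL;
RUN;
```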
HTH
SteveDenham
I see, thank you for your help!
And thank you also for the NOINT tip, it is very neat.
I will check out the differences using GLIMMIX if I have time.
Best,
-Jules