- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Everyone,
I have some data and we want to look at the relationship between the variable bp over two different protein variables over three time points. I've included a sample dataset below. At face value it sounds like a simple mixed models but I feel like I am overthinking it and am not sure if what I am doing is the most apt way to look at the data. I've also included my mixed models code below. Do note that the main interest lies in the relationship between bp and protein1, and bp and protein2, not protein1 and 2 together. With that in mind I constructed two models.
Sample dataset:
data sample;
input id day bp protein1 protein2;
datalines;
14 0 159 -0.47 -0.59
14 5 120 -0.25 -0.12
14 11 126 0.72 0.71
3 0 136 1.10 -0.96
3 5 111 -0.05 1.41
3 11 110 -1.04 -0.45
11 0 135 2.27 -0.16
11 5 110 -1.11 0.38
11 11 124 0.50 -0.53
6 0 164 0.08 1.22
6 5 120 -1.17 -0.22
6 11 124 0.55 1.94
;
run;
Mixed models code:
proc mixed data=import plots=residualpanel;
class day id;
model bp = day protein1 / solution;
repeated day / subject = id type=cs;
run;
Code is the same for both models with just protein1 and 2 swapped.
Thank you so much for reading!!
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I would include DAY in the MODEL statement and I would also include the interaction between DAY and PROTEIN1 to determine if the relationship changes over time. Something like this:
proc mixed data=work.steekproef;
class id day;
model bp=/*basebp*/ day protein1 protein1*day /solution noint ddfm=kr2;
*repeated day / type=un subject=id /* r rcorr */;
repeated day / type=ar(1) subject=id ;
*repeated day / type=toep subject=id ;
ods output FitStatistics=FitAR1(rename=(value=AR1))
FitStatistics=FitAR1p
Dimensions=ParmAR1(rename=(value=NumAR1));
run;
I would consider your model to be three parallel lines, relating bp to protein1 with the various day values serving as intercepts. However if the relationship between bp and protein1 changes over time such that the lines are not parallel, the code I put above will capture that. That also explains why I added the NOINT option to the model statement, so the solution produces the intercept for each day, rather than being the intercept for the last day and the other values the deviations from that intercept. Just a bit easier to interpret, in my opinion.
Now on to considering what @PaigeMiller suggested. Do you want to see if the relationship over time between bp and protein1 is the same as the relationship over time between bp and protein2? Fitting two separate models probably is not the optimum way to address that question. One of the things to consider here is whether the subjects were exposed to both proteins on the same day. Judging from the layout of the data, I would say that this is the case, and that brings up the question of whether the two proteins are acting independently, or it there is some degree of interaction that should be accounted for.
Answers to the last question will dictate any additional changes to the MODEL statement.
SteveDenham
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
What model term effects would you like to test? Are you planning to compare the two models? Is there any interest in knowing if the effect of protein1 on BP is the same/different/greater than/less than the effect of protein2 on BP?
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I would probably put both protein1 and protein2 in the same model. These two variables are relatively un-correlated, but if they do have an effect on BP it is accounted for by the model; the problem if you use two models each with one protein is that the effect of the other protein appears to be noise.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You can start here (protein1):
data steekproef;
input id day bp protein1 protein2;
datalines;
14 0 159 -0.47 -0.59
14 5 120 -0.25 -0.12
14 11 126 0.72 0.71
3 0 136 1.10 -0.96
3 5 111 -0.05 1.41
3 11 110 -1.04 -0.45
11 0 135 2.27 -0.16
11 5 110 -1.11 0.38
11 11 124 0.50 -0.53
6 0 164 0.08 1.22
6 5 120 -1.17 -0.22
6 11 124 0.55 1.94
;
run;
proc sgplot data=work.steekproef;
vline day / group=id stat=mean response=protein1;
title 'bp vs. day by ID';
run;
title;
/* Mixed Model Analysis of Repeated Measures with Covariate (ANCOVA) */
/* Use Information Criteria to choose most appropriate covariance structure */
/* UN versus AR(1) versus Toeplitz */
/* Information Criteria : AIC , AICC , BIC */
proc mixed data=work.steekproef;
class id day;
model bp=/*basebp*/ day protein1 / ddfm=kr2;
*repeated day / type=un subject=id /* r rcorr */;
repeated day / type=ar(1) subject=id ;
*repeated day / type=toep subject=id ;
ods output FitStatistics=FitAR1(rename=(value=AR1))
FitStatistics=FitAR1p
Dimensions=ParmAR1(rename=(value=NumAR1));
run; QUIT;
/* end of program */
BR, Koen
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Since in MODEL, only list fixed effect, while DAY is R side random effect not fixed.
But @lvm or @SteveDenham know more things about MIXED model.
proc mixed data=sample plots=residualpanel covtest;
class day id;
model bp = protein1 / solution;
repeated day / subject = id type=ar(1);
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Ksharp could be right.
I would have to try whether it makes a difference to include 'day' in MODEL statement or not.
But, in general, the below is indeed true:
The MIXED procedure uses:
- the MODEL statement to model the fixed effects (or the mean model),
- the RANDOM statement to model the random effects (the covariance model), and
- the REPEATED statement to model the non-default error structures (the covariance model).
The variance-covariance matrix for random effects is denoted as G, the variance-covariance matrix for the residuals is denoted as R, and the covariance-covariance matrix for the observed data is denoted as V. It can be shown that V=ZGZ′ + R.
BR, Koen
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I would include DAY in the MODEL statement and I would also include the interaction between DAY and PROTEIN1 to determine if the relationship changes over time. Something like this:
proc mixed data=work.steekproef;
class id day;
model bp=/*basebp*/ day protein1 protein1*day /solution noint ddfm=kr2;
*repeated day / type=un subject=id /* r rcorr */;
repeated day / type=ar(1) subject=id ;
*repeated day / type=toep subject=id ;
ods output FitStatistics=FitAR1(rename=(value=AR1))
FitStatistics=FitAR1p
Dimensions=ParmAR1(rename=(value=NumAR1));
run;
I would consider your model to be three parallel lines, relating bp to protein1 with the various day values serving as intercepts. However if the relationship between bp and protein1 changes over time such that the lines are not parallel, the code I put above will capture that. That also explains why I added the NOINT option to the model statement, so the solution produces the intercept for each day, rather than being the intercept for the last day and the other values the deviations from that intercept. Just a bit easier to interpret, in my opinion.
Now on to considering what @PaigeMiller suggested. Do you want to see if the relationship over time between bp and protein1 is the same as the relationship over time between bp and protein2? Fitting two separate models probably is not the optimum way to address that question. One of the things to consider here is whether the subjects were exposed to both proteins on the same day. Judging from the layout of the data, I would say that this is the case, and that brings up the question of whether the two proteins are acting independently, or it there is some degree of interaction that should be accounted for.
Answers to the last question will dictate any additional changes to the MODEL statement.
SteveDenham
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I'm not sure that was my reasoning to include both protein1 and protein2 in the model, but you still have a valid point. My reasoning was that if you just create a model with protein1, then the effect of protein2 is not in the model but is included in the error term and in all statistical testing, which makes it harder to find statistical significance. Similar is protein2 is in the model but not protein1. Thus to avoid this problem of inflated error term, protein1 and protein2 ought to both be in any model fit.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
That is an excellent point. So we now have two pretty good reasons to fit all of the data, rather than parts of it, and trying to make sense of the parts.
SteveDenham
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Everyone,
This has been a really great discussion to follow and I can't thank you enough for it.
I've modified my proc mixed code to include both proteins, but also include the interaction term for both proteins and day. My justification is that these interaction terms will give me an even better idea of how these two proteins relate to BP over time. What do you guys think?
ods output lsmeans = day_means;
ods output SolutionF = SolutionF;
proc mixed data=import plots=residualpanel;
class day id;
model bp = day protein1 protein2 day*protein1 day*protein2/ solution;
repeated Day / subject = ID type=cs;
lsmeans day;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
https://blogs.sas.com/content/iml/2019/12/05/longitudinal-data-mixed-model.html
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I know we should start a FULL model at the first place (include any independent variable and interact effect ) .
Otherwise ,we could miss or ignore a very important independent/fixed variable Like DAY/VISIT ?