Statistical Procedures

Programming the statistical procedures from SAS
BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
mitrakos
Obsidian | Level 7

Hi Everyone,

 

I have some data and we want to look at the relationship between the variable bp over two different protein variables over three time points. I've included a sample dataset below. At face value it sounds like a simple mixed models but I feel like I am overthinking it and am not sure if what I am doing is the most apt way to look at the data. I've also included my mixed models code below. Do note that the main interest lies in the relationship between bp and protein1, and  bp and protein2, not protein1 and 2 together. With that in mind I constructed two models. 

 

Sample dataset:

data sample;
input id day bp protein1 protein2;
datalines;
14 0 159 -0.47 -0.59
14 5 120 -0.25 -0.12
14 11 126 0.72 0.71
3 0 136 1.10 -0.96
3 5 111 -0.05 1.41
3 11 110 -1.04 -0.45
11 0 135 2.27 -0.16
11 5 110 -1.11 0.38
11 11 124 0.50 -0.53
6 0 164 0.08 1.22
6 5 120 -1.17 -0.22
6 11 124 0.55 1.94
;
run;

Mixed models code: 

proc mixed data=import plots=residualpanel;
class day id;
model bp = day protein1 / solution;
repeated day / subject = id type=cs;
run;

Code is the same for both models with just protein1 and 2 swapped. 

 

Thank you so much for reading!!

1 ACCEPTED SOLUTION

Accepted Solutions
SteveDenham
Jade | Level 19

I would include DAY in the MODEL statement and I would also include the interaction between DAY and PROTEIN1 to determine if the relationship changes over time.  Something like this:

 

proc mixed data=work.steekproef; 
class id day;
model bp=/*basebp*/ day protein1 protein1*day /solution noint ddfm=kr2;
*repeated day / type=un subject=id /* r rcorr */;
 repeated day / type=ar(1) subject=id ;
*repeated day / type=toep  subject=id ;
ods output FitStatistics=FitAR1(rename=(value=AR1)) 
           FitStatistics=FitAR1p 
           Dimensions=ParmAR1(rename=(value=NumAR1));
run;

I would consider your model to be three parallel lines, relating bp to protein1 with the various day values serving as intercepts.  However if the relationship between bp and protein1 changes over time such that the lines are not parallel, the code I put above will capture that.  That also explains why I added the NOINT option to the model statement, so the solution produces the intercept for each day, rather than being the intercept for the last day and the other values the deviations from that intercept. Just a bit easier to interpret, in my opinion.

 

Now on to considering what @PaigeMiller suggested. Do you want to see if the relationship over time between bp and protein1 is the same as the relationship over time between bp and protein2? Fitting two separate models probably is not the optimum way to address that question. One of the things to consider here is whether the subjects were exposed to both proteins on the same day. Judging from the layout of the data, I would say that this is the case, and that brings up the question of whether the two proteins are acting independently, or it there is some degree of interaction that should be accounted for.

 

Answers to the last question will dictate any additional changes to the MODEL statement.

 

SteveDenham

View solution in original post

12 REPLIES 12
PaigeMiller
Diamond | Level 26

What model term effects would you like to test? Are you planning to compare the two models? Is there any interest in knowing if the effect of protein1 on BP is the same/different/greater than/less than the effect of protein2 on BP?

--
Paige Miller
mitrakos
Obsidian | Level 7
Hi Paige, thanks for the quick reply. As far as I know, there is no interest in comparing the two models. It's just looking at if either protein exhibit a relationship with BP over the three time points.
PaigeMiller
Diamond | Level 26

I would probably put both protein1 and protein2 in the same model. These two variables are relatively un-correlated, but if they do have an effect on BP it is accounted for by the model; the problem if you use two models each with one protein is that the effect of the other protein appears to be noise.

--
Paige Miller
sbxkoenk
SAS Super FREQ

You can start here (protein1):

data steekproef;
input id day bp protein1 protein2;
datalines;
14  0 159 -0.47 -0.59
14  5 120 -0.25 -0.12
14 11 126  0.72  0.71
3   0 136  1.10 -0.96
3   5 111 -0.05  1.41
3  11 110 -1.04 -0.45
11  0 135  2.27 -0.16
11  5 110 -1.11  0.38
11 11 124  0.50 -0.53
6   0 164  0.08  1.22
6   5 120 -1.17 -0.22
6  11 124  0.55  1.94
;
run;

proc sgplot data=work.steekproef; 
 vline day / group=id stat=mean response=protein1; 
title 'bp vs. day by ID'; 
run; 
title;

/* Mixed Model Analysis of Repeated Measures with Covariate (ANCOVA)        */
/* Use Information Criteria to choose most appropriate covariance structure */
/* UN versus AR(1) versus Toeplitz                                          */
/* Information Criteria : AIC , AICC , BIC                                  */
proc mixed data=work.steekproef; 
class id day;
model bp=/*basebp*/ day protein1 / ddfm=kr2;
*repeated day / type=un subject=id /* r rcorr */;
 repeated day / type=ar(1) subject=id ;
*repeated day / type=toep  subject=id ;
ods output FitStatistics=FitAR1(rename=(value=AR1)) 
           FitStatistics=FitAR1p 
           Dimensions=ParmAR1(rename=(value=NumAR1));
run; QUIT;
/* end of program */

BR, Koen

Ksharp
Super User
I think you could take DAY out of MODEL.
Since in MODEL, only list fixed effect, while DAY is R side random effect not fixed.
But @lvm or @SteveDenham know more things about MIXED model.

proc mixed data=sample plots=residualpanel covtest;
class day id;
model bp = protein1 / solution;
repeated day / subject = id type=ar(1);
run;
sbxkoenk
SAS Super FREQ

@Ksharp could be right.
I would have to try whether it makes a difference to include 'day' in MODEL statement or not.

 

But, in general, the below is indeed true:

The MIXED procedure uses:

  • the MODEL statement to model the fixed effects (or the mean model),
  • the RANDOM statement to model the random effects (the covariance model), and
  • the REPEATED statement to model the non-default error structures (the covariance model).

The variance-covariance matrix for random effects is denoted as G, the variance-covariance matrix for the residuals is denoted as R, and the covariance-covariance matrix for the observed data is denoted as V. It can be shown that V=ZGZ′ + R.

 

BR, Koen

SteveDenham
Jade | Level 19

I would include DAY in the MODEL statement and I would also include the interaction between DAY and PROTEIN1 to determine if the relationship changes over time.  Something like this:

 

proc mixed data=work.steekproef; 
class id day;
model bp=/*basebp*/ day protein1 protein1*day /solution noint ddfm=kr2;
*repeated day / type=un subject=id /* r rcorr */;
 repeated day / type=ar(1) subject=id ;
*repeated day / type=toep  subject=id ;
ods output FitStatistics=FitAR1(rename=(value=AR1)) 
           FitStatistics=FitAR1p 
           Dimensions=ParmAR1(rename=(value=NumAR1));
run;

I would consider your model to be three parallel lines, relating bp to protein1 with the various day values serving as intercepts.  However if the relationship between bp and protein1 changes over time such that the lines are not parallel, the code I put above will capture that.  That also explains why I added the NOINT option to the model statement, so the solution produces the intercept for each day, rather than being the intercept for the last day and the other values the deviations from that intercept. Just a bit easier to interpret, in my opinion.

 

Now on to considering what @PaigeMiller suggested. Do you want to see if the relationship over time between bp and protein1 is the same as the relationship over time between bp and protein2? Fitting two separate models probably is not the optimum way to address that question. One of the things to consider here is whether the subjects were exposed to both proteins on the same day. Judging from the layout of the data, I would say that this is the case, and that brings up the question of whether the two proteins are acting independently, or it there is some degree of interaction that should be accounted for.

 

Answers to the last question will dictate any additional changes to the MODEL statement.

 

SteveDenham

PaigeMiller
Diamond | Level 26

@SteveDenham 

I'm not sure that was my reasoning to include both protein1 and protein2 in the model, but you still have a valid point. My reasoning was that if you just create a model with protein1, then the effect of protein2 is not in the model but is included in the error term and in all statistical testing, which makes it harder to find statistical significance. Similar is protein2 is in the model but not protein1. Thus to avoid this problem of inflated error term, protein1 and protein2 ought to both be in any model fit.

--
Paige Miller
SteveDenham
Jade | Level 19

@PaigeMiller 

That is an excellent point. So we now have two pretty good reasons to fit all of the data, rather than parts of it, and trying to make sense of the parts.

 

SteveDenham

mitrakos
Obsidian | Level 7

Hi Everyone,

 

This has been a really great discussion to follow and I can't thank you enough for it. 

 

I've modified my proc mixed code to include both proteins, but also include the interaction term for both proteins and day. My justification is that these interaction terms will give me an even better idea of how these two proteins relate to BP over time. What do you guys think? 

 

ods output lsmeans = day_means;
ods output SolutionF = SolutionF;
proc mixed data=import plots=residualpanel;
class day id;
model bp = day protein1 protein2 day*protein1 day*protein2/ solution;
repeated Day / subject = ID type=cs;
lsmeans day;
run;
Ksharp
Super User
SteveDenham ,
I know we should start a FULL model at the first place (include any independent variable and interact effect ) .
Otherwise ,we could miss or ignore a very important independent/fixed variable Like DAY/VISIT ?

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 12 replies
  • 3406 views
  • 6 likes
  • 5 in conversation