kc
Quartz | Level 8

I am running a growth curve model for longitudinal data using PROC MIXED on a dataset with 2 treatment groups. I have ESTIMATE statements to calculate the individual means and the difference between the means of the 2 treatment groups at the 6-month follow-up. I am running into an issue where the difference calculated by the ESTIMATE statement is not the same as the difference between the two individual means. The sample code is below.

 

trt has values A and B, fup_scale and baseline_scale are continuous variables, miss and dth are binary variables with values 1 and 0, month specifies follow-up month, sq_month is just a quadratic term for month.

 

I have assigned treatment-specific weights to the baseline_scale, miss and dth variables in the estimate statements for the individual means, and overall weights for miss and dth in the estimate statement for the difference.

 

I am not able to figure out what's missing or incorrect in these statements. Any help is appreciated!


proc mixed data=gcm method=ml covtest noitprint noclprint;
  class trt pt;
  model fup_scale = baseline_scale trt month sq_month miss dth
        trt*month trt*sq_month trt*miss trt*dth
        trt*miss*month trt*dth*month
        miss*month miss*sq_month
        dth*month dth*sq_month / solution;
  random intercept month / sub=pt type=un G Gcorr;

  *averaged estimates*;
  estimate 'avg mean at 6m - trt A'
        intercept 1 baseline_scale 57.66 trt 1 0 month 6 sq_month 36 miss 0.20 dth 0.30
        trt*month 6 0 trt*sq_month 36 0 trt*miss 0.20 0 trt*dth 0.30 0
        trt*miss*month 1.2 0 trt*dth*month 1.8 0
        miss*month 1.2 miss*sq_month 7.2
        dth*month 1.8 dth*sq_month 10.8 / cl;

  estimate 'avg mean at 6m - trt B'
        intercept 1 baseline_scale 57.19 trt 0 1 month 6 sq_month 36 miss 0.25 dth 0.35
        trt*month 0 6 trt*sq_month 0 36 trt*miss 0 0.25 trt*dth 0 0.35
        trt*miss*month 0 1.5 trt*dth*month 0 2.1
        miss*month 1.5 miss*sq_month 9
        dth*month 2.1 dth*sq_month 12.6 / cl;

  estimate 'avg Difference at 6m: A - B'
        trt 1 -1
        trt*month 6 -6 trt*sq_month 36 -36
        trt*miss 0.225 -0.225 trt*dth 0.325 -0.325
        trt*miss*month 1.35 -1.35 trt*dth*month 1.95 -1.95 / cl;
run;

1 ACCEPTED SOLUTION
sld
Rhodochrosite | Level 12

Just a quick look here...

 

I'm not clear on what you mean by "the actual difference between the individual means". By "individual means" do you mean "avg mean at 6m - trt A" and "avg mean at 6m - trt B"?

 

If so, then "avg Difference at 6m: A - B" will not reproduce the difference between the A estimate and the B estimate because the coefficients are different for most of the terms. For example, the coefficient for baseline_scale for A is 57.66; for B, 57.19. For the estimates to match, you would need to add

 

baseline_scale 0.47

 

where 0.47 = 57.66 - 57.19 to the avg Diff estimate statement. And so on for all the other terms that don't cancel out (i.e., that have a difference that is nonzero).

 

Generally speaking, the coefficients in an estimate of a difference are the differences between the coefficients of the two individual estimates, term by term.
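To make the term-by-term rule concrete, here is a sketch of a difference statement whose coefficients are the trt A coefficients minus the trt B coefficients from the two 'avg mean' statements in the question. The label is made up for illustration, and the model and random statements are copied from the question so the step stands on its own. I have not run this against the original data.

```sas
proc mixed data=gcm method=ml covtest noitprint noclprint;
  class trt pt;
  model fup_scale = baseline_scale trt month sq_month miss dth
        trt*month trt*sq_month trt*miss trt*dth
        trt*miss*month trt*dth*month
        miss*month miss*sq_month
        dth*month dth*sq_month / solution;
  random intercept month / sub=pt type=un;

  /* Each coefficient = (trt A coefficient) - (trt B coefficient).        */
  /* intercept, month and sq_month cancel (1-1, 6-6, 36-36), so they are  */
  /* omitted. The label text is hypothetical.                             */
  estimate 'avg Difference at 6m: A - B (term by term)'
        baseline_scale 0.47                    /* 57.66 - 57.19        */
        trt 1 -1
        miss -0.05 dth -0.05                   /* 0.20-0.25, 0.30-0.35 */
        trt*month 6 -6 trt*sq_month 36 -36
        trt*miss 0.20 -0.25 trt*dth 0.30 -0.35
        trt*miss*month 1.2 -1.5 trt*dth*month 1.8 -2.1
        miss*month -0.3 miss*sq_month -1.8     /* 1.2-1.5, 7.2-9       */
        dth*month -0.3 dth*sq_month -1.8 / cl; /* 1.8-2.1, 10.8-12.6   */
run;
```

Written this way, the estimate should equal the A estimate minus the B estimate. Whether that is the comparison you actually want is a separate question, since it mixes the treatment effect with the group differences in baseline_scale, miss and dth.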

 

As an unsolicited aside: this is a really busy multiple regression with lots of interactions. I'd consider centering (perhaps even standardizing) the continuous covariates. It's also multilevel, so you perhaps could be thinking about random coefficients models and other details about regressions in mixed models, assuming you have a sufficiently large sample to support more estimation.
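As a rough sketch of the centering idea, the step below grand-mean centers baseline_scale with PROC SQL. The output dataset gcm_c and the variable baseline_c are hypothetical names. It assumes gcm has one row per patient per visit, so the mean is taken over rows and patients with more visits carry more weight.

```sas
/* Sketch only: gcm_c and baseline_c are made-up names.                  */
/* PROC SQL remerges the overall mean onto every row (expect a NOTE in   */
/* the log about remerging summary statistics).                          */
proc sql;
  create table gcm_c as
  select *,
         baseline_scale - mean(baseline_scale) as baseline_c
  from gcm;
quit;
```

With baseline_c in the model in place of baseline_scale, a coefficient of 0 on baseline_c in an estimate statement corresponds to a patient at the overall mean baseline.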

 

I hope this helps.

 

 

SteveDenham
Jade | Level 19

I am curious how @kc got different values by treatment for baseline_scale, miss and dth without including interaction terms in the model. I have to assume that the univariate calculations do not represent the marginal population estimates that the model fit would come up with if the interactions were included. Further, it appears to me that the current ESTIMATE statements are attempts to reproduce the raw means, which sort of nullifies the point of modeling treatment differences, accounting for continuous variables by setting them equal in the two groups. However, there are a lot of continuous and continuous-by-continuous terms in this model, and adding in categorical-by-continuous terms (separate slopes) seems like it might result in an overfit or a dimensionality problem, depending on the number of observations.
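One way to see why setting the covariates equal matters is the identity below. The notation is mine, not from the thread: $\mu_g(x)$ is the model-based mean for treatment $g$ at month 6 with covariate values $x$ (baseline_scale, miss, dth), $x_A$ and $x_B$ are the group-specific values, and $x$ is any common set of values.

```latex
% Notation is an assumption introduced for illustration.
\begin{align*}
\mu_A(x_A) - \mu_B(x_B)
  &= \underbrace{\mu_A(x) - \mu_B(x)}_{\text{treatment effect at common } x}
   + \underbrace{\bigl[\mu_A(x_A) - \mu_A(x)\bigr]
               - \bigl[\mu_B(x_B) - \mu_B(x)\bigr]}_{\text{covariate shift between groups}}
\end{align*}
```

When both groups are evaluated at the same $x$, the second term is zero and only the treatment effect remains. With group-specific values it is not zero, so the difference of the two raw-mean-style estimates includes more than the treatment effect.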

 

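I would also consider using an EFFECT statement to get the polynomial terms in month, with its STANDARDIZE(METHOD=MOMENTS) option to center and scale the variable month. As far as I know, the EFFECT statement is not available in PROC MIXED, so this would mean moving to a procedure that supports it, such as PROC GLIMMIX.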

 

SteveDenham 

SteveDenham
Jade | Level 19

These estimate statements could be replaced by LSMESTIMATE statements with the AT option.  For instance,

 

lsmestimate trt 'avg lsmean at 6m - trt A' 1 0 / cl
    at (baseline_scale month sq_month miss dth) = (57.66 6 36 0.20 0.30);
lsmestimate trt 'avg lsmean at 6m - trt B' 0 1 / cl
    at (baseline_scale month sq_month miss dth) = (57.66 6 36 0.20 0.30);
lsmestimate trt 'difference between lsmeans at 6m: A - B' 1 -1 / cl
    at (baseline_scale month sq_month miss dth) = (57.66 6 36 0.20 0.30);

All of the covariate values are accommodated in the LSMEANs, such that the LSMESTIMATE statement applies them across the estimable function.

 

SteveDenham

 

 
