dmw
Calcite | Level 5

Dear All,

 

In clinical trials with several dose groups and a placebo group, we sometimes want an estimate of the combined dose-group effect compared with placebo at a specified endpoint (e.g., Week 8). Essentially, I am wondering whether it is better to pool the dose groups before fitting the model or to pool them in the CONTRAST/ESTIMATE statement itself. Example code is below. I cannot find documentation on the difference between the two methods or on when each is appropriate. I am working in SAS 9.4.

 

data test;
	call streaminit(33445);
	do id=1 to 20;
		rid=rand('normal');           *random subject effect;
		trt=ceil(rand('uniform')*3);  *random assignment to treatment 1, 2, or 3;
		if trt in (2,3) then trt2=2;  *pool the two active groups;
		else trt2=trt;
		do time=1 to 2;
			y=trt + trt*time + rand('normal') + rid;
			output;
		end;
	end;
run;

proc mixed data=test;
	class id trt time;
	model y=trt time trt*time / e;
	repeated time / subject=id(trt) type=cs;
	contrast 'placebo vs active at timepoint 2' trt -1 .5 .5 trt*time 0 -1 0 .5 0 .5;  
	estimate 'placebo vs active at timepoint 2' trt -1 .5 .5 trt*time 0 -1 0 .5 0 .5; 
	lsmeans trt*time / diff; 
run;

proc mixed data=test;
	class id trt2 time;
	model y=trt2 time trt2*time;
	repeated time / subject=id(trt2) type=cs;
	lsmeans trt2*time / diff; 
	estimate 'placebo vs active at timepoint 2' trt2 -1 1 trt2*time 0 -1 0 1; 
run;
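To check the coefficients in the first ESTIMATE statement, write the mean for treatment i at time j as μ_ij = μ + α_i + τ_j + (ατ)_ij (notation assumed here for the usual effects-model parameterization). The trt*time levels are ordered (1,1), (1,2), (2,1), (2,2), (3,1), (3,2). Expanding "average of the active groups minus placebo at time 2" term by term gives:

```latex
\begin{aligned}
\mu_{ij} &= \mu + \alpha_i + \tau_j + (\alpha\tau)_{ij} \\
\tfrac{1}{2}(\mu_{22} + \mu_{32}) - \mu_{12}
  &= -\alpha_1 + \tfrac{1}{2}\alpha_2 + \tfrac{1}{2}\alpha_3
     - (\alpha\tau)_{12} + \tfrac{1}{2}(\alpha\tau)_{22} + \tfrac{1}{2}(\alpha\tau)_{32}
\end{aligned}
```

μ and τ_2 cancel because their coefficients sum to -1 + 1/2 + 1/2 = 0, which is why neither the intercept nor TIME appears in the statement; the remaining coefficients are exactly trt -1 .5 .5 and trt*time 0 -1 0 .5 0 .5. In the trt2 model, trt2 -1 1 trt2*time 0 -1 0 1 estimates μ*_22 - μ*_12 for the pooled group. With complete data and a saturated treatment-by-time mean model, the pooled active mean is the sample-size-weighted average (n_2 μ̂_22 + n_3 μ̂_32)/(n_2 + n_3), whereas the three-group ESTIMATE weights groups 2 and 3 equally, so with unequal group sizes from the random assignment in data test the two point estimates (5.7302 and 5.9023) need not agree.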

 

Here are the results using trt in the model:

Label                               Estimate   Standard Error   DF   t Value   Pr > |t|
placebo vs active at timepoint 2      5.7302           0.7368   17      7.78     <.0001

 

Here are the results using trt2 in the model:

Label                               Estimate   Standard Error   DF   t Value   Pr > |t|
placebo vs active at timepoint 2      5.9023           1.2494   18      4.72     0.0002

 

Many thanks in advance!!

sld
Rhodochrosite | Level 12

A well-posed question 🙂

 

First, create a balanced data set so that you aren't trying to juggle the impacts of unbalanced data while you sort out syntax.

data newtest;
	call streaminit(33445);
	do id=1 to 10;
		rid=rand('normal'); *random effect for subject=id;
		do trt=1 to 3;
			if trt in (2,3) then trt2=2;
			else trt2=trt;
			do time=1 to 2;
				y=trt + trt*time + rand('normal') + rid;
				output;
			end;
		end;
	end;
run;

proc tabulate data=newtest;
	class trt trt2;
	table trt, trt2;
run;

Then run your two models. Note that the estimates of the difference now match, but SEs and DFs do not.

 

The fundamental difference between the two models lies in the REPEATED statement. The first model, using

	repeated time / subject=id(trt) type=cs;

identifies 30 subjects (10 IDs for each of 3 TRTs). But the REPEATED statement in the second model using

	repeated time / subject=id(trt2) type=cs;

identifies only 20 subjects (10 IDs for each of 2 TRT2s). Consequently, the SEs and DFs differ.
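The subject counts can be confirmed without fitting anything. A minimal sketch, using a hypothetical data set design that holds only the ID and treatment layout of the balanced data above, counts the distinct (id, trt) and (id, trt2) combinations, which is how SUBJECT=id(trt) and SUBJECT=id(trt2) define subjects. It reports 30 and 20 respectively:

```sas
/* Design layout only (no responses needed): 10 IDs crossed with 3 treatments */
data design;
   do id=1 to 10;
      do trt=1 to 3;
         if trt in (2,3) then trt2=2;   /* pooled active group */
         else trt2=trt;
         output;
      end;
   end;
run;

/* SUBJECT=id(trt) treats each distinct (id, trt) pair as one subject;
   SUBJECT=id(trt2) treats each distinct (id, trt2) pair as one subject */
proc sql;
   select count(distinct catx('-', id, trt))  as n_id_trt  label='Subjects in id(trt)',
          count(distinct catx('-', id, trt2)) as n_id_trt2 label='Subjects in id(trt2)'
   from design;
quit;
```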

 

If my experiment randomly assigned 3 treatments to 10 subjects per treatment so that I actually had 30 subjects in total, I would use the first model rather than the second because the first model preserves the experimental design; the second makes up a new one.

dmw
Calcite | Level 5

Hi All,

 

Thank you for the quick response.

 

I don't think I stated the hypothesis of interest earlier: is there a significant difference between groups 2 and 3 combined and group 1?

 

I realized that the same IDs (1 to 10) were reused in every treatment group, so when I combined two treatment groups, the model assumed that certain subjects had multiple assessments at each time point (i.e., that there were only 10 subjects in the newly created treatment group and therefore only 20 subjects in total). I have updated my code to make the subject IDs unique. This design assumes 10 subjects are randomized to each of 3 treatment groups (i.e., 30 subjects in total). If I am interested in comparing two pooled groups with one group, how does the interpretation of the two models below differ? The LSMD estimate is the same, but the SEs differ, and I am trying to understand why.

 

My instinct is to use the ESTIMATE statement because it follows the experimental design, but is there another reason to prefer it, or should I use the pooled treatment-group variable instead?

 

data newtest;
	call streaminit(33445);
	do id=1 to 10;
		rid=rand('normal'); *random effect for subject=id;
		do trt=1 to 3;
			if trt in (2,3) then trt2=2;
			else trt2=trt;
			do time=1 to 2;
				y=trt + trt*time + rand('normal') + rid;
				output;
			end;
		end;
	end;
run;

data newtest;
	set newtest;
	id = 10*(trt-1) + id; *give each of the 30 subjects its own ID (1-30);
run;
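The recode id * trt + (11*trt) does not make every ID unique: trt=2, id=7 and trt=3, id=1 both become 36, and trt=2, id=10 and trt=3, id=3 both become 42. Nesting ID within TRT hides this in the three-group model, but under SUBJECT=id(trt2) each colliding pair shares an ID within TRT2=2, so the two subjects can no longer be told apart. A quick check, sketched on a hypothetical data set ids that contains only the design layout, compares that recode with a simple alternative, 10*(trt-1) + id; with 30 subjects, a unique recode needs 30 distinct values, and the first formula yields 28:

```sas
/* Design layout only: 10 IDs in each of 3 treatment groups */
data ids;
   do id=1 to 10;
      do trt=1 to 3;
         id_a = id*trt + 11*trt;   /* recode id*trt + 11*trt              */
         id_b = 10*(trt-1) + id;   /* alternative: IDs 1-30, one per subject */
         output;
      end;
   end;
run;

/* Compare the number of distinct IDs each recode produces */
proc sql;
   select count(distinct id_a) as n_a label='Distinct IDs, id*trt + 11*trt',
          count(distinct id_b) as n_b label='Distinct IDs, 10*(trt-1) + id'
   from ids;
quit;
```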

proc mixed data=newtest method=reml;
	class id trt time;
	model y = trt time trt*time / s ddfm=kr covb;
	repeated time / type=un subject=id(trt);
	lsmeans trt*time / diff;
	*group 1 minus the average of groups 2 and 3 at time 2;
	estimate 'test1' trt 1 -0.5 -0.5
	                 trt*time 0 1  0 -0.5  0 -0.5 / e;
run;

proc mixed data=newtest method=reml;
	class id trt2 time;
	model y = trt2 time trt2*time / s ddfm=kr covb;
	repeated time / type=un subject=id(trt2);
	lsmeans trt2*time / diff e;
run;

Here are my results.

The first model:

                                 Estimates

Label    Estimate   Standard Error   DF   t Value   Pr > |t|
test1     -4.7338           0.5992   27     -7.90     <.0001

 

The second model:

                        Differences of Least Squares Means

Effect      TRT2  TIME  _TRT2  _TIME   Estimate   Standard Error   DF   t Value   Pr > |t|
TRT2*TIME      1     1      1      2    -1.0561           0.4529   28     -2.33     0.0271
TRT2*TIME      1     1      2      1    -3.7173           0.7552   28     -4.92     <.0001
TRT2*TIME      1     1      2      2    -5.7899           0.7797   35     -7.43     <.0001
TRT2*TIME      1     2      2      1    -2.6612           0.8034   34     -3.31     0.0022
TRT2*TIME      1     2      2      2    -4.7338           0.8265   28     -5.73     <.0001
TRT2*TIME      2     1      2      2    -2.0725           0.3203   28     -6.47     <.0001

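The SE difference can be traced with back-of-the-envelope arithmetic, assuming balanced, complete data (10 subjects per group, both time points observed), in which case the REML estimate of the time-2 variance under TYPE=UN is the pooled within-group sample variance at time 2. Here σ²_2 is that variance in the three-group model, σ*²_2 its counterpart in the pooled model, and d = -3.5463 is the group 2 minus group 3 difference at time 2 (reported later in the thread as test2); values are back-calculated from the posted SEs and rounded:

```latex
\begin{aligned}
\operatorname{Var}\!\Big(\hat\mu_{12} - \tfrac{1}{2}(\hat\mu_{22} + \hat\mu_{32})\Big)
  &= \sigma_2^2\Big(\tfrac{1}{10} + \tfrac{1}{4}\cdot\tfrac{1}{10} + \tfrac{1}{4}\cdot\tfrac{1}{10}\Big)
   = 0.15\,\sigma_2^2 \\
\operatorname{Var}\!\big(\hat\mu^{*}_{12} - \hat\mu^{*}_{22}\big)
  &= \sigma_2^{*2}\Big(\tfrac{1}{10} + \tfrac{1}{20}\Big) = 0.15\,\sigma_2^{*2} \\
\hat\sigma_2^2 &= 0.5992^2 / 0.15 \approx 2.394 \qquad (30 - 3 = 27 \text{ df}) \\
\hat\sigma_2^{*2} &= 0.8265^2 / 0.15 \approx 4.554 \qquad (30 - 2 = 28 \text{ df}) \\
\hat\sigma_2^{*2} &= \frac{27\,\hat\sigma_2^2 + 20\,(d/2)^2}{28}
  = \frac{27(2.394) + 5(3.5463)^2}{28} \approx 4.554
\end{aligned}
```

Both SEs use the same multiplier, 0.15, so the difference comes entirely from the variance estimate: the pooled model folds the spread between the group 2 and group 3 means into its residual variance, which roughly doubles it here. The reported DFs match the number of subjects minus the number of groups in each model.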
 

 

Another part of my question: what if I want to perform pairwise comparisons as an exploratory analysis? Should I use CONTRAST or ESTIMATE statements to obtain those LSMDs, or should I fit the model using only the subjects in the treatment groups of interest? Again, the LSMD estimate is the same, but the SE and DF differ.

 

proc mixed data=newtest method=reml;
	class id trt time;
	model y = trt time trt*time / s ddfm=kr;
	repeated time / type=un subject=id(trt);
	lsmeans trt*time / diff;
	*group 2 minus group 3 at time 2;
	estimate 'test2' trt 0 1 -1
	                 trt*time 0 0  0 1  0 -1 / e;
run;

proc mixed data=newtest method=reml;
	where trt in (2, 3); *subset: groups 2 and 3 only;
	class id trt time;
	model y = trt time trt*time / s ddfm=kr;
	repeated time / type=un subject=id(trt);
	lsmeans trt*time / diff;
run;

Output from the ESTIMATE statement (model 1):

                                 Estimates

Label    Estimate   Standard Error   DF   t Value   Pr > |t|
test2     -3.5463           0.6919   27     -5.13     <.0001

 

Output from the subset model (model 2):

                       Differences of Least Squares Means

Effect     TRT  TIME  _TRT  _TIME   Estimate   Standard Error     DF   t Value   Pr > |t|
TRT*TIME     2     1     2      2    -1.5477           0.3855     18     -4.02     0.0008
TRT*TIME     2     1     3      1    -2.4967           0.6730     18     -3.71     0.0016
TRT*TIME     2     1     3      2    -5.0940           0.7234   23.5     -7.04     <.0001
TRT*TIME     2     2     3      1    -0.9489           0.7234   23.5     -1.31     0.2023
TRT*TIME     2     2     3      2    -3.5463           0.7705     18     -4.60     0.0002
TRT*TIME     3     1     3      2    -2.5973           0.3855     18     -6.74     <.0001

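Under a balanced-data assumption (10 subjects per group, both time points observed), the SE of the group 2 vs group 3 difference at time 2 is the square root of 0.2 σ²_2, where σ²_2 is the estimated time-2 residual variance. Back-calculating σ²_2 from each posted SE (rounded) shows where the two analyses part ways:

```latex
\begin{aligned}
\operatorname{Var}(\hat\mu_{22} - \hat\mu_{32}) &= \sigma_2^2\Big(\tfrac{1}{10} + \tfrac{1}{10}\Big) = 0.2\,\sigma_2^2 \\
\text{full model: } \hat\sigma_2^2 &= 0.6919^2 / 0.2 \approx 2.394 \qquad (30 - 3 = 27 \text{ df}) \\
\text{subset model: } \hat\sigma_2^2 &= 0.7705^2 / 0.2 \approx 2.968 \qquad (20 - 2 = 18 \text{ df})
\end{aligned}
```

The full-model value agrees with the one implied by test1 (0.5992^2 / 0.15 ≈ 2.394), as it should for the same fitted model. The subset model estimates the same difference but bases the variance on groups 2 and 3 alone, so it uses 18 rather than 27 df and a different, less precise variance estimate.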
 

I greatly appreciate everyone's insight.

sld
Rhodochrosite | Level 12

1. Use the ESTIMATE statement. The LSMESTIMATE statement is a great feature that makes writing contrasts even easier; check it out in the documentation or see "CONTRAST and ESTIMATE Statements Made Easy: The LSMESTIMATE Statement".

 

 

2. Use ESTIMATE, CONTRAST, or LSMESTIMATE.

You could also take advantage of the SLICE= option on the LSMEANS statement, which tests simple effects and saves you the effort of writing contrasts. The GLIMMIX procedure also offers the SLICEDIFF= option; check it out.
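A sketch of both suggestions in one step, assuming PROC GLIMMIX is available; it fits the same unstructured repeated-measures model as the PROC MIXED runs above when the residual covariance is given through RANDOM ... / RESIDUAL, so the results should match. The LSMESTIMATE row (a statement PROC MIXED also supports) uses DIVISOR=2 to avoid fractional coefficients and targets the same quantity as the test1 ESTIMATE statement; SLICEDIFF=TIME lists all pairwise treatment differences at each time point within the full three-group model, including the group 2 vs 3 comparison at time 2. The data set name lsme is arbitrary, and the DATA steps repeat the balanced simulation from this thread with one ID per subject:

```sas
/* Balanced example data: 10 subjects per treatment, 2 time points */
data newtest;
   call streaminit(33445);
   do id=1 to 10;
      rid=rand('normal');               /* random subject effect */
      do trt=1 to 3;
         do time=1 to 2;
            y=trt + trt*time + rand('normal') + rid;
            output;
         end;
      end;
   end;
run;

data newtest;
   set newtest;
   id = 10*(trt-1) + id;                /* one ID per subject: 1-30 */
run;

proc glimmix data=newtest;
   class id trt time;
   model y = trt time trt*time / ddfm=kr;
   /* unstructured covariance of the two time points within each subject */
   random time / residual type=un subject=id(trt);
   /* trt*time LS-means are ordered (1,1) (1,2) (2,1) (2,2) (3,1) (3,2);
      this row is group 1 minus the average of groups 2 and 3 at time 2 */
   lsmestimate trt*time 'grp 1 vs mean(grp 2,3), time 2'
               0 2 0 -1 0 -1 / divisor=2 e;
   /* pairwise TRT differences within each level of TIME */
   lsmeans trt*time / slicediff=time;
   ods output LSMEstimates=lsme;
run;
```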

dmw
Calcite | Level 5

Thank you for the quick response.

 

Beyond how the means can be estimated, I am wondering about the difference in interpretation between (a) a model with a three-level treatment group and a contrast that averages the cell means, and (b) a model with a two-level treatment group. I understand that the point estimates are the same but the SEs and DFs differ, so I am trying to understand the difference between these two methods. Which model is better suited to answer my question, "Is there a difference between groups 2 and 3 combined and group 1?"

sld
Rhodochrosite | Level 12

Your experimental design involved subjects assigned to three treatment groups, not subjects assigned to two treatment groups. The experimental design determines the statistical model. Post-hoc redefinition of experimental treatments is hardly ever (even never?) a good idea.

 

In my opinion, the appropriate model specifies three treatment groups with a contrast to compare the mean of groups 2 and 3 to the mean of group 1.

dmw
Calcite | Level 5

Thank you for your response. This was my thinking as well.
