JG_PhM
Calcite | Level 5

Hi everyone, 

 

I am trying to get hazard ratios and Wald chi-square tests for the comparison of each experimental arm vs. control. We have two experimental treatments and one control group with 1:1:1 randomization.

 

i_adtte is a time-to-event ADaM dataset.

TRTPN is the numeric treatment arm: 1='A', 2='B', 3='Control' (reference).

AVAL is the time to event.

CNSR is the censoring indicator.

There are no missing values for any of the cases in the three arms.

 

I wonder why I get different results for HRs, confidence limits and chi-square tests when I apply two CONTRAST statements to the three-arm data (A) than when I subset the data for each comparison and then apply a single contrast (B1)/(B2).

 

How do the data from the arm not included in a contrast (for example, arm B) enter into the calculation of the main effect of the other arm (for example, arm A) against the control group? What is the correct approach (separate or joint contrasts)?

 

Many Thanks in advance.

JG

 

(A)

proc phreg data = i_adtte;
   class TRTPN (ref=LAST) / param = ref;
   model AVAL*CNSR(1) = TRTPN;
   hazardratio TRTPN;
   contrast 'A vs. Control' TRTPN 1 0;
   contrast 'B vs. Control' TRTPN 0 1;
run;

(B1)

proc phreg data = i_adtte;
   where trtpn in (1 3);
   class TRTPN (ref=LAST) / param = ref;
   model AVAL*CNSR(1) = TRTPN;
   hazardratio TRTPN;
   contrast 'A vs. Control' TRTPN 1; /* only one design variable remains after subsetting */
run;

(B2)

proc phreg data = i_adtte;
   where trtpn in (2 3);
   class TRTPN (ref=LAST) / param = ref;
   model AVAL*CNSR(1) = TRTPN;
   hazardratio TRTPN;
   contrast 'B vs. Control' TRTPN 1;
run;

 

5 REPLIES
SteveDenham
Jade | Level 19

Consider the following: suppose you did some sort of permutation test, where each observation has an equal probability of being assigned to control, treatment 1 or treatment 2. The specified comparisons make sense under this approach. Now suppose you did the same permutation test, but with restricted sampling, so that only control and one of the treatment arms are used. My brain says to me: "Hey, these are not the same thing. Maybe I should expect something different from the two analyses." And in the end the asymptotic tests, such as those in PHREG, should approach in probability the results of the permutation test at infinite sample size. (I don't remember where this comes from, but Fisher might have something to say about it.) Now it is very possible that my brain is missing a key element in this argument, in which case you should just ignore my speculations.
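To make the two shuffling schemes concrete, here is a rough, hypothetical sketch of one permuted replicate under each scheme (assuming the i_adtte dataset from the question; an actual permutation test would repeat this many times and compare the observed statistic with the resulting permutation distribution):

/* Unrestricted scheme: randomly reassign the existing TRTPN labels
   across all three arms (one replicate) */
data shuffled;
   set i_adtte;
   u = rand('uniform');   /* random sort key */
run;

proc sort data=shuffled;
   by u;
run;

/* Positional one-to-one merge: the randomly ordered rows are paired with
   the original label sequence, i.e., the labels are randomly permuted */
data perm_all;
   merge shuffled(drop=trtpn) i_adtte(keep=trtpn);
run;

/* Restricted scheme: same idea, but only control and arm A take part */
data shuffled_ac;
   set i_adtte;
   where trtpn in (1 3);
   u = rand('uniform');
run;

proc sort data=shuffled_ac;
   by u;
run;

data perm_restricted;
   merge shuffled_ac(drop=trtpn) i_adtte(keep=trtpn where=(trtpn in (1 3)));
run;

Each replicate would then be analyzed with the corresponding PROC PHREG step from the question, and the two reference distributions would, in general, differ.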

 

SteveDenham

JG_PhM
Calcite | Level 5

Thank you, Steve.

I understand your reasoning: since the patients come from the same population, are randomly assigned to the three groups, and could have ended up in either experimental group A or B, it makes more sense not to filter out any data for the contrasts in the Cox regression.

Best,

JG_PhM

FreelanceReinh
Jade | Level 19 (Accepted Solution)

Hi @JG_PhM,

 

I also believe that analyzing all three treatment arms together is preferable to the separate contrasts (B1)/(B2). I'm planning to investigate this in more detail over the weekend, but the key observation is this: as far as I can see, all the statistics you mention (hazard ratios, chi-square tests, etc.) depend on the estimated regression parameters ("beta hat"). These, in turn, result from maximizing the appropriate partial likelihood function (see Partial Likelihood Function for the Cox Model).

 

Obviously, these functions differ between approach A (where the partial likelihood is a function of two variables) and approaches B1 and B2 (two different functions of one variable each). Moreover, their definitions involve the distinct, ordered event times of all individuals (with non-missing data) in the input dataset. I think this is where "data from the arm not included in a contrast ... intervene in the calculation of the main effect of the other arm." The Cox model assumes a baseline survivor function that is common to all three treatment arms, which also makes it plausible that the data of all three arms contribute valuable information, even to a calculation focusing on only two arms.
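As a side note on the joint analysis: the two hazard ratios of interest can also be requested directly from the HAZARDRATIO statement of model (A), without any CONTRAST statements. A minimal sketch (DIFF=REF compares each non-reference level of TRTPN with the reference level, here Control):

proc phreg data = i_adtte;
   class TRTPN (ref=LAST) / param = ref;
   model AVAL*CNSR(1) = TRTPN;
   hazardratio TRTPN / diff=ref;   /* A vs. Control and B vs. Control */
run;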

FreelanceReinh
Jade | Level 19

As promised, here are more details supporting the preliminary considerations of my previous post.

 

Part 1: Why approaches (A) and (B1)/(B2) result in different parameter estimates (and statistics derived therefrom).

 

For a small example dataset with only three observations per treatment group, I computed the Breslow (partial) likelihood function explicitly (see the formula in Partial Likelihood Function for the Cox Model). To keep it simple, I used Weibull distributions with a common shape parameter a=2 and a treatment-dependent scale parameter λ and left all observations uncensored. More specifically, I defined the survival functions to be S(t)^θ with S(t)=exp(−(t/50)²) and θ=3, 4 and 6 for treatments A (trtpn=1), B (trtpn=2) and Control (trtpn=3), respectively (i.e., the scale parameters of the three Weibull distributions are 50/sqrt(3), 25 and 50/sqrt(6), resp.).

 

So the true hazard ratios are simply 3/4 (A vs. B), 1/2 (A vs. Control) and 2/3 (B vs. Control). The natural logarithms of the latter two hazard ratios are the true parameter values in the Cox model using reference-cell coding: β1=log(1/2)=−0.693147..., β2=log(2/3)=−0.405465...
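As a quick numeric sanity check (nothing beyond the arithmetic above: under these Weibull models the hazard of each arm is proportional to its θ, so the hazard ratios are simply ratios of the θ values):

data _null_;
   array theta[3] _temporary_ (3 4 6);   /* theta for A, B, Control */
   hr_ab = theta[1]/theta[2];            /* 0.75         */
   hr_ac = theta[1]/theta[3];            /* 0.5          */
   hr_bc = theta[2]/theta[3];            /* 0.666...     */
   beta1 = log(hr_ac);                   /* -0.693147... */
   beta2 = log(hr_bc);                   /* -0.405465... */
   put hr_ab= hr_ac= hr_bc= / beta1= beta2=;
run;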

 

/* Create a small sample dataset */

%let d='Weibull';
%let a=2;
%let b=50;

data small;
call streaminit(27182818);
array c[3] _temporary_ (3 4 6); /* theta values for arms A, B, Control */
cnsr=0;                         /* all observations uncensored */
do trtpn=1 to 3;
  do id=trtpn*100+1 to trtpn*100+3;
    /* inverse-CDF trick: U=S(T) is uniform, so T'=S**-1(U**(1/theta))
       has survival function S(t)**theta */
    aval=quantile(&d,1-sdf(&d,rand(&d,&a,&b),&a,&b)**(1/c[trtpn]),&a,&b);
    output;
  end;
end;
run; /* 9 obs. */

 

After sorting by ascending AVAL, the observations of dataset SMALL are:

Obs    cnsr    trtpn     id      aval

 1       0       2      202     3.9024
 2       0       1      103     6.2252
 3       0       2      203    13.8476
 4       0       3      301    22.4053
 5       0       1      101    22.9306
 6       0       1      102    26.9580
 7       0       3      303    27.3498
 8       0       3      302    28.5957
 9       0       2      201    28.8078

Each of the nine observations contributes one factor to the product defining the Breslow likelihood function L2A(b1,b2) of your approach (A). The factors are fractions whose numerators are determined by the temporal order of the events: exp(b1) for trtpn=1, exp(b2) for trtpn=2 and 1 for the reference category, trtpn=3. So the sequence of numerators is exp(b2), exp(b1), exp(b2), 1, exp(b1) and so on (according to column trtpn above). (The factor of the very last event equals exp(b2)/exp(b2)=1, because only subject 201 is still at risk; such trivial factors are omitted in the function definitions below.)

 

The denominators of those fractions are determined by the risk sets at each event time. The first denominator is 3*exp(b1)+3*exp(b2)+3, where the coefficients 3, 3, 3 correspond to the three individuals at risk in treatment groups A, B and Control before the first event occurs. The sum of the coefficients in subsequent denominators decreases by one in each step (thanks to the absence of ties and censored observations in our example), reflecting the dropouts due to events that have occurred before the event time in question:

3*exp(b1)+2*exp(b2)+3,

2*exp(b1)+2*exp(b2)+3,

2*exp(b1)+1*exp(b2)+3,

2*exp(b1)+1*exp(b2)+2

and so on (again according to column trtpn above).

 

The Breslow likelihood functions L2B1(b1) and L2B2(b2) for your approach (B1)/(B2) are defined analogously. Unlike L2A(b1,b2), they are functions of only one variable, and their function terms are products of only six factors, as one of the three treatment groups is excluded by the WHERE condition on trtpn. The product of the numerators of L2B1(b1) and L2B2(b2) equals the numerator of L2A(b1,b2), but the same is not true for the denominators. Basically, this is the reason why maximizing L2A(b1,b2) in general leads to different estimates for b1 and b2 than maximizing L2B1(b1) and L2B2(b2).

 

Here are the full definitions of the three Breslow likelihood functions for dataset SMALL:

proc fcmp outlib=work.funcs.test;
function L2A(b1,b2);
  return(exp(3*b1+2*b2)/( (3*exp(b1)+3*exp(b2)+3)*(3*exp(b1)+2*exp(b2)+3)*(2*exp(b1)+2*exp(b2)+3)
                         *(2*exp(b1)+1*exp(b2)+3)*(2*exp(b1)+  exp(b2)+2)*(  exp(b1)+  exp(b2)+2)
                         *(            exp(b2)+2)*(            exp(b2)+1)));
endsub;

function L2B1(b);
  return(exp(3*b)/((3*exp(b)+3)*(2*exp(b)+3)*(2*exp(b)+2)*(exp(b)+2)*2));
endsub;

function L2B2(b);
  return(exp(2*b)/((3*exp(b)+3)*(2*exp(b)+3)*(exp(b)+3)*(exp(b)+2)*(exp(b)+1)));
endsub;
run;

options cmplib=work.funcs;

We can use these to verify the log likelihood values in the tables "Maximum Likelihood Iteration History" requested by the ITPRINT option of the MODEL statements in the PROC PHREG steps below:

proc phreg data = small;
class trtpn(ref='3');
model aval*cnsr(1) = trtpn / itprint absfconv=1e-12;
hazardratio trtpn;
contrast 'A vs. Control'  trtpn  1 0;
contrast 'B vs. Control'  trtpn  0 1;
run;

proc phreg data = small;
where trtpn in (1 3);
class trtpn(ref='3');
model aval*cnsr(1) = trtpn / itprint absfconv=1e-12;
hazardratio trtpn;
contrast 'A vs. Control' trtpn 1;
run;

proc phreg data = small;
where trtpn in (2 3);
class trtpn(ref='3');
model aval*cnsr(1) = trtpn / itprint absfconv=1e-12;
hazardratio trtpn;
contrast 'B vs. Control' trtpn 1;
run;

For example, log(L2A(0,0))=-lfact(9)=-12.8018..., whereas log(L2B1(0))=log(L2B2(0))=-lfact(6)=-6.5792....
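A minimal check of these values (to be run after the PROC FCMP step above, with the CMPLIB option in effect); it also illustrates that log(L2A(0,0)) differs from log(L2B1(0))+log(L2B2(0)) = −13.1585..., precisely because the denominators of the likelihood factors differ:

data _null_;
   logL2A  = log(L2A(0, 0));    /* expect -lfact(9) = -12.8018... */
   logL2B1 = log(L2B1(0));      /* expect -lfact(6) = -6.5792...  */
   logL2B2 = log(L2B2(0));
   sumB    = logL2B1 + logL2B2; /* -13.1585..., not equal to logL2A */
   put logL2A= logL2B1= logL2B2= sumB=;
run;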

 

I have used a stricter convergence criterion (the ABSFCONV= option) to obtain more accurate parameter estimates: b1=0.738263 and b2=0.0505605 from approach (A); b1=1.394494 and b2=0.0144768 from approach (B1)/(B2). Of course, all these estimates are far off the true parameter values β1 and β2 due to the small sample size.

 

Here are the graphs of the three log-likelihood functions:

[Three figures: a surface plot of the log-likelihood function for approach (A) and line plots of the log-likelihood functions for approaches (B1) and (B2).]

 

Code to produce the graphs:

data logL2A;
do i=30 to 100;
  b1=i/100;
  do j=-200 to 300;
    b2=j/1000;
    logL2A=log(L2A(b1, b2));
    output;
  end;
end;
run;

proc template;
define statgraph surface;
  begingraph;
    layout overlay3d;
      surfaceplotparm x=b1 y=b2 z=logL2A;
    endlayout;
  endgraph;
end;
run;

proc sgrender data=logL2A template=surface;
run;

data logL2B1;
do i=0 to 300;
  b1=i/100;
  logL2B1=log(L2B1(b1));
  output;
end;
run;

proc sgplot data=logL2B1;
series x=b1 y=logL2B1;
run;

data logL2B2;
do i=-200 to 300;
  b2=i/4000;
  logL2B2=log(L2B2(b2));
  output;
end;
run;

proc sgplot data=logL2B2;
series x=b2 y=logL2B2;
run;

I will defer part 2 to a separate post (to appear later today).

FreelanceReinh
Jade | Level 19

As a continuation of my previous post, here is:

Part 2: Comparison of approach (A), which uses one two-parameter likelihood function, with approach (B1)/(B2), which uses two one-parameter likelihood functions.

 

The comparison is limited to the Weibull model defined in the previous post. The sample size is increased from 3 to 1000 subjects per treatment group. Again, censored observations are avoided. Data for 10,000 such experiments are simulated and analyzed using both approaches. (The simulation uses the 64-bit Mersenne twister because, unlike the default "MTHybrid" random-number generator, it largely avoids tied event times.)

 

/* Simulate example data: 10,000 samples of 3*1000 subjects each */

%let d='Weibull';
%let a=2;
%let b=50;

data have;
call streaminit('MT64', 27182818); /* using the 64-bit Mersenne twister to reduce the frequency of ties */
array c[3] _temporary_ (3 4 6);
cnsr=0;
do k=1 to 10000;
  do trtpn=1 to 3;
    do i=1 to 1000;
      aval=quantile(&d,1-sdf(&d,rand(&d,&a,&b),&a,&b)**(1/c[trtpn]),&a,&b);
      output;
    end;
  end;
end;
run;

/* Perform Cox regression (approaches A, B1, B2) */

options nonotes;
ods noresults;
ods select none;

ods output ParameterEstimates=est;
proc phreg data = have;
by k;
class trtpn(ref='3');
model aval*cnsr(1) = trtpn;
run;

ods output ParameterEstimates=est1;
proc phreg data = have;
where trtpn in (1 3);
by k;
class trtpn(ref='3');
model aval*cnsr(1) = trtpn;
run;

ods output ParameterEstimates=est2;
proc phreg data = have;
where trtpn in (2 3);
by k;
class trtpn(ref='3');
model aval*cnsr(1) = trtpn;
run;

options notes;
ods results;
ods select all;

/* Compute differences between estimates and true parameter values beta1, beta2 */

%let beta1=log(1/2);
%let beta2=log(2/3);

proc transpose data=est out=trans(drop=_:) prefix=b;
by k;
var estimate;
id classval0;
run;

data diff;
merge trans
      est1(keep=k estimate rename=(estimate=b01))
      est2(keep=k estimate rename=(estimate=b02));
by k;
d1A=abs(b1-&beta1);
d2A=abs(b2-&beta2);
dA=sqrt(d1A**2+d2A**2);
d1B=abs(b01-&beta1);
d2B=abs(b02-&beta2);
dB=sqrt(d1B**2+d2B**2);
c=(.<dA<dB);
run;

/* Compare differences between approaches A and B (B1/B2) */

proc means data=diff;
var d:;
run;

proc ttest data=diff;
paired (d1A--dA):(d1B--dB);
run;

proc freq data=diff;
tables c / binomial (level='1');
run;

 

Both the paired t-tests (p<0.0001 for all three comparisons) and the binomial test (p<0.0001) indicate that the estimates obtained with approach (A) tend to be closer to the true parameter values in this example. However, the improvement is small, e.g., from a mean Euclidean distance dB of 0.057 to a mean dA of 0.056. In 45.0% (95% CI: 44.0 - 46.0) of the 10,000 simulated studies, approach (B1)/(B2) actually performed better than approach (A).
