A_Tomczyk
Calcite | Level 5

Hi SAS Community!

The aim is to calculate the sample size for an average bioequivalence trial.

I would like to replicate the example below from "Sample Size Calculations in Clinical Research" by Chow, Shao, and Wang (3rd edition).

Below you can see the formula that is used as well.

 

[attached image: excerpt from the book]

 

[attached image: the sample size formula from the book]

So, with SD = 0.4, delta = 0.05, limit = 0.223, alpha = 0.05, and 80% power, I am using the code below:

 

proc power;
twosamplemeans test=equiv_diff
DIST=NORMAL
lower = -0.223
upper = 0.223
meandiff = 0.05
stddev = 0.4
npergroup = .
power = 0.8;
run;

 

But I am getting 69 subjects required, not 21 (as in the book, or 24 from the table approximation).

[attached image: PROC POWER output]

 

Is there a different formula SAS is using? Or should I use a different procedure?

Sorry if I am missing something, just trying to get my head around that.

 

Thank you Guys,

Agnieszka

 

 

5 REPLIES
Ksharp
Super User

But the formula in the SAS documentation is different from yours. Maybe the bioequivalence you are talking about is different from the SAS equivalence test.

 

[attached image: formula from the SAS documentation]

 

FreelanceReinh
Jade | Level 19

Hi, Agnieszka (@A_Tomczyk), and welcome to the SAS Support Communities!

 

Thank you very much for posting this interesting question. I have investigated the issue using the second edition (2008) of your book (in which I found basically the same texts and formulas as shown in your photographs of the third edition, so I assume the preceding paragraphs of that section "10.2 Average Bioequivalence" are very similar as well).

 

Everything seemed relatively straightforward to me (note the typo −dL, which should read dL), until I stumbled across the sentence "Under the alternative hypothesis that |e| < d, ..." ending with an approximate power formula of the form 2Φ(...)−1. As it turned out, the authors switched the notation there: it would have been more consistent to write that alternative hypothesis as |d| < dU.

 

Also, the approximation 2Φ(...)−1 seems unnecessarily coarse to me: I say "unnecessarily" because their equations "do not have an explicit solution" (quote from p. 261) anyway, so they could also have suggested a better approximation such as

F(dU − ta,2n-2·s1,1/√(2n)) − F(−dU + ta,2n-2·s1,1/√(2n))

with the cumulative distribution function F of "d hat", i.e., the cdf of a normal distribution (see formula 10.2.3 in the book if the numbering is the same in your edition). I use this in my code (which I will post separately). It is still an approximation because s1,1 is used instead of its estimate "s1,1 hat", but unlike theirs it does not require |d| (i.e., their |e|) to be very small. Using their approximation I sometimes got nonsensical negative values for the estimated power.

 

For an exact computation you can evaluate an integral of the joint probability density of "d hat" and "s1,1 hat", which is not very difficult to do in a DATA step (also shown in my code) thanks to the independence of these two random variables.

 

In the next step of their derivation, the authors of the book use the upper b/2 quantile of the t distribution with 2n-2 degrees of freedom, although the corresponding normal quantile zb/2 would better fit the previous "2F(...)−1" expression. But, as we know, those two quantiles don't differ much if n is sufficiently large. (And they do use zb/2 in their "further simplified" sample size formula.)

 

Surprisingly, they made two mistakes in their example calculation (which you show in one of your photographs): z0.05=1.64..., not 1.96, and z0.10=1.28..., not 0.84. Using the correct quantiles, the result for the (approximate) estimated minimum sample size is 23, not 21. Interestingly, using the "better approximation" I've suggested further above, I obtain 18 -- and this is even confirmed by both the exact calculation (using the integral mentioned above) and the simulation of 500,000 cross-over studies (each for n = 17 and n = 18) that I did in my code.
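For readers who want to verify this arithmetic outside SAS, here is a small Python sketch of the book's "further simplified" sample size formula, n ≥ (za + zb/2)²·s1,1²/(2·(dU − |d|)²), assuming that this is indeed the formula behind the example (the stdlib NormalDist supplies the normal quantiles):

```python
from math import ceil, log
from statistics import NormalDist

z = NormalDist().inv_cdf            # standard normal quantile function
s11, d, dU = 0.4, 0.05, log(1.25)   # sigma_1,1, delta, upper equivalence limit

def n_min(z_a, z_b2):
    # simplified sample size formula: n >= (z_a + z_{b/2})^2 * s11^2 / (2*(dU - |d|)^2)
    return ceil((z_a + z_b2) ** 2 * s11 ** 2 / (2 * (dU - abs(d)) ** 2))

print(n_min(z(0.95), z(0.90)))      # correct quantiles 1.645 and 1.282 -> 23
print(n_min(1.96, 0.84))            # the book's mixed-up quantiles     -> 21
```

Plugging in the book's incorrect quantiles reproduces their 21, while the correct ones give 23.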

 

The relevant hypothesis test can be implemented with PROC TTEST using the TOST option of the PROC TTEST statement in conjunction with the CROSSOVER= option of the VAR statement (also shown in my code). However, it seems to me that the corresponding sample size calculation is beyond the capabilities of PROC POWER (at least in my SAS/STAT 14.3 version), which supports calculations corresponding to using either TOST or CROSSOVER= in PROC TTEST, but not both in combination. Your PROC POWER code uses the "TOST only" variant. In my code I show the corresponding PROC TTEST step as well as a simulation.

 

The GLMPOWER procedure doesn't help either (I think) because in the documentation it says: "Tests and contrasts that involve random effects are not supported." But the model used in Chow/Shao/Wang does involve a random effect.

 

So, unless someone else has other ideas, you can use

  • one of the approximations from the Chow et al. book or
  • the "better" approximation mentioned above or
  • the exact integral calculation or
  • a simulation 

to obtain power and sample size for the model in question with SAS.

 

I am so sorry that I have to leave the office now, but I will post my code tomorrow (CEST time zone).

Ksharp
Super User
I think the bioequivalence here is totally different from the equivalence testing of PROC POWER.
Therefore, they have different formulas and you cannot expect them to give the same result.
FreelanceReinh
Jade | Level 19

As promised, here is the SAS code I developed during my investigation.

 

/* Simulation of 500,000 clinical trials with 2x2 crossover design to establish average bioequivalence between 
   two treatments using log-transformed data (cf. the example on p. 262 f. of
     Chow, S.C., Shao, J., and Wang, H. (2008): Sample Size Calculations in Clinical Research, 2nd ed.,
     Chapman & Hall/CRC Press/Taylor & Francis Group, New York
   and the derivation of the sample size estimate in this book) */

%let n=18;      /* sample size per group */
%let mu=1;      /* overall mean in the statistical model (10.2.1) used in the book, arbitrarily selected */
%let d=0.05;    /* fixed treatment effect ("delta" in the book) */

%let rho=0.5;   /* Between-subject std. deviations (sBT, sBR) and within-subject std. dev. (sWT, sWR),            */
%let sBT=0.35;  /* denoted "sigma_BT" etc. in the book, as well as the correlation rho of each subject's random   */
%let sBR=0.25;  /* effect between test (T) and reference (R) treatment have been arbitrarily selected, but        */
%let sWT=0.2;   /* aiming at the value 0.4 for the derived standard deviation parameter s11 (see below) used in   */
%let sWR=0.15;  /* the book example.                                                                              */

%let P1=0.07;   /* fixed effect of the first treatment period */
%let Q1=0.03;   /* fixed effect of the first sequence of treatments (T, R) */

/* Alternative set of parameters:

%let n=18;
%let mu=1;
%let d=0.05;
%let rho=0.15;
%let sBT=0.10;
%let sBR=0.05;
%let sWT=0.37;
%let sWR=0.11;
%let P1=0.12;
%let Q1=0.13;

*/

%let s11=%sysfunc(sqrt(&sBT**2+&sBR**2-2*&rho*&sBT*&sBR+&sWT**2+&sWR**2)); /* in the book: "sigma_1,1", described */
%put &=s11; /* =0.4 */                                    /* as "standard deviation for intra-subject comparison" */

data sim(drop=FT--sWR);
call streaminit('MT64',27182818);
retain sim k i j l S e y;
array F[*] FT FR (%sysevalf(&d/2) -%sysevalf(&d/2));
array P[2] (&P1 -&P1);
array Q[2] (&Q1 -&Q1);
array S_[*] ST SR;
array sW[*] sWT sWR (&sWT &sWR);
do sim=1 to 500000; /* Reduce this number for shorter run times. */
  do k=1 to 2; /* treatment sequence k=1: T, R; sequence k=2: R, T */
    do i=1 to &n;
      ST=rand('normal',0,&sBT); /* i-th subject's random effect; its bivariate normal distribution is simulated ... */
      SR=rand('normal',%sysevalf(&rho*&sBR/&sBT)*ST,%sysevalf(&sBR*%sysfunc(sqrt(%sysevalf(1-&rho**2)))));
             /* ... using the conditional distribution of SR given ST; alternatively, PROC SIMNORMAL could be used. */
      do j=1 to 2; /* first and second treatment period */
        l=2-(j=k); /* Values l=1, 2 correspond to treatments T and R, respectively. */
        S=S_[l];
        e=rand('normal',0,sW[l]); /* random errors */
        y=&mu+F[l]+P[j]+Q[k]+S+e; /* see equation (10.2.1) in the book */
        output;
      end;
    end;
  end;
end;
run;

/* Compute mean values of the (log-transformed) measurements per period-sequence combination */

proc summary data=sim nway;
by sim k;
class j;
var y;
output out=stats(drop=_:) mean=;
run;

/* Estimate the treatment effect ("delta hat" in the book) */

data delta(keep=sim d_hat);
set stats;
by sim;
if first.sim then d_hat=0;
d_hat+((-1)**(j~=k)*y);
if last.sim;
d_hat=d_hat/2;
run; 

/* Optional: Check the (normal) distribution of "delta hat" (formula (10.2.3) in the book)

proc univariate data=delta;
var d_hat;
histogram / normal;
run;
*/

/* Prepare estimation of "sigma_1,1 hat" squared */

proc transpose data=stats out=stats2(drop=_:) prefix=y;
by sim k;
id j;
var y;
run;

/* Estimate "sigma_1,1 hat" squared using formula (10.2.5) of the book */

data s112(keep=sim s112);
merge sim(keep=sim k i j y) stats2;
by sim k;
if first.sim then s112=0;
lag_y=lag(y);
if j=2 then s112+(lag_y-y-y1+y2)**2;
if last.sim;
s112=s112/(2*&n-2);
run;

/* Optional: "sigma_1,1 hat" squared and "delta hat" are independent; check uncorrelatedness and distribution
             of "sigma_1,1 hat" (cf. p. 261 of the book)

data chk;
merge delta s112;
by sim;
run;

proc corr data=chk;
var d_hat s112;
run;

data s112n;
set s112;
s112n=s112*(2*&n-2)/&s11**2; * This quantity should have a chi-square distribution with 2*n-2 degrees of freedom.;
run;

proc univariate data=s112n;
var s112n;
histogram / gamma(alpha=%eval(&n-1) sigma=2); * chi-square(df) distribution equals gamma(df/2, 2) distribution;
run;
*/

/* Compute 90% confidence interval for delta and perform the equivalence test ("two one-sided tests procedure")
   using delta_U=log(1.25)=0.223... and delta_L=log(0.8)=-delta_U (bioequivalence limits as per 2001 FDA guidance
   "Statistical Approaches to Establishing Bioequivalence" [https://www.fda.gov/media/70958/download]) */

data confd;
merge delta s112;
by sim;
hw=quantile('T',0.95,2*&n-2)*sqrt(s112)/2*sqrt(2/&n); /* half-width of the CI according to book p. 261 */
c=(abs(&d-d_hat)<=hw); /* =1 in case of coverage of the true parameter, 0 otherwise */
rej=(d_hat-hw>log(0.8) & d_hat+hw<log(1.25)); /* =1 if null hypotheses H_01 and H_02 are rejected */
run;                                          /* at the 5% significance level                     */

/* Check coverage of the 90% CI for delta and estimate the power of the equivalence test */

proc freq data=confd;
tables c rej / binomial(level='1');
run; /* proportion c=1: 0.8994 (95%-CI: [0.8985, 0.9002])
        prop.    rej=1: 0.8074 (95%-CI: [0.8063, 0.8085]) */

/* with &n=17: */
     /* proportion c=1: 0.8995 (95%-CI: [0.8987, 0.9003])
        prop.    rej=1: 0.7837 (95%-CI: [0.7825, 0.7848]) */

The above estimates of the test's power indicate that n = 18 is the minimum sample size required for 80% power, given the assumed values of parameters d and s1,1. Very similar results are obtained with the "alternative set of parameters" (see that comment in the code).

 

/* Suggested better approximation of the power than the book's 2Phi(...)-1 formula (p. 261) */

data _null_;
do n=17, 18;
  pow= cdf('normal', log(1.25)-quantile('T',0.95,2*n-2)*sqrt(&s11**2/(2*n)),&d,sqrt(&s11**2/(2*n)))
      -cdf('normal',-log(1.25)+quantile('T',0.95,2*n-2)*sqrt(&s11**2/(2*n)),&d,sqrt(&s11**2/(2*n)));
  put n= "--> approx. power=" pow 6.4;
end;
run;

The above approximation yields 0.8095 for n = 18 and 0.7857 for n = 17, suggesting the same minimum sample size as the simulation.
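The same numbers can be reproduced outside SAS with a few lines of stdlib Python (the t quantiles for 32 and 34 degrees of freedom are hardcoded from standard tables, since the Python standard library has no t quantile function):

```python
from math import log, sqrt
from statistics import NormalDist

Phi = NormalDist().cdf              # standard normal cdf
s11, d, dU = 0.4, 0.05, log(1.25)   # sigma_1,1, delta, upper equivalence limit
t_crit = {17: 1.6939, 18: 1.6909}   # t_{0.05, 2n-2} for df = 32 and 34 (from tables)

def approx_power(n):
    se = s11 / sqrt(2 * n)          # std. dev. of "delta hat"
    c = dU - t_crit[n] * se         # rejection threshold for |delta hat|
    return Phi((c - d) / se) - Phi((-c - d) / se)

for n in (17, 18):
    print(n, round(approx_power(n), 4))   # 17 -> 0.7857, 18 -> 0.8095
```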

 

As mentioned in my previous post, the equivalence test can also be implemented with PROC TTEST rather than
in several steps as above:

/* Restructure simulated data for application of PROC TTEST */

proc transpose data=sim out=simt(drop=_:) prefix=y;
by sim k i;
id j;
var y;
run;

data simtt;
set simt;
trt1=k;
trt2=3-k;
run;

/* Perform the equivalence test with PROC TTEST */

options nonotes;
ods select none;
ods noresults;

ods output EquivLimits=eqlim(where=(period=' ' & method='Pooled'));
proc ttest data=simtt tost(%sysfunc(log(0.8)), %sysfunc(log(1.25)));
by sim;
var y1 y2 / crossover=(trt1 trt2);
run;

options notes;
ods select all;
ods results;

proc freq data=eqlim;
tables assessment / binomial;
run;

/* Compare previous results to those from PROC TTEST */

data cmptt;
merge confd(keep=sim rej)
      eqlim(keep=sim assessment);
by sim;
run;

proc freq data=cmptt;
tables rej*assessment;
run; /* Results are identical: rej=1 <==> Assessment="Equivalent" */

 

My suggestion for an exact calculation of the power is to compute the integral of the joint probability density of "d hat" and "s1,1 hat" over the rejection region, i.e., the set of ("d hat", "s1,1 hat") pairs satisfying the inequality |"d hat"| < dU − ta,2n-2·"s1,1 hat"/√(2n), where the two null hypotheses H01 and H02 are rejected. As those two random variables are independent (see book, p. 261), their joint pdf is just the product fg of the pdf f of a normal distribution and the pdf g of a scaled chi distribution. I found the pdf of a chi distribution in section 8.3 of Evans, M., Hastings, N. and Peacock, B. (2000), Statistical Distributions, 3rd ed., John Wiley & Sons, New York.

%let n=18;
%let d=0.05;
%let s11=0.4;
%let s112=%sysevalf(&s11**2);
%let alpha=0.05;

/* Select a subset of points where the joint pdf does not almost vanish */

data intset;
do i=-5000 to 6000;
  x=i/10000;
  do j=1000 to 8000;
    y=j/10000;
    fg= sqrt(&n/constant('PI'))/&s11*exp(-&n/&s112*(x-&d)**2)
       *2*((&n-1)/&s112)**(&n-1)/fact(&n-2)*y**(2*&n-3)*exp(-(&n-1)/&s112*y**2);
    if fg>1e-10 then output;
  end;
end;
run; /* 52891353 obs. */

proc means data=intset min max;
var i j;
run; /* Min and max are well inside the ranges, so it's safe to limit the integration to this set. */

/* Compute the integral numerically */

proc means data=intset sum;
where abs(x)<log(1.25)-quantile('T',1-&alpha,2*&n-2)*y/sqrt(2*&n);
var fg;
run; /* --> Power = 80703401.67/1e8 = 0.80703... . */

This value matches the result of the simulation, 0.8074 (95%-CI: [0.8063, 0.8085]), quite well.
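The same integral can be cross-checked in Python: doing the inner integral over "d hat" analytically (a difference of normal cdf values) leaves a one-dimensional integral over the chi-square distribution of (2n−2)·"s1,1 hat"²/s1,1² (see p. 261 of the book), which a plain Simpson rule handles. This is a stdlib-only sketch; the t quantile 1.6909 for 34 df is hardcoded from tables:

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist

Phi = NormalDist().cdf
n, d, s11 = 18, 0.05, 0.4
df = 2 * n - 2                        # 34 degrees of freedom
t_crit = 1.6909                       # t_{0.05, 34}, hardcoded from tables
se = s11 / sqrt(2 * n)                # std. dev. of "delta hat"

def chi2_pdf(u, k):
    # chi-square density with k df (log form avoids overflow)
    return exp((k / 2 - 1) * log(u) - u / 2 - lgamma(k / 2) - (k / 2) * log(2))

def integrand(u):
    s_hat = s11 * sqrt(u / df)        # "sigma_1,1 hat" when (2n-2)*s_hat^2/s11^2 = u
    c = log(1.25) - t_crit * s_hat / sqrt(2 * n)   # rejection threshold
    p = max(0.0, Phi((c - d) / se) - Phi((-c - d) / se))
    return chi2_pdf(u, df) * p

# composite Simpson's rule on u in (0, 120]; the chi-square(34) density is
# negligible outside this range, and the integrand vanishes at u = 0
h = 0.01
total = sum((4 if k % 2 else 2) * integrand(k * h) for k in range(1, 12001))
exact_power = total * h / 3
print(round(exact_power, 4))          # approx. 0.807
```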

 

I will continue in a subsequent post to avoid this one getting too long.

FreelanceReinh
Jade | Level 19

[continuation of the previous post]

 

Now let's turn to your PROC POWER step where you obtained a sample size of 69 subjects per group. The corresponding power estimate is even more suitable for comparisons:

ods output output=ppow;

proc power;
twosamplemeans test=equiv_diff
dist=normal
lower = -0.223
upper = 0.223
meandiff = 0.05
stddev = 0.4
npergroup = 69
power = .;
run;

proc print data=ppow;
format power best16.;
run; /*  0.80179614325271 */

/* The same result (up to 14 decimals!) can be obtained by using a formula from section 3.2.3 of the book (p. 62): */

data _null_;
a=0.05;   /* significance level alpha */
dU=0.223; /* bioequivalence limit (log(1.25)) */
d=0.05;   /* mean difference delta */
s=0.4;    /* common standard deviation in both groups */
n=69;     /* sample size per group */
power=1-cdf('T',quantile('T',1-a,2*n-2),2*n-2,(dU-d)/(s*sqrt(2/n)))
       -cdf('T',quantile('T',1-a,2*n-2),2*n-2,(dU+d)/(s*sqrt(2/n)));
put power= best16.;
run;

The power computed above is that of an equivalence test based on a much simpler model than model (10.2.1) in the book: Just two independent normal distributions (ideally) with equal variances (which are assumed to be unknown). No cross-over design, just overall mean plus fixed treatment effect plus normal random error ("two-sample parallel design").
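As a language-independent check of this formula, here is a stdlib-only Python sketch. It evaluates the noncentral t cdf (SAS's cdf('T', x, df, nc)) by conditioning on the chi-square denominator, using the identity T = (Z + nc)/√(V/df) with V ~ chi-square(df) independent of Z; the t quantile 1.6561 for 136 df is hardcoded from tables:

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist

Phi = NormalDist().cdf
alpha, dU, d, s, n = 0.05, 0.223, 0.05, 0.4, 69
df = 2 * n - 2                        # 136 degrees of freedom
t_crit = 1.6561                       # t_{0.05, 136}, hardcoded from tables
se = s * sqrt(2 / n)

def chi2_pdf(v, k):
    # chi-square density with k df (log form avoids overflow)
    return exp((k / 2 - 1) * log(v) - v / 2 - lgamma(k / 2) - (k / 2) * log(2))

def nct_cdf(t, k, nc):
    # P(T <= t) for noncentral t: T = (Z + nc)/sqrt(V/k) with V ~ chi-square(k),
    # so P(T <= t) = E_V[Phi(t*sqrt(V/k) - nc)]; Simpson's rule on v in (0, 400]
    h = 0.05
    total = sum((4 if i % 2 else 2) * chi2_pdf(i * h, k) * Phi(t * sqrt(i * h / k) - nc)
                for i in range(1, 8001))
    return total * h / 3

tost_power = (1 - nct_cdf(t_crit, df, (dU - d) / se)
                - nct_cdf(t_crit, df, (dU + d) / se))
print(round(tost_power, 4))           # approx. 0.8018
```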

 

/* Simulate data of 500,000 experiments in a two-sample parallel design */

%let n=69;      /* sample size per group */
%let mu=1;      /* overall mean, arbitrarily selected */
%let d=0.05;    /* fixed treatment effect */
%let s=0.4;     /* common standard deviation in both groups */

data sim2(drop=F:);
call streaminit('MT64',31415927);
retain sim k i y e;
array F[*] FT FR (%sysevalf(&d/2) -%sysevalf(&d/2));
do sim=1 to 500000;
  do k=1 to 2;
    do i=1 to &n;
      e=rand('normal',0,&s);
      y=&mu+F[k]+e;
      output;
    end;
  end;
end;
run;

/* Perform equivalence test */

options nonotes;
ods select none;
ods noresults;

ods output EquivLimits=eqlim2(where=(method='Pooled'));
proc ttest data=sim2 tost(-0.223, 0.223);
by sim;
class k;
var y;
run;

options notes;
ods select all;
ods results;

proc freq data=eqlim2;
tables assessment / binomial;
run; /* proportion assessment="Equivalent", i.e., estimated power: 0.8011 (95%-CI: [0.8000, 0.8022]) */

Again, the result of the simulation matches the computed power 0.8017... obtained before.
