Quartz | Level 8

## sample size calculation

Hi,

I am learning to calculate sample sizes. The following paragraph is from the attached article. I am not able to come up with the number 275. Please let me know the PROC POWER code that was used to calculate it.

The sample size calculation for the trial assumed that all patients randomized to the surgical group would undergo appendectomy.
For computational reasons, the success rate for surgery was assumed to be 99%. Prior similar studies found
success rates for antibiotic treatment of approximately 70% to 80%. Thus, we anticipated a 75% success rate in the antibiotic
therapy group and a 24% (95% CI, 75%-99%) noninferiority margin was used for the sample size calculations. We estimated that 275 patients per group would yield a power of 0.90 (1−β) to establish whether antibiotic treatment was noninferior to appendectomy using a 1-sided significance α level of .05 with Proc Power version 9.2 (SAS Institute Inc)

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
Jade | Level 19

## Re: sample size calculation

Hi @Kyra,

Thanks for posting this interesting question.

It's about noninferiority testing of proportions in two independent groups, so let's recap the hypotheses:

```
H0: p2 − p1 <= −m
H1: p2 − p1 >  −m
```

where, in the example of the article, p1 and p2 are the success probabilities of surgery and antibiotic therapy, respectively, and m=0.24 is the noninferiority margin.

PROC POWER documentation recommends using TEST=FM in this case. Good. However, in your first PROC POWER step you reversed the order of p1 and p2 in the GROUPPROPORTIONS option. That's why your result N=20 is way too small.

In the second step you corrected that, but you specified an invalid alternative: 0.75. Remember that power and sample size are always calculated for specific parameter values from the alternative hypothesis (H1). But the pair p1=0.99, p2=0.75 does not satisfy the inequality for H1 (see above). In fact it's on the edge of H0. Since p1 is more or less fixed, we have to choose a value p2 such that p2−0.99>−0.24, i.e. p2>0.75.
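To see concretely why p2=0.75 is invalid, here is a quick check in exact rational arithmetic (Python's `fractions`, used so that float round-off exactly at the boundary cannot blur the comparison; this snippet is just an illustration, not part of the original analysis):

```python
from fractions import Fraction as F

p1 = F(99, 100)   # success probability, surgery
m  = F(24, 100)   # noninferiority margin

def in_h1(p2):
    """True if (p1, p2) lies in the alternative H1: p2 - p1 > -m."""
    return p2 - p1 > -m

print(in_h1(F(75, 100)))  # False: 0.75 - 0.99 = -0.24, exactly on the H0 boundary
print(in_h1(F(80, 100)))  # True:  0.80 - 0.99 = -0.19 > -0.24
```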

What value did the authors of the paper use? Strangely enough I couldn't find this important information in the paper. They wrote "we anticipated a 75% success rate in the antibiotic therapy group" (p. 2342), but 0.75 would be invalid, as explained above. Maybe it's a typo. Luckily, there is another article about this clinical trial freely available on the web:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3585698/ from the journal "BMC surgery". I found it via clinicaltrials.gov (see section "Other Publications") using the ID mentioned in the first paper.

There it says: "we assumed ... 80% success rate for the antibiotic therapy" (p. 5). This makes much more sense!

```
proc power;
   twosamplefreq test=fm
      groupproportions   = (0.99 0.8)
      nullproportiondiff = -0.24
      alpha              = 0.05
      sides              = U
      power              = 0.9
      ntotal             = .;
run;
```

The result (with SAS 9.4 TS1M2, SAS/STAT 13.2) is N=1258, i.e. 629 per group -- a lot more than 275. We can compare this to the results for other values of p2 by modifying the GROUPPROPORTIONS option:

`groupproportions = 0.99 | 0.76 to 0.90 by 0.01`

```
                 Computed N Total

                              Actual       N
       Index    Proportion2    Power   Total

           1           0.76    0.900   33360
           2           0.77    0.900    8224
           3           0.78    0.900    3604
           4           0.79    0.900    1996
           5           0.80    0.900    1258
           6           0.81    0.901     860
           7           0.82    0.900     620
           8           0.83    0.900     466
           9           0.84    0.901     362
          10           0.85    0.902     288
          11           0.86    0.901     232
          12           0.87    0.903     192
          13           0.88    0.900     158
          14           0.89    0.903     134
          15           0.90    0.905     114
```

The large N values for p1−p2 close to the margin are typical as small differences are hard to detect.

So, where does the discrepancy come from?

First of all, the authors used SAS 9.2 (p. 2342), where the option TEST=FM was not available (see the footnote in SAS Usage Note 48616 or the old documentation). They could have used TEST=PCHI (the default). However, this yields N=1234 (with SAS 9.4). To obtain N=550 (2*275), one would have had to use something like p2=0.8216. (See this paper, p. 12, for the relationship between TEST=PCHI and the Wald test.)

In the JAMA article (p. 2342) it looks as if they intended to achieve a 95% CI with a lower bound of 0.75. On p. 2343 it says "Noninferiority ... was tested using 1-sided Wald tests" (which is not a contradiction).
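As a plausibility check on these magnitudes, the textbook normal-approximation (Wald-type, unpooled variances) sample-size formula for this noninferiority setting can be evaluated in a few lines of Python. This is an illustrative sketch, not the Farrington-Manning or Pearson chi-square computation PROC POWER performs, so it lands near, but not exactly on, the SAS figures:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(p1, p2, margin, alpha=0.05, power=0.90):
    """Per-group n for H0: p2 - p1 <= -margin vs H1: p2 - p1 > -margin,
    using the normal approximation with unpooled variances (Wald-type)."""
    z = NormalDist()
    za, zb = z.inv_cdf(1 - alpha), z.inv_cdf(power)
    delta = p2 - p1 + margin              # distance of (p1, p2) from the H0 boundary
    var = p1 * (1 - p1) + p2 * (1 - p2)   # sum of the two binomial variances
    return ceil((za + zb) ** 2 * var / delta ** 2)

print(n_per_group(0.99, 0.80, 0.24))  # 582 per group -- same ballpark as FM's 629
```

The formula also makes the sensitivity to the distance from the margin explicit: halving delta roughly quadruples n, which is the pattern visible in the table above.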

It may be a coincidence that the sample size for a similar one-sample test is N=549 (!):

```
proc power;
   onesamplefreq test=z
      method = normal
      varest = sample
      nullp  = 0.99
      p      = 0.80
      margin = -0.24
      power  = 0.9
      alpha  = 0.05
      sides  = U
      ntotal = .;
run;
```

(Note, however, that these 549 are for one group. They must not be divided by 2.)
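That N=549 can be reproduced by hand. Under METHOD=NORMAL with VAREST=SAMPLE, the one-sided test effectively compares p against the shifted null boundary 0.99 − 0.24 = 0.75, with the variance taken at the alternative. The following Python sketch reflects my reading of that setup (an illustration, not SAS's exact internals) and gives the same number:

```python
from math import ceil
from statistics import NormalDist

def n_one_sample(p, p0, alpha=0.05, power=0.90):
    """n for H0: p <= p0 vs H1: p > p0, normal approximation with the
    variance evaluated at the alternative p (analogous to VAREST=SAMPLE)."""
    z = NormalDist()
    za, zb = z.inv_cdf(1 - alpha), z.inv_cdf(power)
    return ceil((za + zb) ** 2 * p * (1 - p) / (p - p0) ** 2)

# nullp=0.99 with margin=-0.24 shifts the null boundary to 0.99 - 0.24 = 0.75
print(n_one_sample(p=0.80, p0=0.75))  # 549, matching the PROC POWER result
```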

I don't know how they arrived at 275 per group. But it's always good to double-check results. So, let's finally check our result N=1258 (629 per group) by means of a simulation:

```
/* Simulate 100000 trials with 629 patients per group and true success
   probabilities p1=0.99 and p2=0.80 */

data sim(drop=ng) / view=sim;
call streaminit(27182818);
length grp $8;
ng=629;
do i=1 to 100000;
  grp='surgery';
  success=1; /* 1=yes */
  n=rand('binom',0.99,ng);
  output;
  success=2; /* 2=no */
  n=ng-n;
  output;
  grp='antibiot';
  success=1;
  n=rand('binom',0.80,ng);
  output;
  success=2;
  n=ng-n;
  output;
end;
run;

/* Perform the noninferiority tests */

ods select none;
ods noresults;
ods output PdiffNoninf=pdn;
proc freq data=sim;
by i;
weight n;
tables grp*success / alpha=0.05 riskdiff(noninf margin=0.24 method=fm);
run;
ods select all;

/* Check the proportion of trials in which H0 would have been rejected */

proc format;
value pdiff
  low - -0.24 = '<=-0.24'
  -0.24<-high = '> -0.24';
run;

proc freq data=pdn;
format lowerCL pdiff.;
tables lowerCL / binomial(level=2);
run;
```

(run time: about 34 s on my workstation)

Result:

```                                    Cumulative    Cumulative
LowerCL    Frequency     Percent     Frequency      Percent
------------------------------------------------------------
<=-0.24       10072       10.07         10072        10.07
> -0.24       89928       89.93        100000       100.00

Binomial Proportion
LowerCL = > -0.24

Proportion                0.8993
ASE                       0.0010
95% Lower Conf Limit      0.8974
95% Upper Conf Limit      0.9011

Exact Conf Limits
95% Lower Conf Limit      0.8974
95% Upper Conf Limit      0.9011```

(Please note that LowerCL is the lower bound of a two-sided 90% CI and hence of a one-sided 95% CI, which is what the authors used.)

This result confirms that with 629 patients per group, not 275, a power of 90% would have been achieved (with the Farrington-Manning test, though, not the Wald test).
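For readers without SAS at hand, the same sanity check can be sketched in Python. The snippet below substitutes a plain Wald lower confidence bound for the Farrington-Manning method (and uses fewer replications), so the estimated power comes out slightly above the FM value of 0.90; it is a rough analogue, not a reproduction of the SAS run:

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(27182818)                  # same seed value as the SAS example, for flavor
ng, p1, p2, m = 629, 0.99, 0.80, 0.24
z95 = NormalDist().inv_cdf(0.95)       # one-sided 95% (two-sided 90%) bound

def binom(n, p):
    """Draw one binomial variate as a sum of Bernoulli trials."""
    return sum(random.random() < p for _ in range(n))

trials, rejections = 2000, 0
for _ in range(trials):
    x1, x2 = binom(ng, p1), binom(ng, p2)
    ph1, ph2 = x1 / ng, x2 / ng
    se = sqrt(ph1*(1 - ph1)/ng + ph2*(1 - ph2)/ng)
    lower = (ph2 - ph1) - z95 * se     # Wald lower bound for p2 - p1
    rejections += lower > -m           # reject H0 if the bound clears the margin

print(rejections / trials)             # estimated power, a bit above 0.90 (Wald)
```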

[Edit: only minor typo corrected]

4 REPLIES
Super User

## Re: sample size calculation

You say that you could not come up with 275. What code were you attempting to use, and what result did you get?

What statistical test(s) were to be used? One sample, two sample? Freq or means tested?

Quartz | Level 8

## Re: sample size calculation

I have to compare 2 proportions. It is a two-sample, noninferiority test.

```
proc power;
   twosamplefreq test=fm
      groupproportions   = (0.75 0.99)
      nullproportiondiff = -0.24
      alpha              = 0.05
      sides              = U
      power              = 0.9
      ntotal             = .;
run;
```

(With the above code I get N=20.)

```
proc power;
   twosamplefreq test=fm
      groupproportions   = (0.99 0.75)
      nullproportiondiff = -0.24
      alpha              = 0.05
      sides              = U
      power              = 0.9
      ntotal             = .;
run;
```

(With the above code I get an "invalid" result.)

Thanks,

Prerna

Quartz | Level 8

## Re: sample size calculation

Thank you very much for taking time to answer my question.
