Hi,
I am learning how to calculate sample sizes. The following paragraph is from the attached article. I am not able to reproduce the number 275. Please let me know the SAS code (PROC POWER) that was used to calculate it.
The sample size calculation for the trial assumed that all patients randomized to the surgical group would undergo appendectomy. For computational reasons, the success rate for surgery was assumed to be 99%. Prior similar studies found success rates for antibiotic treatment of approximately 70% to 80%. Thus, we anticipated a 75% success rate in the antibiotic therapy group and a 24% (95% CI, 75%-99%) noninferiority margin was used for the sample size calculations. We estimated that 275 patients per group would yield a power of 0.90 (1-β) to establish whether antibiotic treatment was noninferior to appendectomy using a 1-sided significance α level of .05 with Proc Power version 9.2 (SAS Institute Inc)
Thanks
Hi @Kyra,
Thanks for posting this interesting question.
It's about noninferiority testing of proportions in two independent groups, so let's recap the hypotheses:
H0: p2−p1 <= −m
H1: p2−p1 > −m
where, in the example of the article, p1 and p2 are the success probabilities of surgery and antibiotic therapy, respectively, and m=0.24 is the noninferiority margin.
PROC POWER documentation recommends using TEST=FM in this case. Good. However, in your first PROC POWER step you reversed the order of p1 and p2 in the GROUPPROPORTIONS option. That's why your result N=20 is way too small.
In the second step you corrected that, but you specified an invalid alternative: 0.75. Remember that power and sample size are always calculated for specific parameter values from the alternative hypothesis (H1). But the pair p1=0.99, p2=0.75 does not satisfy the inequality for H1 (see above). In fact it's on the edge of H0. Since p1 is more or less fixed, we have to choose a value p2 such that p2−0.99>−0.24, i.e. p2>0.75.
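The H1 condition can be checked numerically before calling PROC POWER. The DATA step below is only a minimal sketch: the variable name dist and the three candidate values of p2 are chosen for illustration and do not come from the paper. The difference is rounded because 0.75-0.99+0.24 is not exactly zero in floating-point arithmetic.

```sas
/* Sketch: which candidate values of p2 lie inside H1: p2 - p1 > -m ? */
data _null_;
   p1 = 0.99;   /* assumed success rate of surgery */
   m  = 0.24;   /* noninferiority margin           */
   do p2 = 0.75, 0.76, 0.80;
      /* distance of the alternative from the H0 boundary */
      dist = round(p2 - p1 + m, 1e-8);
      if dist > 0 then put p2= dist= ' -> inside H1';
      else put p2= dist= ' -> on or below the H0 boundary, not a valid alternative';
   end;
run;
```

For p2=0.75 the distance is zero, so no finite sample size can reach a power of 0.9, which is consistent with the "invalid" result.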
What value did the authors of the paper use? Strangely enough I couldn't find this important information in the paper. They wrote "we anticipated a 75% success rate in the antibiotic therapy group" (p. 2342), but 0.75 would be invalid, as explained above. Maybe it's a typo. Luckily, there is another article about this clinical trial freely available on the web:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3585698/ from the journal "BMC surgery". I found it via clinicaltrials.gov (see section "Other Publications") using the ID mentioned in the first paper.
There it says: "we assumed ... 80% success rate for the antibiotic therapy" (p. 5). This makes much more sense!
proc power;
   twosamplefreq test=fm
      groupproportions = (0.99 0.8)
      nullproportiondiff = -0.24
      alpha = 0.05
      sides = U
      power = 0.9
      ntotal = .;
run;
The result (with SAS 9.4 TS1M2, SAS/STAT 13.2) is N=1258, i.e. 629 per group -- a lot more than 275. We can compare this to the results for other values of p2 by modifying the GROUPPROPORTIONS option:
groupproportions = 0.99 | 0.76 to 0.90 by 0.01
Computed N Total

Index   Proportion2   Actual Power   N Total
    1          0.76          0.900     33360
    2          0.77          0.900      8224
    3          0.78          0.900      3604
    4          0.79          0.900      1996
    5          0.80          0.900      1258
    6          0.81          0.901       860
    7          0.82          0.900       620
    8          0.83          0.900       466
    9          0.84          0.901       362
   10          0.85          0.902       288
   11          0.86          0.901       232
   12          0.87          0.903       192
   13          0.88          0.900       158
   14          0.89          0.903       134
   15          0.90          0.905       114
The large N values for p1−p2 close to the margin are typical as small differences are hard to detect.
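To see why N explodes near the margin, a rough normal approximation helps: the per-group sample size is approximately (z_alpha + z_power)^2 * [p1(1-p1) + p2(1-p2)] / (p2-p1+m)^2. The sketch below evaluates this for the same grid of p2 values. Note the assumptions: this is a simple Wald-type (unpooled) approximation, not the Farrington-Manning computation that PROC POWER performs with TEST=FM, and the data set and variable names (approx, dist, n_per_group) are illustrative.

```sas
/* Sketch: rough Wald-type approximation, NOT the TEST=FM computation */
data approx;
   p1 = 0.99;  m = 0.24;  alpha = 0.05;  power = 0.90;
   z = probit(1 - alpha) + probit(power);   /* z_alpha + z_power */
   do k = 76 to 90;                         /* integer loop avoids rounding drift */
      p2   = k / 100;
      dist = p2 - p1 + m;                   /* distance from the H0 boundary */
      n_per_group = ceil(z**2 * (p1*(1-p1) + p2*(1-p2)) / dist**2);
      output;
   end;
   keep p2 dist n_per_group;
run;

proc print data=approx noobs;
run;
```

The totals (2 x n_per_group) will not match the TEST=FM table exactly, but they show the same pattern: N grows roughly like 1/dist^2, so halving the distance from the margin about quadruples the sample size.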
So, where does the discrepancy come from?
First of all, the authors used SAS 9.2 (p. 2342), where the option TEST=FM was not available (see footnote in SAS Usage Note 48616 or the old documentation). They could have used TEST=PCHI (the default). However, this yields N=1234 (with SAS 9.4). To obtain N=550 (2*275) with TEST=PCHI, one would have to use something like p2=0.8216. (See this paper, p. 12, for the relationship between TEST=PCHI and the Wald test.)
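The two TEST=PCHI figures can be cross-checked by hand with a common normal approximation that uses the pooled variance for the critical value (H0 side) and the unpooled variance for the alternative (H1 side). That this is what PROC POWER does internally for equal group sizes is my assumption, so please treat the following as a sketch; the variable names are illustrative.

```sas
/* Sketch: pooled/unpooled normal approximation for a one-sided
   two-sample test of proportions with null difference -m         */
data _null_;
   p1 = 0.99;  m = 0.24;  alpha = 0.05;  power = 0.90;
   do p2 = 0.80, 0.8216;
      pbar = (p1 + p2) / 2;                   /* pooled proportion, equal groups */
      sd0  = sqrt(2 * pbar * (1 - pbar));     /* pooled SD term (critical value) */
      sd1  = sqrt(p1*(1-p1) + p2*(1-p2));     /* unpooled SD term (alternative)  */
      n_per_group = ceil( (probit(1-alpha)*sd0 + probit(power)*sd1)**2
                          / (p2 - p1 + m)**2 );
      n_total = 2 * n_per_group;
      put p2= n_per_group= n_total=;
   end;
run;
```

By my hand calculation the unrounded per-group values are about 616.4 and 274.6, so the log should show totals at or very near the N=1234 and N=550 quoted above.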
In the JAMA article (p. 2342) it looks as if they intended to achieve a 95% CI with a lower bound of 0.75. On p. 2343 it says "Noninferiority ... was tested using 1-sided Wald tests" (which is not a contradiction).
It may be a coincidence that the sample size for a similar one-sample test is N=549 (!):
proc power;
   onesamplefreq
      test=z
      method=normal
      varest=sample
      nullp=0.99
      p=0.80
      margin = -0.24
      power = 0.9
      alpha = 0.05
      sides = U
      ntotal = .;
run;
(Note, however, that these 549 are for one group. They must not be divided by 2.)
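The one-sample figure can also be reproduced by hand. With the variance estimated from the sample, the usual normal approximation reduces to n = (z_alpha + z_power)^2 * p(1-p) / (p - bound)^2, where bound = nullp + margin = 0.75. Whether this is exactly the formula PROC POWER applies here is an assumption on my part, and the variable names are illustrative.

```sas
/* Sketch: one-sample normal approximation, variance estimated from the sample */
data _null_;
   nullp = 0.99;  margin = -0.24;  p = 0.80;
   alpha = 0.05;  power  = 0.90;
   bound = nullp + margin;                  /* 0.75: value tested against */
   z = probit(1 - alpha) + probit(power);   /* z_alpha + z_power          */
   n = ceil(z**2 * p*(1-p) / (p - bound)**2);
   put bound= n=;
run;
```

By my hand calculation the unrounded value is about 548.1, which rounds up to the N=549 that PROC POWER reports.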
I don't know how they arrived at 275 per group. But it's always good to double-check results. So, let's finally check our result N=1258 (629 per group) by means of a simulation:
/* Simulate 100000 trials with 629 patients per group and true success probabilities
   p1=0.99 and p2=0.80 */
data sim(drop=ng) / view=sim;
   call streaminit(27182818);
   length grp $8;
   ng=629;
   do i=1 to 100000;
      grp='surgery';
      success=1; /* 1=yes */
      n=rand('binom',0.99,ng);
      output;
      success=2; /* 2=no */
      n=ng-n;
      output;
      grp='antibiot';
      success=1;
      n=rand('binom',0.80,ng);
      output;
      success=2;
      n=ng-n;
      output;
   end;
run;
/* Perform the noninferiority tests */
ods select none;
ods noresults;
ods output PdiffNoninf=pdn;
proc freq data=sim;
   by i;
   weight n;
   tables grp*success / alpha=0.05 riskdiff(noninf margin=0.24 method=fm);
run;
ods select all;

/* Check the proportion of trials in which H0 would have been rejected */
proc format;
   value pdiff
      low - -0.24 = '<=-0.24'
      -0.24<-high = '> -0.24';
run;

proc freq data=pdn;
   format lowerCL pdiff.;
   tables lowerCL / binomial(level=2);
run;
(run time: about 34 s on my workstation)
Result:
                                    Cumulative    Cumulative
LowerCL    Frequency     Percent     Frequency      Percent
------------------------------------------------------------
<=-0.24        10072       10.07         10072        10.07
> -0.24        89928       89.93        100000       100.00

Binomial Proportion
LowerCL = > -0.24

Proportion                0.8993
ASE                       0.0010
95% Lower Conf Limit      0.8974
95% Upper Conf Limit      0.9011

Exact Conf Limits
95% Lower Conf Limit      0.8974
95% Upper Conf Limit      0.9011
(Please note that LowerCL is the lower bound of a two-sided 90% CI and hence of a one-sided 95% CI, which is what the authors used.)
This result confirms that with 629 patients per group, not 275, a power of 90% would have been achieved (with the Farrington-Manning test, though, not the Wald test).
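Since the authors reported using Wald tests, it may also be worth checking the Wald test directly. Because both groups are binomial, its power can be computed exactly by enumerating all outcomes instead of simulating. The following is my own sketch, not the authors' method: it assumes the unpooled Wald statistic (p2hat - p1hat + m)/SE with a one-sided alpha of 0.05 and treats the degenerate case SE=0 by the sign of the numerator.

```sas
/* Sketch: exact power of the one-sided Wald noninferiority test,
   obtained by summing binomial probabilities over the rejection region */
data _null_;
   p1 = 0.99;  p2 = 0.80;  m = 0.24;
   zcrit = probit(0.95);                       /* one-sided alpha = 0.05 */
   do n = 275, 629;                            /* patients per group     */
      power = 0;
      do x1 = 0 to n;                          /* successes, surgery     */
         f1 = pdf('binomial', x1, p1, n);
         do x2 = 0 to n;                       /* successes, antibiotics */
            d  = x2/n - x1/n + m;              /* estimated difference plus margin */
            se = sqrt( (x1/n)*(1 - x1/n)/n + (x2/n)*(1 - x2/n)/n );
            if se > 0 then reject = (d / se > zcrit);
            else reject = (d > 0);             /* degenerate case: zero variance */
            if reject then power = power + f1 * pdf('binomial', x2, p2, n);
         end;
      end;
      put n= power= 6.4;
   end;
run;
```

Comparing the two printed values shows how much power is lost when going from 629 to 275 patients per group under the assumption p2=0.80.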
[Edit: only minor typo corrected]
You say that you could not come up with 275. What code were you attempting to use and what result did you get?
What statistical test(s) were to be used? One sample or two samples? Were frequencies or means tested?
I have to compare two proportions. It is a two-sample noninferiority test.
proc power;
   twosamplefreq test=fm
      groupproportions = ( 0.75 0.99)
      nullproportiondiff = -0.24
      alpha = 0.05
      sides = U
      power = 0.9
      ntotal = .;
run;
(With the above code I get 20.)
proc power;
   twosamplefreq test=fm
      groupproportions = ( 0.99 0.75)
      nullproportiondiff = -0.24
      alpha = 0.05
      sides = U
      power = 0.9
      ntotal = .;
run;
(With the above code I get an invalid result.)
Thanks,
Prerna
Thank you very much for taking time to answer my question.