sample size calculation

🔒 This topic is **solved** and **locked**.
Need further help from the community? Please
sign in and ask a **new** question.


Posted 08-31-2018 09:38 AM
(2866 views)

Hi,

I am learning to calculate sample sizes. The following paragraph is from the attached article. I am not able to come up with the number **275**. Please let me know the SAS PROC POWER code that was used to calculate it.

> The sample size calculation for the trial assumed that all patients randomized to the surgical group would undergo appendectomy. For computational reasons, the success rate for surgery was assumed to be 99%. Prior similar studies found success rates for antibiotic treatment of approximately 70% to 80%. Thus, we anticipated a 75% success rate in the antibiotic therapy group and a 24% (95% CI, 75%-99%) noninferiority margin was used for the sample size calculations. We estimated that 275 patients per group would yield a power of 0.90 (1−β) to establish whether antibiotic treatment was noninferior to appendectomy using a 1-sided significance α level of .05 with Proc Power version 9.2 (SAS Institute Inc).

Thanks

1 ACCEPTED SOLUTION

Hi @Kyra,

Thanks for posting this interesting question.

It's about noninferiority testing of proportions in two independent groups, so let's recap the hypotheses:

H0: p2 − p1 <= −m
H1: p2 − p1 > −m

where, in the example of the article, p1 and p2 are the success probabilities of surgery and antibiotic therapy, respectively, and m=0.24 is the noninferiority margin.

PROC POWER documentation recommends using TEST=FM in this case. Good. However, in your first PROC POWER step you reversed the order of p1 and p2 in the GROUPPROPORTIONS option. That's why your result N=20 is way too small.

In the second step you corrected that, but you specified an invalid alternative: 0.75. Remember that power and sample size are always calculated for *specific* parameter values from the *alternative hypothesis* (H1). But the pair p1=0.99, p2=0.75 does not satisfy the inequality for H1 (see above). In fact it's on the edge of H0. Since p1 is more or less fixed, we have to choose a value p2 such that p2−0.99>−0.24, i.e. p2>0.75.

What value did the authors of the paper use? Strangely enough I couldn't find this important information in the paper. They wrote "we anticipated a 75% success rate in the antibiotic therapy group" (p. 2342), but 0.75 would be invalid, as explained above. Maybe it's a typo. Luckily, there is another article about this clinical trial freely available on the web:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3585698/ from the journal "BMC surgery". I found it via clinicaltrials.gov (see section "Other Publications") using the ID mentioned in the first paper.

There it says: "we assumed ... 80% success rate for the antibiotic therapy" (p. 5). This makes much more sense!

```
proc power;
   twosamplefreq test=fm
      groupproportions = (0.99 0.8)
      nullproportiondiff = -0.24
      alpha = 0.05
      sides = U
      power = 0.9
      ntotal = .;
run;
```

The result (with SAS 9.4 TS1M2, SAS/STAT 13.2) is **N=1258, i.e. 629 per group** -- a lot more than 275. We can compare this to the results for other values of p2 by modifying the GROUPPROPORTIONS option:

`groupproportions = 0.99 | 0.76 to 0.90 by 0.01`

```
                 Computed N Total

                             Actual        N
     Index    Proportion2     Power    Total
         1           0.76     0.900    33360
         2           0.77     0.900     8224
         3           0.78     0.900     3604
         4           0.79     0.900     1996
         5           0.80     0.900     1258
         6           0.81     0.901      860
         7           0.82     0.900      620
         8           0.83     0.900      466
         9           0.84     0.901      362
        10           0.85     0.902      288
        11           0.86     0.901      232
        12           0.87     0.903      192
        13           0.88     0.900      158
        14           0.89     0.903      134
        15           0.90     0.905      114
```

The large N values for p1−p2 close to the margin are typical as small differences are hard to detect.
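The blow-up near the margin can also be seen with a back-of-the-envelope calculation. The sketch below uses the textbook unpooled normal (Wald) approximation, not the Farrington-Manning calculation PROC POWER performs, so its values run somewhat smaller than the table above; the function name is mine.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(p1, p2, margin, alpha=0.05, power=0.90):
    """Approximate per-group n for the one-sided noninferiority test
    H1: p2 - p1 > -margin, via the unpooled normal (Wald) approximation:
    n = (z_{1-alpha} + z_{power})^2 * (p1*q1 + p2*q2) / (p2 - p1 + margin)^2
    NOT the Farrington-Manning method of TEST=FM, so values run smaller."""
    z = NormalDist().inv_cdf
    effect = p2 - p1 + margin                    # distance of H1 from the margin
    variance = p1 * (1 - p1) + p2 * (1 - p2)     # sum of binomial variances
    return ceil((z(1 - alpha) + z(power)) ** 2 * variance / effect ** 2)

for p2 in (0.76, 0.80, 0.85, 0.90):
    print(p2, n_per_group(0.99, p2, 0.24))
```

For p2 = 0.80 this gives roughly 580 per group, in the same ballpark as the 629 from TEST=FM, and the growth as p2 approaches 0.75 is the same 1/(p2 − 0.75)² behavior as in the table.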

So, where does the discrepancy come from?

First of all, the authors used SAS 9.**2** (p. 2342), where the option TEST=FM was not yet available (see the footnote in SAS Usage Note 48616 or the old documentation). They could have used TEST=PCHI (the default). However, this yields N=1234 (with SAS 9.4). To obtain N=550 (2 × 275), one would have had to use something like p2=0.8216. (See this paper, p. 12, for the relationship between TEST=PCHI and the Wald test.)

In the JAMA article (p. 2342) it looks as if they intended to achieve a 95% CI with a lower bound of 0.75. On p. 2343 it says "Noninferiority ... was tested using 1-sided Wald tests" (which is not a contradiction).

It may be a coincidence that the sample size for a similar *one-sample* test is N=549 (!):

```
proc power;
   onesamplefreq test=z
      method=normal
      varest=sample
      nullp = 0.99
      p = 0.80
      margin = -0.24
      power = 0.9
      alpha = 0.05
      sides = U
      ntotal = .;
run;
```

(Note, however, that these 549 are for *one* group. They must not be divided by 2.)
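That N=549 can be reproduced by hand if ONESAMPLEFREQ with METHOD=NORMAL and VAREST=SAMPLE reduces to the usual normal approximation with the standard error evaluated at the alternative proportion in both terms. That is my reading of the documentation, so treat this as a plausibility check rather than SAS's exact algorithm:

```python
from math import ceil
from statistics import NormalDist

z = NormalDist().inv_cdf
p, nullp, margin = 0.80, 0.99, -0.24   # as in the PROC POWER step above
alpha, power = 0.05, 0.90

# One-sided upper test of H0: p <= nullp + margin,
# with sample variance p(1-p) in both the alpha and power terms:
threshold = nullp + margin             # 0.75
n = (z(1 - alpha) + z(power)) ** 2 * p * (1 - p) / (p - threshold) ** 2
print(ceil(n))   # 549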

I don't know how they arrived at 275 per group. But it's always good to double-check results. So, let's finally check our result N=1258 (629 per group) by means of a simulation:

```
/* Simulate 100000 trials with 629 patients per group and true success
   probabilities p1=0.99 and p2=0.80 */
data sim(drop=ng) / view=sim;
   call streaminit(27182818);
   length grp $8;
   ng=629;
   do i=1 to 100000;
      grp='surgery';
      success=1; /* 1=yes */
      n=rand('binom',0.99,ng);
      output;
      success=2; /* 2=no */
      n=ng-n;
      output;
      grp='antibiot';
      success=1;
      n=rand('binom',0.80,ng);
      output;
      success=2;
      n=ng-n;
      output;
   end;
run;

/* Perform the noninferiority tests */
ods select none;
ods noresults;
ods output PdiffNoninf=pdn;
proc freq data=sim;
   by i;
   weight n;
   tables grp*success / alpha=0.05 riskdiff(noninf margin=0.24 method=fm);
run;
ods select all;

/* Check the proportion of trials in which H0 would have been rejected */
proc format;
   value pdiff
      low - -0.24 = '<=-0.24'
      -0.24<-high = '> -0.24';
run;

proc freq data=pdn;
   format lowerCL pdiff.;
   tables lowerCL / binomial(level=2);
run;
```

(run time: about 34 s on my workstation)

Result:

```
                                    Cumulative    Cumulative
LowerCL    Frequency    Percent      Frequency       Percent
------------------------------------------------------------
<=-0.24        10072      10.07          10072         10.07
> -0.24        89928      89.93         100000        100.00

        Binomial Proportion
        LowerCL = > -0.24

Proportion                  0.8993
ASE                         0.0010
95% Lower Conf Limit        0.8974
95% Upper Conf Limit        0.9011

Exact Conf Limits
95% Lower Conf Limit        0.8974
95% Upper Conf Limit        0.9011
```

(Please note that LowerCL is the lower bound of a two-sided 90% CI and hence of a one-sided 95% CI, which is what the authors used.)

This result confirms that with 629 patients per group, not 275, a power of 90% would have been achieved (with the Farrington-Manning test, though, not the Wald test).
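For readers without SAS, the same simulation idea can be cross-checked in Python. Caveat: this sketch uses the simpler Wald statistic (lower one-sided 95% confidence bound for the difference) instead of the Farrington-Manning score statistic, so the estimated power comes out slightly above 0.90.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(27182818)
z95 = NormalDist().inv_cdf(0.95)
ng, p1, p2, margin = 629, 0.99, 0.80, 0.24

def binom(n, p):
    """Plain Bernoulli-sum binomial draw (stdlib only)."""
    return sum(random.random() < p for _ in range(n))

trials = 2000
rejections = 0
for _ in range(trials):
    p1h = binom(ng, p1) / ng           # surgery success proportion
    p2h = binom(ng, p2) / ng           # antibiotic success proportion
    se = sqrt(p1h*(1 - p1h)/ng + p2h*(1 - p2h)/ng)
    # Reject H0 if the lower one-sided 95% Wald bound of p2-p1 exceeds -margin
    if (p2h - p1h) - z95 * se > -margin:
        rejections += 1

print(rejections / trials)   # roughly 0.92 (Wald; FM gives about 0.90)
```

The Wald test is slightly liberal here, which is consistent with the SAS simulation's 89.93% for the stricter FM test at the same group size.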

[Edit: only minor typo corrected]

4 REPLIES


You say that you could not come up with 275. What code were you attempting to use, and what result did you get?

What statistical test(s) were to be used? One-sample or two-sample? Frequencies or means?


I have to compare two proportions. It is a two-sample noninferiority test.

```
proc power;
   twosamplefreq test=fm
      groupproportions = (0.75 0.99)
      nullproportiondiff = -0.24
      alpha = 0.05
      sides = U
      power = 0.9
      ntotal = .;
run;
```

(With the above code I get 20.)

```
proc power;
   twosamplefreq test=fm
      groupproportions = (0.99 0.75)
      nullproportiondiff = -0.24
      alpha = 0.05
      sides = U
      power = 0.9
      ntotal = .;
run;
```

(With the above I get "invalid".)

Thanks,

Prerna



Thank you very much for taking time to answer my question.

