Hi,
I am calculating power for a study involving one active treatment and one control treatment affecting a dichotomous yes/no response variable. At baseline, the proportions in both groups are believed to be p1=p2=0.125. At the post-treatment measurement, the hypothesis is that the proportions will be p1=0.8 (active treatment) and p2=0.125 (control treatment). This will be tested using Fisher's exact test with a target power of 80%. The randomization ratio of active to placebo should be 2:1. I use the following code for this:
proc power;
TWOSAMPLEFREQ
test=fisher
alpha=0.05
GROUPPROPORTIONS = (0.8 0.125)
GROUPWEIGHTS = (2 1)
power = 0.8
NTOTAL=.;
run;
which results in a total sample size of 24 (16 active and 8 placebo). However, this figure does not match what I get with the PASS software, where I get n1=13 and n2=6. The reason for the discrepancy is that PASS bases its calculation on exact enumeration of the binomial distributions of the response variable, not on the Walters' normal approximation that PROC POWER uses (when I select the normal approximation method in PASS, I also get a total sample size of 24).
How can I get PROC POWER to calculate the required sample size for TWOSAMPLEFREQ based on these exact calculations? I tried METHOD=exact, but it doesn't work. If this is not possible, it seems like a serious shortcoming of PROC POWER, since the whole reason we want to use Fisher's exact test is the small sample size, where the normal approximation is not advisable.
Many thanks for some input 🙂
Accepted Solutions
Hi @JoakimE,
I don't think you can get exact results with PROC POWER (in the current version), sadly. However, thanks to the small sample sizes, you can perform the necessary calculations with PROC FREQ and DATA steps, as I did for a similar request last month. Here's a quick adaptation of that code to your problem (please review):
%let nmin=4; /* minimum sample size for placebo group */
%let nmax=8; /* maximum sample size for placebo group */
%let alpha=0.05; /* significance level */
%let p1=0.8; /* success probability in active group */
%let p2=0.125; /* success probability in placebo group */
/* Create a dataset with all (2*n+e+1)*(n+1)-2 possible combinations of i=0, ..., 2*n+e successes (r=1) in the
active group (g=1, n1=2*n+e) and j=0, ..., n successes in the placebo group (g=2, n2=n), except the two
extreme cases i=j=0 and i=2*n+e & j=n -- for all values of n between &nmin and &nmax and e=0, 1 */
data comb;
  do n=&nmin to &nmax;
    do e=0 to 1;
      do i=0 to 2*n+e;
        do j=0 to n;
          if i=j=0 | i=2*n+e & j=n then continue;
          g=1;
          r=0; c=2*n+e-i; output;
          r=1; c=i; output;
          g=2;
          r=0; c=n-j; output;
          r=1; c=j; output;
        end;
      end;
    end;
  end;
run;
/* Compute Fisher's exact test for each combination */
ods select none;
ods output FishersExact=fisher(where=(name1='XP2_FISH') keep=n e i j name1 nvalue1 rename=(nvalue1=p));
proc freq data=comb;
by n e i j;
tables g*r / chisq;
weight c;
run;
ods select all;
/* Compute power based on the joint distribution of two independent
Bin(2*n+e,&p1) and Bin(n,&p2) distributed random variables */
data power(keep=n1 n2 power);
  retain n1 n2 power;
  set fisher;
  by n e;
  where .<p<=&alpha; /* The two extreme cases excluded above would not meet this condition anyway. */
  power+pdf('binom',i,&p1,2*n+e)*pdf('binom',j,&p2,n);
  if last.e then do;
    n1=2*n+e;
    n2=n;
    output;
    power=0;
  end;
run;
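For reference, the quantity accumulated by the last DATA step can be written out as follows. The notation is mine (a sketch, not taken from the SAS documentation): f(i,j) stands for the two-sided Fisher p-value (XP2_FISH) of the 2x2 table with i successes out of n1 in the active group and j successes out of n2 in the placebo group, and h(x) is the hypergeometric probability of a table with the same margins and x active successes.

```latex
n_1 = 2n + e, \qquad n_2 = n

\text{Power}(n_1, n_2) = \sum_{i=0}^{n_1} \sum_{j=0}^{n_2}
  \mathbf{1}\{ f(i,j) \le \alpha \}
  \binom{n_1}{i} p_1^{\,i} (1-p_1)^{n_1-i}
  \binom{n_2}{j} p_2^{\,j} (1-p_2)^{n_2-j}

f(i,j) = \sum_{x:\ h(x) \le h(i)} h(x), \qquad
h(x) = \frac{\binom{n_1}{x}\binom{n_2}{m-x}}{\binom{n_1+n_2}{m}}, \qquad
m = i + j, \quad \max(0, m - n_2) \le x \le \min(n_1, m)
```

The indicator restricts the double sum to the rejection region, and the two extreme tables excluded in DATA COMB (m = 0 and m = n1 + n2) have only one possible table each, so f = 1 and they never contribute.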
Results:

Obs  n1  n2    power
  1   8   4  0.35123
  2   9   4  0.47768
  3  10   5  0.64501
  4  11   5  0.72224
  5  12   6  0.78531
  6  13   6  0.80349
  7  14   7  0.81705
  8  15   7  0.79312
  9  16   8  0.90324
 10  17   8  0.89930
The results suggest that, indeed, n1=13 and n2=6 are correct.
If you check the documentation for PROC POWER (which should always be done before asking here), you would see that Fisher's exact test is indeed available for TWOSAMPLEFREQ.
The TWOSAMPLEFREQ statement performs power and sample size analyses for tests of two independent proportions. The Farrington-Manning score, Pearson’s chi-square, Fisher’s exact, and likelihood ratio chi-square tests are supported.
Use option TEST=FISHER
Paige Miller
You should always read the question before answering here 😉 (pardon the salty tone)
The question was not whether or not Fisher's exact test is available with TWOSAMPLEFREQ, but how to avoid the normal approximation as the default power calculation method (see my original post).
Thanks FreelanceReinhard, that was a neat solution! It works for small sample sizes, at least; for larger sample sizes I guess the normal approximation or the chi-square test would suffice anyway. It's just a bit strange that SAS hasn't addressed this problem, I think...
Infection Rates:
Active 0.0, 0.05, 0.10, 0.15, 0.20, 0.25
Placebo 1/N, 0.99
Sample Size (N):
Active 15 to 12
Placebo 7 to 10
Hello @mariko5797,
Always interesting to reread one's own code after a fairly long time ... :-)
- g is the group identifier with values 1 for the active group and 2 for the placebo group.
- n1 is the number of subjects in the active group.
- n2=n is the number of subjects in the placebo group.
- e=n1-2*n, which is either 0 or 1 because the assumption was a 2:1 randomization ratio of active vs. placebo. (The case e=1 actually relaxes the exact 2:1 ratio a bit so that n1 is not restricted to even numbers.)
- r is the response variable with values 1 for success and 0 for failure.
- c is the number of subjects for a particular combination of g, n, e and r in dataset COMB.
- i is the number of successes in the active group.
- j is the number of successes in the placebo group.
So, in your case you don't need variable e. Just combine all possible values of n1=12,...,15 and n2=7,...,10, i.e., 16 combinations. With varying infection rates (in both treatment groups) you'll have two additional ("outer") DO loops and similarly two additional BY variables.
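Putting these pieces together for that design might look roughly like the following. This is an untested sketch with some assumptions: the placebo rate list (&p2list) is a hypothetical placeholder because "1/N" in the post is ambiguous, and, unlike the description above, the two rate loops are placed after PROC FREQ rather than in DATA COMB. The Fisher p-values depend only on the cell counts, not on the infection rates, so each table only has to go through PROC FREQ once; the datasets RATES and FR are names introduced here for illustration.

```sas
%let alpha=0.05;                              /* significance level */
%let p1list=0, 0.05, 0.10, 0.15, 0.20, 0.25;  /* active infection rates from the post */
%let p2list=0.99;                             /* placebo rate(s): HYPOTHETICAL placeholder */

/* All 2x2 tables for n1=12,...,15 (active) and n2=7,...,10 (placebo); no variable e needed.
   The two tables with an empty response column are excluded as before. */
data comb;
  do n1=12 to 15;
    do n2=7 to 10;
      do i=0 to n1;
        do j=0 to n2;
          if i=j=0 | i=n1 & j=n2 then continue;
          g=1; r=0; c=n1-i; output;
               r=1; c=i;    output;
          g=2; r=0; c=n2-j; output;
               r=1; c=j;    output;
        end;
      end;
    end;
  end;
run;

/* Two-sided Fisher p-value for each table (does not depend on the rates) */
ods select none;
ods output FishersExact=fisher(where=(name1='XP2_FISH')
                               keep=n1 n2 i j name1 nvalue1 rename=(nvalue1=p));
proc freq data=comb;
  by n1 n2 i j;
  tables g*r / chisq;
  weight c;
run;
ods select all;

/* One row per combination of infection rates (the "outer loops") */
data rates;
  do p1=&p1list;
    do p2=&p2list;
      output;
    end;
  end;
run;

/* Pair every rate combination with every table, sorted for BY processing */
proc sql;
  create table fr as
    select r.p1, r.p2, f.n1, f.n2, f.i, f.j, f.p
    from rates as r, fisher as f
    order by p1, p2, n1, n2;
quit;

/* Exact power: sum the joint binomial probabilities over tables with p <= alpha */
data power(keep=p1 p2 n1 n2 power);
  set fr;
  by p1 p2 n1 n2;
  if first.n2 then power=0;
  if .<p<=&alpha then power+pdf('binom',i,p1,n1)*pdf('binom',j,p2,n2);
  if last.n2 then output;
run;
```

Each row of POWER then holds the exact power for one combination of p1, p2, n1 and n2; unlike the original code, designs with no rejecting table are kept with power 0. PROC SQL may write a note about a Cartesian product join, which is intended here.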