Solved: Proc Power TWOSAMPLEFREQ test=fisher with METHOD=exact

JoakimE · Posted 05-24-2022 05:45 AM

Hi,

I am calculating power for a study involving one active treatment and one control treatment affecting a dichotomous yes/no response variable. At baseline it is believed that the proportions for both groups are p1=p2=0.125. At the post-treatment measurement the hypothesis is that the proportions will be p1=0.8 (for the active treatment) and p2=0.125 (for the control treatment). This will be tested using Fisher's exact test with a power of 80%. The randomization ratio of active to placebo should be 2:1. I use the following code for this:

proc power;
TWOSAMPLEFREQ
test=fisher
alpha=0.05
GROUPPROPORTIONS = (0.8 0.125)
GROUPWEIGHTS = (2 1)
power = 0.8
NTOTAL=.;
run;

which results in a total sample size of 24 (16 active and 8 placebo). However, this figure does not match what I get when I use the PASS software, where I get n1=13 and n2=6. The reason for the discrepancy is that PASS bases its calculations on exact permutations of the binomial distribution for the response variable and not on Walters' normal approximation, which PROC POWER uses (when I use the normal approximation method in PASS, I also get a total sample size of 24 there).

How can I get proc power to calculate the required sample size based on exact permutations for TWOSAMPLEFREQ? I tried METHOD=exact, but it doesn't work. Seems like a serious shortcoming in proc power if this is not possible, since the reason we want to use Fisher's exact test is due to a small sample size where the normal approximation is not advisable.

Many thanks for some input 🙂

FreelanceReinh · Posted 05-24-2022 07:56 AM

Hi @JoakimE,

I don't think you can get the exact results with PROC POWER (in the current version), sadly. However, thanks to the small sample sizes, you can perform the necessary calculations with PROC FREQ and DATA steps, as I did for a similar request last month. Here's a quick adaptation of that code to your problem (please review):

%let nmin=4;     /* minimum sample size for placebo group */
%let nmax=8;     /* maximum sample size for placebo group */
%let alpha=0.05; /* significance level */
%let p1=0.8;     /* success probability in active group */
%let p2=0.125;   /* success probability in placebo group */

/* Create a dataset with all (2*n+e+1)*(n+1)-2 possible combinations of i=0, ..., 2*n+e successes (r=1) in the
   active group (g=1, n1=2*n+e) and j=0, ..., n successes in the placebo group (g=2, n2=n), except the two 
   extreme cases i=j=0 and i=2*n+e & j=n -- for all values of n between &nmin and &nmax and e=0, 1 */

data comb;
do n=&nmin to &nmax;
  do e=0 to 1;
    do i=0 to 2*n+e;
      do j=0 to n;
        /* skip the two tables in which one response column is empty */
        if (i=0 & j=0) | (i=2*n+e & j=n) then continue;
        g=1;
        r=0; c=2*n+e-i; output;
        r=1; c=i;       output;
        g=2;
        r=0; c=n-j; output;
        r=1; c=j;   output;
      end;
    end;
  end;
end;
run;

/* Compute Fisher's exact test for each combination */

ods select none;
ods output FishersExact=fisher(where=(name1='XP2_FISH') keep=n e i j name1 nvalue1 rename=(nvalue1=p));
proc freq data=comb;
by n e i j;
tables g*r / chisq;
weight c;
run;
ods select all;

/* Compute power based on the joint distribution of two independent
   Bin(2*n+e,&p1) and Bin(n,&p2) distributed random variables */

data power(keep=n1 n2 power);
retain n1 n2 power;
set fisher;
by n e;
where .<p<=&alpha; /* The two extreme cases excluded above would not meet this condition anyway. */
power+pdf('binom',i,&p1,2*n+e)*pdf('binom',j,&p2,n);
if last.e then do;
  n1=2*n+e;
  n2=n;
  output;
  power=0;
end;
run;

Results:

Obs    n1    n2     power

  1     8     4    0.35123
  2     9     4    0.47768
  3    10     5    0.64501
  4    11     5    0.72224
  5    12     6    0.78531
  6    13     6    0.80349
  7    14     7    0.81705
  8    15     7    0.79312
  9    16     8    0.90324
 10    17     8    0.89930

The results suggest that, indeed, n1=13 and n2=6 are correct.
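
For readers who prefer to see the final DATA step in symbols: the quantity it accumulates can be written as below. The notation is introduced here for illustration and is not from the original post: n_1 and n_2 are the group sizes, and p_F(i,j) denotes the two-sided Fisher p-value that PROC FREQ reports as XP2_FISH for the table with i successes in the active group and j in the placebo group.

```latex
\[
\mathrm{power}(n_1, n_2)
  = \sum_{i=0}^{n_1} \sum_{j=0}^{n_2}
    \mathbf{1}\{\, p_F(i,j) \le \alpha \,\}\,
    \binom{n_1}{i} p_1^{\,i} (1-p_1)^{n_1-i}\,
    \binom{n_2}{j} p_2^{\,j} (1-p_2)^{n_2-j}
\]
```

Because the set of tables with p_F(i,j) <= alpha changes in discrete steps as the group sizes grow, a sum of this form need not increase monotonically with the sample size, which is consistent with the dip from 0.81705 (14/7) to 0.79312 (15/7) in the table above.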


PaigeMiller · Posted 05-24-2022 06:10 AM

If you check the documentation for PROC POWER (which should always be done before asking here), you would see that Fisher's exact test is indeed available for TWOSAMPLEFREQ.

The TWOSAMPLEFREQ statement performs power and sample size analyses for tests of two independent proportions. The Farrington-Manning score, Pearson’s chi-square, Fisher’s exact, and likelihood ratio chi-square tests are supported.

Use option TEST=FISHER

--
Paige Miller

JoakimE · Posted 05-25-2022 04:13 AM

You should always read the question before answering here 😉 (pardon the salty tone)

The question was not whether or not Fisher's exact test is available with TWOSAMPLEFREQ, but how to avoid the normal approximation as the default power calculation method (see my original post).

JoakimE · Posted 05-25-2022 04:27 AM

Thanks FreelanceReinhard, that was a neat solution! Something that works for small sample sizes at least. For larger sample sizes I guess the normal approximation or Chi2 test would suffice anyway. Just a bit strange that SAS has not thought of this problem I think...

mariko5797 · Posted 12-18-2023 06:01 PM

Could you clarify what each letter stands for? I am trying to do something similar with differing sample sizes for active and placebo.

Infection Rates:
Active 0.0, 0.05, 0.10, 0.15, 0.20, 0.25
Placebo 1/N, 0.99

Sample Size (N):
Active 15 to 12
Placebo 7 to 10

FreelanceReinh · Posted 12-19-2023 09:11 AM

Hello @mariko5797,

Always interesting to reread one's own code after a fairly long time ... :-)

g is the group identifier with values 1 for the active group and 2 for the placebo group.
n1 is the number of subjects in the active group.
n2=n is the number of subjects in the placebo group.
e=n1-2*n, which is either 0 or 1 because the assumption was a 2:1 randomization ratio of active vs. placebo. (The case e=1 actually relaxes the exact 2:1 ratio a bit so that n1 is not restricted to even numbers.)
r is the response variable with values 1 for success and 0 for failure.
c is the number of subjects for a particular combination of g, n, e and r in dataset COMB.
i is the number of successes in the active group.
j is the number of successes in the placebo group.

So, in your case you don't need variable e. Just combine all possible values of n1=12,...,15 and n2=7,...,10, i.e., 16 combinations. With varying infection rates (in both treatment groups) you'll have two additional ("outer") DO loops and similarly two additional BY variables.
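
The adaptation described above might look roughly like the following. This is an untested sketch, not code from the original poster. It assumes the 16 combinations n1=12,...,15 and n2=7,...,10, takes the six active-group rates from the question, and uses a placeholder value for the placebo rate (&p2) because the intended placebo rates are not fully clear from the question. Since Fisher's p-value depends only on the table and not on the assumed rates, this variant runs PROC FREQ once and loops over the rates in the last DATA step, rather than adding the rates as outer DO loops and BY variables.

```sas
%let alpha=0.05; /* significance level */
%let p2=0.5;     /* PLACEHOLDER placebo rate (assumption): replace with the planned value */

/* All 2x2 tables for n1=12..15 (active) and n2=7..10 (placebo),
   except the two tables in which one response column is empty */
data comb;
do n1=12 to 15;
  do n2=7 to 10;
    do i=0 to n1;
      do j=0 to n2;
        if (i=0 & j=0) | (i=n1 & j=n2) then continue;
        g=1;
        r=0; c=n1-i; output;
        r=1; c=i;    output;
        g=2;
        r=0; c=n2-j; output;
        r=1; c=j;    output;
      end;
    end;
  end;
end;
run;

/* Two-sided Fisher p-value for each table (same approach as above) */
ods select none;
ods output FishersExact=fisher(where=(name1='XP2_FISH') keep=n1 n2 i j name1 nvalue1 rename=(nvalue1=p));
proc freq data=comb;
by n1 n2 i j;
tables g*r / chisq;
weight c;
run;
ods select all;

/* Power for each (n1, n2) and each active-group rate:
   sum the joint binomial probabilities of all rejecting tables */
data power(keep=n1 n2 p1 p2 power);
set fisher;
by n1 n2;
where .<p<=&alpha;
array rate[6] _temporary_ (0 0.05 0.10 0.15 0.20 0.25); /* active rates from the question */
array pw[6] _temporary_;                                /* running power sums, one per rate */
if first.n2 then do k=1 to 6;
  pw[k]=0;
end;
do k=1 to 6;
  pw[k]=pw[k]+pdf('binom',i,rate[k],n1)*pdf('binom',j,&p2,n2);
end;
if last.n2 then do k=1 to 6;
  p1=rate[k];
  p2=&p2;
  power=pw[k];
  output;
end;
run;
```

Dataset POWER should then contain one row per combination of n1, n2 and p1. To cover several placebo rates, the same pattern could be repeated with a second rate array, or the last step rerun with different values of &p2.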
