☑ This topic is solved.
Skillside
Calcite | Level 5

Hi,

I would like to ask you about a simple sample size calculation for a non-inferiority or superiority trial.

Let's assume that I have two treatment groups. One group will have a success rate around 95%, the second around 96%.

I would like to calculate the sample size for the second group, assuming that its treatment is not worse than the reference group (95%). I can expect a maximum difference of 4%.

I found the following code for a non-inferiority test:

proc power;
twosamplefreq groupweights=(1 1) groupps=(0.95 0.96) alpha=0.025 power=0.9
test=PChi sides=1 ntotal=.;
plot min=0.1 max=0.9;
title "Sample Size Calculation for Comparing Two Binomial Proportions (1:1 Allocation"
title2 "in a Non-Inferiority Trial";
run;

What should I do to calculate the superiority trial? (I will reverse the assumptions.)

Cheers,

Skillside


18 REPLIES
FreelanceReinh
Jade | Level 19

Hi @Skillside,


@Skillside wrote:

Let's assume that I have two treatment groups. One group will have a success rate around 95%, the second around 96%.

I would like to calculate the sample size for the second group, assuming that its treatment is not worse than the reference group (95%). I can expect a maximum difference of 4%.

I found the following code for a non-inferiority test:

proc power;
twosamplefreq groupweights=(1 1) groupps=(0.95 0.96) alpha=0.025 power=0.9
test=PChi sides=1 ntotal=.;
plot min=0.1 max=0.9;
title "Sample Size Calculation for Comparing Two Binomial Proportions (1:1 Allocation"
title2 "in a Non-Inferiority Trial";
run;


This code uses the default null proportion difference of zero -- where I'd say non-inferiority and superiority "coincide" and the test is just an ordinary one-sided test of the null hypothesis of equal success probabilities. Moreover, it calculates the total sample size, not "the sample size for the second group".
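For illustration, here is a minimal sketch of your step (keeping your own assumptions) that makes both points explicit: the null proportion difference is written out (nullpdiff=0 is the default) and the sample size per group rather than the total is requested:

proc power;
/* same assumptions as above, but with the null difference stated explicitly
   (zero is the default) and solving for the sample size per group */
twosamplefreq groupps=(0.95 0.96) alpha=0.025 power=0.9
test=pchi sides=1 nullpdiff=0 npergroup=.;
run;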

 


What should I do to calculate the superiority trial? (I will reverse the assumptions.)


If you specify a margin of, say, 4%, you can test if the success probability in the experimental group is less than 4 percentage points lower than in the reference group ("non-inferiority") or you can test if it is more than 4 percentage points higher than in the reference group ("superiority"). For the latter the sample size calculation requires that the assumed true difference is greater than the margin, e.g., 6% (reference group: 90%, experimental group: 96%). The sample size calculation could then look like this (cf. the documentation):

proc power;
twosamplefreq groupps=(0.90 0.96) alpha=0.025 nullpdiff=0.04 power=0.9
test=fm sides=1 npergroup=.;
run;

Result: n=3313 per group (i.e., 6626 in total).

Skillside
Calcite | Level 5

Thank you for your reply. Does it mean that 0.90 in your code is the reference group and the experimental group is 0.96, so you calculate the superiority? And if you swap the order, then you will have a non-inferiority sample size calculation?

thank you

FreelanceReinh (Accepted Solution)
Jade | Level 19

Yes, my example answers your question:

What should I do to calculate the superiority trial?

Yes, the 0.90=90% is a totally hypothetical value for the true success probability in the reference group. Starting from your 96% for the other group and a 4% superiority margin (motivated by your earlier remark

I can expect a maximum difference of 4%.

which I may have misunderstood), I had to select a value <96%−4%=92% and I arbitrarily chose 90% just as an example. Of course, you must specify both the success probabilities and the margin appropriately, based on your subject matter knowledge and requirements.

 

And if you swap the order, then you will have a non-inferiority sample size calculation?


Yes, changing my example to groupps=(0.96 0.90) would make it an example for the non-inferiority case: A treatment with 96% success probability would be shown to be "less than 4% worse" (!) than the reference group with 90% success probability. We would expect that a relatively small sample size is sufficient to show that -- after all, the treatment is not at all worse, but even better than the other. Indeed, the result is n=148 per group. If the treatment was in fact 1% worse than the reference -- say, groupps=(0.95 0.96) -- we could still perform the non-inferiority test with the margin of 4%. Of course, we would expect to need a larger sample and PROC POWER confirms this: n=1070 per group.
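Spelled out, the two calculations described here would look like this (keeping the remaining settings from my earlier example, i.e., alpha=0.025, power=0.9, the Farrington-Manning score test, and a one-sided test; the per-group sizes in the comments are the values quoted above):

proc power;
/* treatment (96%) vs. reference (90%), non-inferiority margin 4% */
twosamplefreq groupps=(0.96 0.90) alpha=0.025 nullpdiff=0.04 power=0.9
test=fm sides=1 npergroup=.;
run; /* --> npergroup=148 */

proc power;
/* treatment (95%) in fact 1% worse than the reference (96%), same margin */
twosamplefreq groupps=(0.95 0.96) alpha=0.025 nullpdiff=0.04 power=0.9
test=fm sides=1 npergroup=.;
run; /* --> npergroup=1070 */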

Skillside
Calcite | Level 5

Thank you for your explanation.

Skillside
Calcite | Level 5
I have an additional question. So I calculated the required sample size and I am going to conduct a study. Probably, at the end of the study, I will assess the differences (of outcomes) with a chi-square test, and when I use the results from the sample size estimation (it was set to 0.9 power), the power of the chi-square test is around 0.6. I was sure that after calculating the required sample size the tests would be more powerful; could you please provide me with an explanation of this situation?
FreelanceReinh
Jade | Level 19

@Skillside wrote:
... when I use the results from the sample size estimation (it was set to 0.9 power), the power of the chi-square test is around 0.6.

How did you determine that power "around 0.6"? I had checked all of the sample sizes computed with PROC POWER for this thread by means of PROC FREQ (with the appropriate options for non-inferiority or superiority tests, respectively) applied to datasets simulating between 10,000 and 100,000 studies. Mostly the estimated power was close to 90% (like 90.x%), as expected. In the remaining few cases it was even slightly higher (like between 91% and 92%).

 

Note that I consistently used the Farrington-Manning score test for cases with "p0≠0" (i.e., for proper non-inferiority and superiority tests) both in PROC POWER (see option test=fm) and PROC FREQ, not the Pearson chi-square test, because doing so is recommended in the documentation. If you have a good reason for using the chi-square test, you can explicitly use the option test=pchi of the TWOSAMPLEFREQ statement in the power calculation. I haven't tested this yet, but wouldn't expect power differences as large as 0.6 vs. 0.9 for the same sample size, except perhaps in extreme cases.
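For example, a sketch of how such a power calculation with the chi-square test might look (untested, as noted; the setup is the non-inferiority example from earlier in the thread):

proc power;
/* non-inferiority setup as before, but with the Pearson chi-square test
   instead of the Farrington-Manning score test */
twosamplefreq groupps=(0.96 0.90) alpha=0.025 nullpdiff=0.04 power=0.9
test=pchi sides=1 npergroup=.;
run;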

 

If you continue to observe an unexpectedly low power of your tests, I think it would be a good idea to open a new thread in the Statistical Procedures forum for this new problem. Thus you will also get a larger audience for your question. Of course, I will look into it as well. You can post a link to the current thread and describe the discrepancy between your observations and the results from PROC POWER in more detail. In particular, post the PROC POWER and (if any) PROC FREQ code you used.

 

Good luck!

Skillside
Calcite | Level 5
Thank you for your answer. So let me show you what I've done. Superiority for drug 1 vs. drug 2:
proc power;
twosamplefreq groupps=(0.97 0.95) alpha=0.025 nullpdiff=0.01 power=0.8
test=fm sides=1 npergroup=.;
run;
The result is 689.

Then the outcomes (binary endpoint variables) will be tested via a chi-square test, so I checked the power of the chi-square test.
The null hypothesis in the chi-square test assumes no difference:
proc power;
twosamplefreq test=pchi
groupproportions = (.97 .95)
nullproportiondiff = 0.00
power = .
npergroup = 698;
run;

The result shows a power of 0.4...
Thank you for your answer.
Could you please explain to me what nullproportiondiff is for in the chi-square test when it assumes no difference?
FreelanceReinh
Jade | Level 19

Thanks for providing the details.

 

Your first PROC POWER step is for a non-inferiority (not: superiority) test because the value of the NULLPDIFF= option is positive in a lower one-sided test. It computes the sample size for a study to show that drug 1 (with success rate 0.97) is not inferior to drug 2 (with success rate 0.95) by more than 0.01. The sample size is relatively small because drug 1 is in fact better than drug 2 (in terms of success rates).

 

Your second PROC POWER step computes the power of a different test: a test of equality of success rates, not non-inferiority (also: chi-square instead of F.-M. score test, but this is not the main difference). It is similar to a non-inferiority test with a very small NULLPDIFF value ("limit case"). Since the difference between the success rates is only 0.02, the required sample size is substantially higher (and, correspondingly, the power for the same sample size much lower, as you've seen) than for the non-inferiority test described above where the NULLPDIFF was in the opposite direction of the difference between the true success rates. If you change NULLPDIFF to 0.01 (and use npergroup=689), you obtain a power of 0.811 -- close to the power of the recommended score test. (Technically, the chi-square test can also be used in the non-inferiority or superiority cases, which is why the NULLPDIFF= option is available also for this test.)
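In code, that check is simply your second step with only the NULLPDIFF value changed:

proc power;
twosamplefreq test=pchi
groupproportions = (.97 .95)
nullproportiondiff = 0.01
power = .
npergroup = 689;
run; /* --> power approx. 0.811 */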

 

If you want to perform a superiority test showing that drug 1 is superior to drug 2 by a margin of 0.01, i.e., that the assumed difference of 0.02 is significantly larger than 0.01, you need to specify NULLPDIFF=-0.01 or (equivalently) swap the two values in the GROUPPS= option, as we discussed recently. We should expect an even larger sample size than for the test of equality and, indeed, PROC POWER yields n=5959 per group. (I haven't run simulations for this case yet, but can do so tomorrow if you like. It's close to midnight in my time zone and I'm about to leave the office, sorry for the inconvenience.)
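A sketch of this superiority calculation (assuming the same alpha=0.025 and power=0.8 as in your first step; the per-group size in the comment is the value mentioned above):

proc power;
/* superiority of drug 1 (97%) over drug 2 (95%) by a margin of 0.01:
   nullpdiff=-0.01 in a lower one-sided test */
twosamplefreq groupps=(0.97 0.95) alpha=0.025 nullpdiff=-0.01 power=0.8
test=fm sides=1 npergroup=.;
run; /* --> npergroup=5959 */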

Skillside
Calcite | Level 5
Thank you for your support. This discussion is exciting for me. But hold on... it seems like I misunderstood something.
Once again, let's use two scenarios (S1 and S2).
S1: I would like to conduct a sample size estimation in a non-inferiority study: drug 2 is not worse than drug 1, where the success rate of drug 1 is 97% and the success rate of drug 2 is 95%. So the difference in the endpoints is 2%, and I suggested a margin (probably nullpdiff) of 1%, 3%, or 4%. I was sure that in this case the assumption for a margin of 1% is that drug 2 will have a success rate between 94% and 96%.
S2: Let's assume that drug 2 will be superior to drug 1. The success rates stay the same for both drugs, but then do I have to expect a margin of at least 3%?
Other questions: if I have set the nullpdiff option to a positive value, then swapping the order in the groupps option changes the calculation from a superiority to a non-inferiority sample size calculation. And if I don't swap the order in the groupps option but change the value of nullpdiff to negative, will I also change from a superiority to a non-inferiority study?
What should I do to decrease the sample size? The drugs do not differ from each other; I even believe that the one with the 95% success rate (SR) is superior to the drug with the 97% SR.
Please suggest a solution for that case. The study will take too much time if I have 1,500 participants per arm. In my time zone it is after midnight, 00:22. Have a nice night! Thanks again, and I hope to talk to you tomorrow.
FreelanceReinh
Jade | Level 19

S1: Given the success rates (probabilities) of 97% for drug 1 and 95% for drug 2, non-inferiority of drug 1 (vs. drug 2) could be shown for any positive margin with sufficiently large (but limited) sample size (for a given value of power) because it's in fact superior to drug 2.

 

However, non-inferiority of drug 2 vs. drug 1 could be shown only for non-inferiority margins >2%. With a margin approaching 2% (from above) the required sample size (for a constant power) grows to infinity. Margins <=2% are infeasible.


S2: With the above success rates drug 1 is superior to drug 2, not vice versa. One could test the most common null hypothesis of equal success rates. The required sample size would tend to be greater than that of the non-inferiority trial of scenario 1 (because of the stronger alternative hypothesis). If there was a requirement (e.g. by regulatory authorities) for a superiority trial in the strong sense, i.e., with a positive superiority margin, this could be done as well, but only for superiority margins <2% and with an even larger sample size. Similar to the limitations of S1, with a margin approaching 2% (from below) the required sample size (for a constant power) grows to infinity. Margins >=2% are infeasible.

 


@Skillside wrote:
And if I don't swap the order in the groupps option but change the value of nullpdiff to negative, will I also change from a superiority to a non-inferiority study?

Yes, changing the sign of the null proportion difference, everything else being the same, switches between "non-inferiority" and "superiority" according to this quote from the documentation: "If p0<0 in an upper one-sided test or p0>0 in a lower one-sided test, then the test is a noninferiority test. If p0>0 in an upper one-sided test or p0<0 in a lower one-sided test, then the test is a superiority test." I found it helpful to write the SIDES= option of the TWOSAMPLEFREQ statement explicitly in the form sides=L or sides=U rather than sides=1 so that the direction of the one-sided test is more obvious. (If you specify a nonsensical or infeasible combination of options, the PROC POWER output will show an "Invalid input" error.)
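To make that concrete, here is your earlier non-inferiority step written with the direction spelled out (a sketch; sides=L selects the same lower one-sided alternative that sides=1 picks here, so it should reproduce the n=689 per group from above):

proc power;
/* nullpdiff > 0 in a lower one-sided test --> non-inferiority */
twosamplefreq groupps=(0.97 0.95) alpha=0.025 nullpdiff=0.01 power=0.8
test=fm sides=L npergroup=.;
run;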

 

The drugs do not differ from each other; I even believe that the one with the 95% success rate (SR) is superior to the drug with the 97% SR.

You say that the "drugs do not differ between each other," yet you assume different success rates of 95% vs. 97%? In what sense would the drug with 95% SR be superior to the drug with 97% SR?

 

What should I do to decrease the sample size?

I think the starting point should be the research question in view of the regulatory requirements (for example, cf. ICH guideline E9, section 3.3 "Type of Comparison"). A non-inferiority trial showing that the SR of the new drug is only slightly lower (if at all) than that of standard therapy might be justified by advantages of the new drug in other aspects, e.g., safety or cost. As mentioned above, the required sample size will tend to be lower than that for the usual test of the null hypothesis of equal success rates. The larger the non-inferiority margin, the lower the sample size. Of course, the choice of the margin must be clinically well justified. Also the consequences of type I and II errors must be considered carefully. Increasing alpha or decreasing the power decreases the sample size. Similar considerations apply to superiority trials in the strong sense. The smaller the superiority margin, the lower the sample size.
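If it helps when exploring these trade-offs, note that most PROC POWER options accept lists of values, so several candidate margins (and power values) can be compared in a single step. A purely hypothetical sketch using the success rates discussed in this thread:

proc power;
/* compare required per-group sizes across candidate margins and power values */
twosamplefreq groupps=(0.97 0.95) alpha=0.025 nullpdiff=0.02 0.03 0.04
power=0.8 0.9 test=fm sides=1 npergroup=.;
run;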

Skillside
Calcite | Level 5

Thank you for your answer, and I am sorry for the delay in response. Tough day today 😉

So if I understood it correctly, the following PROC will be correct for a non-inferiority sample size calculation,

so I cannot set nullpdiff below 2%

proc power;
twosamplefreq groupps=(0.97 0.95) alpha=0.025 nullpdiff=0.02 power=0.9
test=fm sides=1 npergroup=.;
run;

That gives 535, and for a more optimistic nullpdiff, 356.

But then when I calculate the power for the chi-square test, it doesn't give sufficient power.

 

I wrote: The drugs do not differ from each other; I even believe that the one with the 95% success rate (SR) is superior to the drug with the 97% SR.

Well, they differ in the endpoints, but I assume that in a real-life scenario and in a bigger sample it will be the same as the first drug or even better.

I meant that I assume a margin which will show equality or even superiority.

Could you provide some simulations, please?

 

Skillside
Calcite | Level 5
And thank you for providing the link to the documentation, but unfortunately I don't understand most of the equations; I am not a mathematician.
FreelanceReinh
Jade | Level 19

You're welcome. I was busy with other things, too.

 


@Skillside wrote:

So if I understood it correctly, the following PROC will be correct for a non-inferiority sample size calculation,

proc power;
twosamplefreq groupps=(0.97 0.95) alpha=0.025 nullpdiff=0.02 power=0.9
test=fm sides=1 npergroup=.;
run;

That gives 535 ...


Correct.

 

so I cannot set nullpdiff below 2%

Not correct. Null proportion differences less than 2% are no problem since the difference 0.95-0.97=-0.02 is still less than any positive number. NULLPDIFF values <=-0.02 would be invalid (but for negative values we would be in the situation of a superiority test anyway).

 

and for a more optimistic nullpdiff, 356.


This is the result for nullpdiff=0.03.
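Written out, that is your step above with only the margin changed:

proc power;
twosamplefreq groupps=(0.97 0.95) alpha=0.025 nullpdiff=0.03 power=0.9
test=fm sides=1 npergroup=.;
run; /* --> npergroup=356 */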

 


But then when I calculate the power for the chi-square test, it doesn't give sufficient power.


Sure, because this means nullpdiff=0, which is much closer to the assumed true difference 0.95-0.97=-0.02 than the nullpdiff=0.02 we used for the non-inferiority test.

 


Well, they differ in the endpoints, but I assume that in a real-life scenario and in a bigger sample it will be the same as the first drug or even better.

I meant that I assume a margin which will show equality or even superiority.


Since a superiority test would require an even larger sample size than the equality (chi-square) test, I'd say that we can focus on equality or non-inferiority.

 


Could you provide some simulations, please?


Here they are: Let's start with the non-inferiority studies corresponding to your PROC POWER step above.

%let nsim=10000; /* number of simulated studies */
%let n=535;      /* sample size per group */
%let p1=0.97;    /* success probability in group 1 */
%let p2=0.95;    /* success probability in group 2 */
%let m=0.02;     /* non-inferiority margin (absolute value of the null proportion difference) */

/* Simulate the results of &NSIM studies */

data sim;
call streaminit(3141592);
do i=1 to &nsim;
  do grp=1, 2;
    success=1;
    k=rand('binom', choosen(grp, &p1, &p2), &n);
    output;
    success=0;
    k=&n-k;
    output;
  end;
end;
run;

/* Perform non-inferiority (Farrington-Manning score) test for each study */

ods select none;
ods noresults;
ods output pdiffnoninf=pdni;
proc freq data=sim;
by i;
weight k;
tables grp*success / riskdiff(column=2 method=fm noninf margin=&m);
run;
ods select all;
ods results;

/* Determine the proportion of studies with a significant test result 
   (i.e., showing non-inferiority of the treatment in group 1)
*/

proc format;
value pvone
low-0.025='significant'
0.025<-high='not significant';
run;

ods exclude binomialtest;
proc freq data=pdni;
format pvalue pvone.;
tables pvalue / binomial;
run;
ods exclude none;

Result: Point estimate of the power: 0.9123 (exact 95%-CI: 0.9066, 0.9178).

 

For an equality test the required sample size would be much larger:

proc power;
twosamplefreq groupps=(0.97 0.95) alpha=0.05 power=0.9
test=pchi sides=2 npergroup=.;
run; /* --> npergroup=2016 */

The corresponding simulation is similar to the one above:

%let nsim=10000; /* number of simulated studies */
%let n=2016;     /* sample size per group */
%let p1=0.97;    /* success probability in group 1 */
%let p2=0.95;    /* success probability in group 2 */

/* Simulate the results of &NSIM studies */

data sim;
call streaminit(31415927);
do i=1 to &nsim;
  do grp=1, 2;
    success=1;
    k=rand('binom', choosen(grp, &p1, &p2), &n);
    output;
    success=0;
    k=&n-k;
    output;
  end;
end;
run;

/* Perform Pearson chi-square test for each study */

ods select none;
ods noresults;
ods output chisq=csq(where=(statistic='Chi-Square'));
proc freq data=sim;
by i;
weight k;
tables grp*success / chisq;
run;
ods select all;
ods results;

/* Determine the proportion of studies with a significant test result 
   (i.e., showing a significant difference between the success rates)
*/

proc format;
value pvtwo
low-0.05='significant'
0.05<-high='not significant';
run;

ods exclude binomialtest;
proc freq data=csq;
format prob pvtwo.;
tables prob / binomial;
run;
ods exclude none;

Result: Point estimate of the power: 0.9040 (exact 95%-CI: 0.8981, 0.9097).

 

So, in both cases the intended power of 0.9 was achieved.

 

And thank you for providing the link to the documentation, but unfortunately I don't understand most of the equations; I am not a mathematician.

The important parts at the top of the page are relatively simple. The actual sample size formulas are a bit complicated, but you don't need them as long as you trust PROC POWER. Also, if in doubt, you can ask me. I am a mathematician. 🙂

Skillside
Calcite | Level 5
Great to hear that; I regret that I am not on your level. I was pretty sure that I had some skills in SAS, but after this discussion with you I feel like you brought me back down to earth. Thanks.
So let's keep the optimistic variant, and let me keep the sample size for non-inferiority. At the end of the study, what kind of test would you recommend for testing the hypothesis that there is a difference between the frequencies in the groups? Chi-square assumes no difference; what is the alternative to that? And doesn't it give a higher probability of committing a beta (type II) error in that case?
