BlueNose
Quartz | Level 8

Hello all,

I am trying to calculate and determine the minimum required sample size for a study in which the events are rare.

I have a new treatment that is coming to "compete" with an existing treatment that has a failure rate of 10%. The inventor of the new treatment believes that the new treatment is better, and thus that its failure rate is lower than 10% (he actually thinks it could be as low as 2%).

I need to set up the right hypotheses and to calculate a sample size. The study will contain two parallel groups, one receiving the new treatment and a control receiving the standard of care (the one with the 10% failure rate). I thought to test the hypothesis P1-P2=0 vs. P1-P2>0, where P1 is the failure proportion of the standard of care. The study is likely to be a multi-center study, but I have ignored that because I wasn't sure how to take this source of variance into account in the sample size calculation.

I ran the following SAS code:

proc power;
   TwoSampleFreq Test=Fisher Dist=Exact_Cond Method=Walters
      Alpha = 0.05
      Sides = 1
      GroupProportions = (0.1 0.02) (0.1 0.03) (0.1 0.04) (0.1 0.05)
      Power = 0.8
      NPerGroup = .;
run;

and these are my results:

Fisher's Exact Conditional Test for Two Proportions

Fixed Scenario Elements
   Distribution        Exact conditional
   Method              Walters normal approximation
   Number of Sides     1
   Alpha               0.05
   Nominal Power       0.8

   Index   Proportion1   Proportion2   Actual Power   N per Group
   1       0.1           0.02          0.803          123
   2       0.1           0.03          0.802          172
   3       0.1           0.04          0.800          247
   4       0.1           0.05          0.800          374

If I make the strict assumption that P2=0.02, I need 123*2=246 patients. This number is way too high compared to the expectations of the researcher and the people financing the study. I was asked to create magic. So I was thinking maybe not to test for superiority, but to go for non-inferiority.
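
To make the non-inferiority idea concrete, here is a rough sketch of how such a calculation might be set up in PROC POWER by moving the null away from zero; the 0.05 margin is purely illustrative and not something agreed with the researcher:

proc power;
   twosamplefreq test=pchi               /* the chi-square (score) test allows a non-zero null        */
      groupproportions   = (0.10 0.02)   /* standard-of-care vs. conjectured new-treatment failure    */
      nullproportiondiff = 0.05          /* hypothetical margin: new failure rate at most 5 points higher */
      sides  = 1
      alpha  = 0.05
      power  = 0.80
      npergroup = .;
run;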

What else can I do to get a smaller sample size? I read in a book that by working with ratios or odds ratios, rather than with proportion differences, the required sample size is lower. How does Fisher's exact test behave when it comes to sample size; is it the optimal test to use in this case?

To rephrase my question: I am looking for a valid and correct way to calculate the sample size, one that will give me the smallest possible study, for reasons that are beyond my control.

Thank you !

14 REPLIES
1zmm
Quartz | Level 8

If the inventor of the new treatment believes that this treatment is BETTER than the existing treatment, using the formulas for noninferiority won't help you because the alternative hypothesis would be that the new treatment is superior to the existing treatment, not merely equivalent to it.

The most important ways you can reduce the sample size are to reduce the statistical power, to increase the statistical significance level, alpha, and to reduce the expected failure rate in those receiving the new treatment.

If you can follow your groups for a longer interval, then the sample size may be smaller if the failure rate in the existing treatment group increases somewhat more rapidly than that in the new treatment group. If you envision the treatment comparison as a clinical trial with follow-up so that you identify failures as they occur rather than only at the end of follow-up, you might be able to compare the treatments as in a comparative survival analysis, which should increase statistical power and reduce the sample size.
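
To put numbers on the first two of those levers, a minimal PROC POWER sketch that solves for the group size over a small grid of alpha and power values (the particular values here are only illustrative) might look like this:

proc power;
   twosamplefreq test=fisher
      groupproportions = (0.10 0.02)
      sides = 1
      alpha = 0.05 0.10    /* illustrative significance levels */
      power = 0.70 0.80    /* illustrative target powers       */
      npergroup = .;
run;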

BlueNose
Quartz | Level 8

thank you for replying.

The failure proportion of the existing treatment is said to be "higher than 10%", which is why I assumed it is 10%; it's the only valid assumption I could make. In practice it can be higher (resulting in a lower required sample size).

As for the significance level you mentioned, or the probability of a type I error, do you think that the FDA, for instance, accepts a value higher than 0.05, say 0.07 or 0.1?

I ran the analysis again, and got for example this result:

Proportions: Inequality, two independent groups (Fisher's exact test)

Input:    Tail(s)                 =  One
          Proportion p1           =  0.1
          Proportion p2           =  0.02
          α                       =  0.05
          Power                   =  0.8

Output:   Sample size per group   =  124
          Actual power            =  0.8019416
          Actual α                =  0.0114122


The "nominal" α value is different than the empirical α value (have I said it correctly ?). If the "actual α" is 0.011, can I choose a higher level of α, say 0.1, which still lead to an "actual α" lower than 0.05, and "get away with it" when I face a body like the FDA ?


As for superiority vs. non-inferiority: unfortunately, the researcher does want to show superiority, but the budget is limited and the study is only a pilot, so showing non-inferiority will be sufficient if there is no other option. I do not think the financing body will be pleased with more than 100 per group, and that is what I am trying to achieve while still keeping the analysis valid. I am looking for ways to reduce the sample size, but only valid ways that will still lead to the true result (either for or against the new treatment).

SteveDenham
Jade | Level 19

Would the financing body be willing to trade length of study for number enrolled? Follow @1zmm's advice, and consider a comparative survival analysis. Even a truncated/censored analysis of a continuous variable should require a lower N per group than a strictly dichotomous variable, at the same power level.

Steve Denham

1zmm
Quartz | Level 8

I don't know the FDA's rules, but the FDA probably won't allow you to increase your statistical significance level, alpha, to exceed 0.05, because that is the usually accepted standard. If the financing body will pay for only 100 study subjects per group with your original study design, keep alpha [=0.05, two-sided] and the number of study subjects per group [=100] constant and vary either the proportion expected to fail in the treated group [=p2, as you did above], the proportion expected to fail in the standard-care group [=p1], the statistical power [=1.00 - beta], or all three, to provide different scenarios to the inventor of the new treatment and the financing body at a prespecified alpha and number of study subjects per group. Let them decide which scenario based on these criteria is most appropriate.
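
A rough sketch of that kind of scenario run, fixing 100 subjects per group and solving for power (the proportion pairs below are only illustrative), might look like this:

proc power;
   twosamplefreq test=fisher
      alpha = 0.05
      sides = 2
      groupproportions = (0.10 0.02) (0.10 0.03) (0.12 0.02) (0.12 0.03)   /* illustrative p1/p2 scenarios */
      npergroup = 100
      power = .;
run;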

BlueNose
Quartz | Level 8

Thank you both for helping.

Sorry about the long time to respond; I had to think about it and get some details. Going back to Mathew's first idea of using survival analysis: the failure/success status is to be decided after 6 months. I do not yet know whether the status is checked earlier; it depends on the doctors and so on, but let's assume I manage to convince them and it will be checked every month up to 6 months. How do I calculate the sample size for this scenario? I do not know how to do it with SAS. What I did do was run the analysis with a secondary tool I have, and I got a sample size of 192. Clearly that cannot be right, since it is larger; I must have done something wrong, probably misunderstood one or more of the required parameters, and there were quite a lot of them. Do you know how to find the sample size in a relatively simple way, if possible using SAS?

(although if the process is not iterative I could calculate by hand given a formula)

thanks again !

1zmm
Quartz | Level 8

PROC POWER in SAS version 9.2 or later has a TWOSAMPLESURVIVAL statement that should be able to calculate the sample size for you. The documentation for PROC POWER includes an example of comparing two survival curves.

I split the observation time into six single monthly intervals for the standard treatment group and six single monthly intervals for the new treatment group. Then I calculated the estimated failure rates at each of these monthly intervals based on overall values of p1=0.10 for the standard treatment group (decrements from 1.00 to 0.90 over 6 intervals = 0.10/6 = 0.01667 per month) and of p2=0.02 for the new treatment group (decrements from 1.00 to 0.98 over 6 intervals = 0.02/6 = 0.00333 per month). The follow-up time = 6, and the total time = 6 (assuming that all the patients would be available in both groups at the beginning of the study). I used the LOGRANK test, which assumes that failures occur relatively evenly across follow-up (TEST=LOGRANK); a one-sided test (SIDES=1); a statistical significance level, alpha, of 0.05 [the default]; and a statistical power of at least 0.80. The sample size for each group was estimated as 110, which is smaller than 124, but still not as small as the required 100 per group.

Note that the monthly decrements are probably unrealistic, especially for the new treatment group, because failures/adverse effects will probably not occur at the monthly follow-up examinations.  If some of the patients are not available at the start of the study, you may have to use an accrual time instead of just a follow-up time as above.  Changes in either of these options probably will affect the sample size estimates.  Finally, multiple interim follow-up examinations may add more to the study cost than adding more patients above 100 per group.  That's why you should provide the study investigators and the funding group multiple scenarios for them to choose from.
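
Spelled out, the PROC POWER call I describe above might look something like the sketch below; the monthly survival values are just the linear decrements described, so treat them as assumptions rather than observed data:

proc power;
   twosamplesurvival
      test=logrank
      alpha=0.05
      sides=1
      /* standard care: 0.10/6 = 0.01667 failures per month; new treatment: 0.02/6 = 0.00333 per month */
      curve("Standard") = (1 2 3 4 5 6) : (0.98333 0.96667 0.95000 0.93333 0.91667 0.90000)
      curve("New")      = (1 2 3 4 5 6) : (0.99667 0.99333 0.99000 0.98667 0.98333 0.98000)
      groupsurvival = "Standard" | "New"
      followuptime = 6
      totaltime = 6      /* add an accrual time instead if patients enter the study gradually */
      power = 0.80
      npergroup = .;
run;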


BlueNose
Quartz | Level 8

thank you for the calculation, I will try to "do it myself" when I get to the office where I have my SAS installed.

In the meanwhile, I was "playing" without SAS. I found a reference to a method called Schoenfeld's method, and I did a manual calculation:

r = ln(0.98)/ln(0.90) = 0.1917

n1 = n2 = ((0.84 + 1.645)/ln(0.1917483716))^2 * (1/(1-0.90) + 1/(1-0.98)) = 135.834 ≈ 136

not good enough.
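
If anyone wants to check the arithmetic, here is the same formula transcribed into a short DATA step (this is just my hand calculation above, not an official implementation of Schoenfeld's method):

data _null_;
   hr      = log(0.98) / log(0.90);   /* hazard ratio under an exponential assumption */
   z_alpha = probit(0.95);            /* 1.645 for a one-sided alpha of 0.05          */
   z_beta  = probit(0.80);            /* 0.84 for 80% power                           */
   n       = ((z_beta + z_alpha) / log(hr))**2 * (1/(1 - 0.90) + 1/(1 - 0.98));
   put hr= n=;                        /* prints hr of about 0.1917 and n of about 136 */
run;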

Then I re-ran the previous calculation with the other tool (which I can't cite; I will report only SAS, since it is fully accepted by the FDA and other regulatory bodies), with a slight modification, and this is what I got:

A one-sided logrank test with an overall sample size of 182 subjects (91 in the control group and 91 in the treatment group) achieves 0.8 power at a 0.05 significance level to detect a hazard ratio of 0.1917 when the proportion surviving in the control group is 0.9. The study lasts for 6 time periods with all subjects beginning the study together. No subjects drop out of the control group. No subjects drop out of the treatment group.

This is not the first time I have found myself in this situation, with more than one possible formula. In the reference where I found Schoenfeld's method there was another method, Lachin's method, which is more conservative. How is one supposed to choose the right number?

1zmm
Quartz | Level 8

The SAS method for the two-sample survival analysis using the log-rank method is described in the PROC POWER documentation under the computational methods for the TWOSAMPLESURVIVAL statement.  This documentation cites a 1988 reference by Lakatos, et al. and a 1997 SAS book by Cantor.

Does the other tool you use cite a reference that is the source of its formula?  Unless it does, the other tool may be problematic.

From what I've been able to read, the discrepancy between the sample sizes you've calculated is much larger than what would be expected. Until you can compare these formulas and the assumptions underlying them, I'd be careful about choosing one over another.

BlueNose
Quartz | Level 8

I have checked the formulas in all tools, the tool I was talking about had a reference to Freedman and Lakatos.

As it happens, I got another sample-size assignment to handle, one that is very well suited to survival analysis, so I am in "double trouble" now. I have a new treatment again, and a study that compares it to a control. The response variable is dichotomous, meaning failure vs. success. This is an eye treatment, and as I understand it, the study will have follow-ups to observe the time until the problem comes back, if at all.

My follow-up times are 1 day, 1 week, 1 month, 3 months and 6 months. The analysis is supposed to be performed using the log-rank test, as before. The power should be 80% and the significance level 5%. There is no chance that someone starts being followed up only at 1 month; everyone will be checked after 1 day, so we can say the entire sample starts the same day. As for the proportions, I am not sure yet, but I can try several options. Let's say, just for the example, that the proportion of success in the control group is known to be 75%, and I want to be able to show that the new treatment is better, with a proportion of 85%.

1. Is it correct that the hazard ratio is ln(0.85)/ln(0.75)?

2. I didn't manage to reproduce what you did in SAS; can you please help me set up the code? I find it hard to understand the meaning of all the parameters like accrualtime, followuptime and the others; there are quite a lot of them, actually. I believe that with one example I will catch the idea and be able to do it again with different parameters, and also be able to go back to the previous problem and apply it there.

3. If the plan is to make it a multi-center study, should I somehow take that into account in the sample size calculations?

Thank you very much, your help is MOST appreciated !

1zmm
Quartz | Level 8

The hazard ratio is the ratio of the proportion of failures in the treatment group to the proportion of failures in the comparison group:     0.15/0.25  = 0.60.

The accrual time is the time needed to recruit study subjects for the trial if all those needed are not available at the start of the trial.  The follow-up time is the time that the trial study subjects are followed after the start of the trial either to the end of the trial, to the time that they fail, to the time they are lost-to-follow-up, or to the time they withdraw from the trial.  The total time is the sum of the accrual time and the follow-up time.

One SAS code specification to calculate the sample size in each group for the two-group six-month trial you propose above with a two-sided significance level, alpha [=0.05], and a statistical power of 0.80 is the following:

proc power;
   twosamplesurvival
      test=logrank
      alpha=0.05
      sides=2
      curve("Standard")=6 : 0.75
      curve("Proposed")=6 : 0.85
      groupsurvival="Standard" | "Proposed"
      followuptime=6
      totaltime=6
      power=0.80
      npergroup=.;
run;

This calculation assumes 100% follow-up and ascertainment of all relevant outcomes.

In a multi-center study where the study subjects do not differ across the centers, the follow-up time and the total time are the same, and the outcome results will be aggregated across the centers, the sample size calculations would be the same.

BlueNose
Quartz | Level 8

thank you !

I ran this code:

proc power;
   twosamplesurvival
      test=logrank
      alpha=0.05
      sides=1
      curve("Standard")=4 : 0.70
      curve("Proposed")=4 : 0.90
      groupsurvival="Standard" | "Proposed"
      followuptime=4
      totaltime=4
      power=0.8
      npergroup=.;
run;

I changed the number 6 to 4, since I found out that there are only 4 follow-up times; however, changing from 6 to 4 did not change the sample size.

I also changed to a one-sided test, in order to reduce the sample size, given that the aim is to show superiority.

one thing that does bother me in the output is:


The POWER Procedure

Log-Rank Test for Two Survival Curves

   Method                            Lakatos normal approximation
   Number of Sides                   1
   Follow-up Time                    4
   Total Time                        4
   Alpha                             0.05
   Group 1 Survival Curve            Standard
   Form of Survival Curve 1          Exponential
   Group 2 Survival Curve            Proposed
   Form of Survival Curve 2          Exponential
   Nominal Power                     0.8
   Number of Time Sub-Intervals      12
   Group 1 Loss Exponential Hazard   0
   Group 2 Loss Exponential Hazard   0

   Actual Power   N per Group
   0.807          50

Doesn't the line about the number of time sub-intervals (12) mean that we are talking about years (4 years)? If I am talking about months, shouldn't it be 4 (weeks)?

1zmm
Quartz | Level 8

The number of time sub-intervals is used in the calculation of sample size when the pattern of follow-up times is complex. This is not the situation in your example, because both survival curves have exponential distributions (constant hazards). If you change the number of these sub-intervals in your example using the option NSUBINTERVALS, the estimated sample size or the statistical power does not markedly change. The default number of sub-intervals does not refer to a specific unit of time, whether weeks, months, or years.
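
If you want to try it, a sketch of the rerun with a different number of sub-intervals would look like the following; I am assuming the NSUBINTERVAL= spelling of the option here, so check the TWOSAMPLESURVIVAL documentation for your SAS release:

proc power;
   twosamplesurvival
      test=logrank
      alpha=0.05
      sides=1
      curve("Standard")=4 : 0.70
      curve("Proposed")=4 : 0.90
      groupsurvival="Standard" | "Proposed"
      followuptime=4
      totaltime=4
      nsubinterval=4      /* changed from the default of 12 */
      power=0.8
      npergroup=.;
run;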

Although you have specified a one-sided statistical test, assuming that the new treatment is better than the standard comparison treatment, the FDA may require you to calculate the statistical power or the sample size based on a two-sided test, because a clinical trial wouldn't be needed if you knew that the new treatment was always better than the standard treatment. The new treatment may have unexpected side effects that make it hard for the patient to continue taking it and may in fact be worse than the standard treatment.

In this reply, I'd also like to correct an error I made in a previous reply in defining what a hazard ratio is.  A hazard ratio is the ratio of the rate of new failures in a new treatment group to the rate of new failures in a standard comparison group.  The ratio is NOT of proportions but of rates that have a denominator of person-time (of follow-up):   Rate = (number of new failures in a group) / (sum of person-time follow-up across all members of the group).

Rates and follow-up time are relevant to follow-up studies like clinical trials because the expected number of new failures used in calculating sample sizes would increase with increasing duration of follow-up.  The hazard ratio also refers to NEW (incident) failures and excludes already existing (prevalent) failures.  In your specification of characteristics for calculating statistical power in PROC POWER, these distinctions don't matter because these specifications imply similar follow-up in both the new treatment group and the standard treatment group:   Both groups are followed for the same duration, no one from either group is lost to follow-up or withdraws from follow-up, and all failures are ascertained equally in both groups.

BlueNose
Quartz | Level 8

thank you !

With the one-sided test I got a sample size of n=50 per group, which is not that much. With the two-sided test I get 63.

One last question: in most references I find online, the formulas require the hazard ratio. Following your latest correction, how would you calculate it if the proportions are 0.9 and 0.7 (treatment and control, respectively)?

I also often read that formulas require the median time; how is one supposed to know the median prior to the study?

1zmm
Quartz | Level 8

In your previous e-mails, you described proportions like 0.9 and 0.7 for the treatment and the control groups respectively as the proportions of SUCCESSES over a specific follow-up time, not the proportion of FAILURES over that time.  I would thus characterize the hazard rate of (new) FAILURES for the control group as 0.30/sum of follow-up time [=1.00-0.70] and the hazard rate of (new) FAILURES for the treatment group as 0.10/sum of follow-up time [=1.00-0.90].  The hazard ratio is the ratio of the hazard rate for the treatment group relative to the hazard rate for the control group:

      hazard ratio = (0.10/sum of follow-up time) / (0.30/sum of follow-up time)  = 0.333333....

With respect to median survival time for a group, you can get that information from the published literature on previous clinical trials using the treatments of interest, or you can estimate it as the time when half of the group has "survived" [=not yet suffered a new failure] or its equivalent, the time when half of the group has "failed".  Since the form of your survival curves is exponential, you can estimate these median survival times from the proportion surviving over the follow-up time.  Using your example:

How long would the interval be at which the probability of survival = 0.5, given that the probability of survival for the treated group is 0.9 at 4 weeks/months/years?

     0.5 = (0.9)^x  -->  ln(0.5) = x ln(0.9)  -->  x = ln(0.5)/ln(0.9)  -->  6.58

Thus, the estimated median survival time equals 6.58 times the duration it took to reach the survival probability of 0.9 [=4 weeks/months/years] --> 26.3 weeks/months/years.

You can estimate the median survival time for the control group in a similar way.
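
If you want to reproduce that estimate in SAS, here is a short DATA step sketch of the same arithmetic under the exponential (constant-hazard) assumption, using the numbers from the example above:

data _null_;
   s_t    = 0.90;              /* proportion surviving at the end of follow-up           */
   t      = 4;                 /* follow-up duration in weeks/months/years               */
   lambda = -log(s_t) / t;     /* constant hazard implied by the exponential model       */
   median = log(2) / lambda;   /* time at which S(t) = 0.5; about 26.3 in the same units */
   put lambda= median=;
run;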

