Teketo
Calcite | Level 5
Dear SAS community,
I am trying to calculate the sample size for a cluster randomized controlled trial with two intervention groups and one control group (three groups in total). Does sample size calculation for multiple groups rest on different assumptions than the two-population formula for proportions or means? Is a Bonferroni correction the best approach, or should I simply use the two-population formula and distribute the result across the three groups?

I was using a formula for a cluster randomized controlled trial with unequal cluster sizes, but I had difficulty obtaining the ICC (rho) and the coefficient of variation (CV). I could not find a paper reporting them, or even figures from which I could calculate them. I also tried to calculate the average cluster size for a fixed number of clusters, but when I did the feasibility check the assumption was not satisfied. Do you have any advice or recommendations?

Different people say different things, and the published papers do not agree either. Some papers say I should run a simulation to find a sample size with good power; others say otherwise. I have three outcome variables, a mix of count and binary outcomes.

Could you help me with how to run a simulation to determine the required sample size for a cluster randomized controlled trial with three groups? What steps should I follow in SAS to calculate or simulate the sample size? And which program should I use? The installed application offers many options, such as SAS Enterprise Guide, SAS ILM Studio, etc.
 
With kind regards

Teketo Kassaw

Teketo
Calcite | Level 5
Hi,
I read it, and the paper too, but I didn't find information related to my question. Could you give me a detailed description or a paper that could help me?
Regards
Teketo
Teketo
Calcite | Level 5

Hi Damien,

Thank you. I read the paper you mentioned and the responses, but I didn't find information directly related to my question. My question is about sample size determination, using SAS, for a cluster randomized controlled trial with three groups.

I have two types of outcomes: binary and count. The sample size I want to determine should take the following into consideration:

1. cluster number

2. cluster size

3. coefficient of variation

4. intracluster correlation coefficient / rho and

5. effect size

in addition to the usual inputs for an individually randomized controlled trial.

 

How can I determine the sample size for three groups? Is the Bonferroni correction appropriate, or is it possible to use the two-population formula and then allocate the result across the three groups, or is there some other correction that SAS will apply?

Regards

Teketo

Damien_Mather
Lapis Lazuli | Level 10
** example precision estimation for a random control trial after  **;
** Stroup (2016) **;
**  see http://support.sas.com/resources/papers/proceedings16/11663-2016.pdf **;
data rct;
** this is a SAS data step that creates a SAS data set that is called an 'exemplar'  **;
** (or representative) data set in the above article to be used in conjunction with **;
** the glimmix procedure below to simulate the impacts of your assumptions **;
** on the precision (and yes, with more work, power) of your experimental design **;
infile cards;
 input block @@;
 ** each block is an independent replication of the 3-treatment **;
 ** (control + 2 new treatments) group experimental design **;
 ** in your case if you make each block a randomly selected district **;
 ** then you get to estimate inter-district response variance for 'free' **;
 ** the double trailing @ holds the data line so the next input statement keeps reading from it **;
 do eu=1 to 3;
 ** iterates over each of the 3 experimental units in each (district?) block **;
 ** each eu (1-3) is a new experimental unit, which is permuted in this example **;
  ** the double trailing @ again holds the data line across the 3 loop iterations **;
  input trt @@; 
  ** current assumptions for success probabilities of the control (p1) and two **;
  ** treatments (p2,p3) are set here. These treatments are not as effective as those assumed **;
  ** in previous simulations. By varying these and the eu size you can see what effect size difference **;
  ** (here I use 13% or 0.13 diff) can be confidently detected at the 95% level. **;
  ** I found this treatment effect size difference of 0.13 could just be detected at a 95% C.L. with an **;
  ** overall sample size of 545 by changing these probabilities and eu size and re-running the exemplar analysis **;
  p1=.31;p2=.44;p3=.50;
  ** p takes on the right value given the newly input treatment type. (trt=1) = 1 if trt = 1, else = 0  **;
  p=(trt=1)*p1+(trt=2)*p2+(trt=3)*p3;
  ** the ranuni(seed) function generates a uniform random number between 0 and 1 **;
  ** rounding 10 x this number to an integer and adding it to 10 will uniformly randomly **;
  ** generate (and therefore facilitate simulation of the impact of) experimental unit **;
  ** sample sizes in the range 10 - 20, mean=15. Using the same seed reproduces the same **;
  ** pseudo-random sequence of sample sizes every time. This is my 'lucky' number! **;
  ** to change the cluster size assumption change the first 10 (the minimum size) to something else. **;
  ** To change the size variation change the second 10 (the multiplier) to something else. To use your **;
  ** own 'lucky' random seed change 09051958 to your own birthday or any other easy to remember **;
  ** positive integer below 2147483647. You can use 0 for a clock-based seed each time, but if you do, **;
  ** you will get a different, but equally varied set of experimental unit (clinic) sample sizes each **;
  ** time you run it, and sometimes that is a real nuisance **;
  n=10+round(10*ranuni(09051958),1);
  ** mu is the expected number of positive outcomes from each experimental unit**;
  mu=round(n*p,1);
  ** and the simulated experimental outcome is output to the exemplar data set **;
  output;
  ** and this is done 3 times, one for each experimental unit **;
 end;
** in the datalines below I have simulated 12 districts (blocks) chosen at random each with **;
** 3 clinics chosen at random for the trial. The treatments are allocated in all possible permutation **;
** orders, twice over the 12 blocks (districts). It is vitally important to vary treatments within block. **;
** Other designs that do not include this principle fail to have any useful precision **;
cards;
1 1 2 3
2 1 3 2 
3 2 1 3
4 2 3 1
5 3 1 2 
6 3 2 1
7 1 2 3
8 1 3 2 
9 2 1 3
10 2 3 1
11 3 1 2 
12 3 2 1 
run; 
proc glimmix data=rct;
 class trt;
 ** this model statement form model y/n=x automatically invokes /dist=binomial link=logit **;
 model mu/n=trt / ddfm=contain;
 random intercept trt / subject=block;
 ** see the reference article by Stroup 2016 on how to make educated assumptions about the **;
 ** group and treatment covariances. /hold tells glimmix not to estimate but hold covariance **;
 ** parameters to the values given below **;
 parms (0.13)(0.10)/hold=1,2;
 ** this tests for strong evidence of a difference at the precision and sample size simulated **;
 lsmeans trt/diff cl;
run;
Number of Observations Read    36
Number of Observations Used    36
Number of Events              230
Number of Trials              545

Covariance Parameter Estimates
Cov Parm    Subject    Estimate    Standard Error
Intercept   block        0.1300    .
trt         block        0.1000    .

Differences of trt Least Squares Means
trt  _trt  Estimate  Standard Error  DF  t Value  Pr > |t|  Alpha    Lower     Upper
1    2      -0.5722          0.2552  22    -2.24    0.0353   0.05  -1.1015  -0.04299
1    3      -0.8914          0.2559  22    -3.48    0.0021   0.05  -1.4221  -0.3607
2    3      -0.3192          0.2479  22    -1.29    0.2113   0.05  -0.8332   0.1949

 

In this simulation the covariances sum to 0.23 as desired, and the smallest control-versus-treatment difference that is detected at the 95% level is 0.13 (0.31 versus 0.44). An average clinic sample size of 15 is sufficient with 3 treatments per district.

Teketo
Calcite | Level 5
I read it, but it is all about power calculation after setting a sample size. I think you didn't understand my concern: I am concerned about sample size determination, not power simulation. How can I calculate the sample size for a cluster randomized controlled trial with three groups? I am not concerned about power for the time being.
Regards
Teketo
Teketo
Calcite | Level 5

Hi Ksharp,

Thank you. I read it, but it is all about power calculation after setting a sample size. I think you didn't understand my concern: I am concerned about sample size determination, not power simulation. How can I calculate the sample size for a cluster randomized controlled trial with three groups? I am not concerned about power for the time being.

Regards

Teketo

Damien_Mather
Lapis Lazuli | Level 10

It might help community members help you better if you clarified your design some more. Currently, on the information given, I have these clarifying questions in my own mind:

 

Does your proposed design have just 3 treatments, one of which is a control, such as a currently used treatment, and the other two are new treatments of interest?

Does your proposed design have just 3 cluster samples chosen at random from a larger population, such as samples of patients from 3 primary health care centres chosen from a population of several hundred health care centres?

Do you propose that the 3 treatments, including the control, are randomly assigned to the 3 clusters?

Is it that simple?

If that is the case, I can't see how the variation in response among cluster means would not be confounded with the variation in response among treatment means, a design problem that was addressed by Yates and Fisher in agriculture about 100 years ago.

Surely you plan to allocate each treatment to more than one cluster sample group? As a bare minimum, should you not be considering allocating the 3 treatments to a further 3 cluster samples, making 6 groups in total?

Teketo
Calcite | Level 5
Hi Damien
Thank you for your kind support.
I will have three groups:
Control group - will receive the currently used treatment
Intervention group 1 - will receive new treatment 1
Intervention group 2 - will receive new treatment 2
With regard to the nature of the clusters: it will be a two-stage cluster design.

Stage one = districts will be randomly selected

Stage two = primary health care facilities within the selected districts will be sampled

Thus, all clients, who will fulfill the inclusion criteria of the study, coming for the specified health care service at the sampled health facility will be included.

The number of clusters, that is, the districts and primary health care facilities, will not be three; it will certainly be more than 6. My question is how many clusters I need in order to have a sample size with good power.

Therefore, my questions are:
1. How many clusters (that is, the total number of primary health care facilities) do I need to achieve good power? How many clients should there be within each cluster/primary health care facility; that is, what should the average cluster size be?
2. How can the sample size for a cluster randomized controlled trial be determined, taking into account the ICC, the coefficient of variation, the effect size, and varying cluster sizes, since the number of clients per sampled health facility will not be equal?
3. How can I determine the sample size for these three groups? Is it different from the two-group case? How does the Bonferroni correction work here? Is there a formula for sample size determination with multiple groups?
4. How does SAS, or any other software, handle the questions raised above: a cluster randomized controlled trial plus three groups?

Regards
Teketo

Damien_Mather
Lapis Lazuli | Level 10

That's better, but you do realise, don't you, that you've just now twice contradicted your earlier statement about only being interested in study design precision and not power?

 

It seems like you are saying that sample size control is not at all possible once a primary health care facility has been selected? That can't be right. If that were the case, why would you be asking about how to determine sample sizes?

 

I know from my own experience that it is nearly impossible to manage studies so that treatment and block group sizes come out equal, but that should not affect the experimental design stage, only the modelling stage. You should strive to obtain equal group sizes, and then do other things later on to deal with the group imbalance that you end up with, like eat ice cream (just joking) or use the proc glimmix model option ddfm=kr2 (not joking).

 

To be ethical towards all stakeholders, best practice would be frequent reporting of the current effective cluster sample sizes, followed by timely advice to all primary health care participant recruiters when the quota is about to be met, so they can stop recruiting, right? Do you plan to do this? This is not clear from your questions to date.

 

Alternatively, do you have some idea of the different cluster sample sizes that will eventuate from the different primary health care cluster sample groups? Maybe an expected range of sizes?

 

If that is the case you can adapt the code example I gave to include individual sample groups drawn from, say, a uniform distribution over a range.

 

The code can easily be adapted to handle more treatments than groups in a block, or more groups than treatments in each block, if that is what you are asking about.

 

Does any of this address your concerns? Do you need any more specific advice?
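
As a rough sketch of that adaptation (it is not part of the code posted earlier), the DATA step below draws cluster sizes uniformly over an assumed range, and PROC MEANS then reports their mean and coefficient of variation, which is the CV that formulas for unequal cluster sizes ask for. The range of 40 to 80, the 36 clusters, the seed, and the data set name sizes are all placeholders to be replaced with your own expectations.

```sas
** hypothetical range of clinic sample sizes, replace with your own expectation **;
%let lo = 40;
%let hi = 80;

data sizes;
  ** any positive integer seed gives a reproducible sequence **;
  call streaminit(9051958);
  do cluster = 1 to 36;
    ** integer cluster size drawn uniformly over the range lo to hi inclusive **;
    n = &lo + floor((&hi - &lo + 1) * rand('uniform'));
    output;
  end;
run;

** PROC MEANS reports CV as a percentage, so 25 means a coefficient of variation of 0.25 **;
proc means data=sizes n mean std cv min max;
  var n;
run;
```

Narrowing or widening the range changes the CV while leaving the mean near the midpoint, so the two assumptions can be explored separately.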

 

 

 

 

Teketo
Calcite | Level 5
Hi Damien,
Thank you indeed.

Could you give some explanation, or point me to a guide, about what the code you used means? For example,
n=35+round(10*ranuni(09051958),1);mu=n*p;
What does 09051958 mean?
parms (.07)(0.04)/hold=1,2

What do 0.07, 0.04 and hold=1,2 mean?
Could you provide some more detailed advice about:

1. How can the sample size be determined for equal treatment groups? The three treatment groups will have an equal ratio, 1:1:1. However, under each treatment I will have more than three clusters, probably 12 clusters/primary health care facilities per treatment group, giving a total of 36 clusters (that is, 36 primary health care facilities) of varying size. I do not know the cluster size; perhaps an average of 50 or 60 participants per cluster, which would give a total of 36*50 = 1800 to 36*60 = 2160 participants. That is my assumption. I don't know whether this can be done in SAS or not.


2. Can the simulation give me the ICC, coefficient of variation, effect size, average cluster size and the like used in the sample size calculation, or are these values that I should enter? For example, if I have nine clusters/primary health care facilities in treatment 1, does it give me the average cluster size for those nine clusters, and likewise for the other treatment groups, that is, for the current treatment and treatment 2?


3. I am not clear about what blocking means; is it about clusters? What makes a block different from a treatment group? Does the simulation you did take the clustered nature of the sampling into account? For example, what does “cluster sample sizes on 3 blocks of 3 treatment groups” mean? What does 3 blocks mean?

Regards

Teketo
Damien_Mather
Lapis Lazuli | Level 10
** example precision estimation for a random control trial after  **;
** Stroup (2016) **;
**  see http://support.sas.com/resources/papers/proceedings16/11663-2016.pdf **;
data rct;
** this is a SAS data step that creates a SAS data set that is called an 'exemplar'  **;
** (or representative) data set in the above article to be used in conjunction with **;
** the glimmix procedure below to simulate the impacts of your assumptions **;
** on the precision (and yes, with more work, power) of your experimental design **;
infile cards;
 input block @@;
 ** each block is an independent replication of the 3-treatment **;
 ** (control + 2 new treatments) group experimental design **;
 ** in your case if you make each block a randomly selected district **;
 ** then you get to estimate inter-district response variance for 'free' **;
 ** the double trailing @ holds the data line so the next input statement keeps reading from it **;
 do eu=1 to 3;
 ** iterates over each of the 3 experimental units in each (district?) block **;
 ** each eu (1-3) is a new experimental unit, which is permuted in this example **;
  ** the double trailing @ again holds the data line across the 3 loop iterations **;
  input trt @@; 
  ** current assumptions for success probabilities of the control (p1) and two **;
  ** treatments (p2,p3) are set here. These treatments are not as effective as those assumed **;
  ** in previous simulations. By varying these and the eu size you can see what effect size difference **;
  ** (here I use 7.5% or 0.075 diff) can be confidently detected at a given level. Do you use 95% or 99%? **;
  ** I found this treatment effect size difference of 0.075 could just be detected at a 95% C.L. with an **;
  ** overall sample size of 1625 by changing these probabilities and eu size and re-running the exemplar analysis **;
  p1=.2;p2=.275;p3=.35;
  ** p takes on the right value given the newly input treatment type. (trt=1) = 1 if trt = 1, else = 0  **;
  p=(trt=1)*p1+(trt=2)*p2+(trt=3)*p3;
  ** the ranuni(seed) function generates a uniform random number between 0 and 1 **;
  ** rounding 10 x this number to an integer and adding it to 40 will uniformly randomly **;
  ** generate (and therefore facilitate simulation of the impact of) experimental unit **;
  ** sample sizes in the range 40 - 50. Using the same seed reproduces the same **;
  ** pseudo-random sequence of sample sizes every time. This is my 'lucky' number! **;
  ** to change the cluster size assumption change the 40 to something else. To change the size variation **;
  ** change the 10 to something else. To use your own 'lucky' random seed change 09051958 to your **;
  ** own birthday or any other easy to remember positive integer, or use 0 for a clock-based seed each time **;
  n=40+round(10*ranuni(09051958),1);
  ** mu is the expected number of positive outcomes from each experimental unit**;
  mu=n*p;
  ** and the simulated experimental outcome is output to the exemplar data set **;
  output;
  ** and this is done 3 times, one for each experimental unit **;
 end;
** in the datalines below I have simulated 12 districts (blocks) chosen at random each with **;
**  3 clinics chosen at random for the trial. The treatments are allocated in all possible permutation **;
** orders, twice over the 12 blocks (districts in your case?) **; 
cards;
1 1 2 3
2 1 3 2 
3 2 1 3
4 2 3 1
5 3 1 2 
6 3 2 1
7 1 2 3
8 1 3 2 
9 2 1 3
10 2 3 1
11 3 1 2 
12 3 2 1 
run; 
proc glimmix data=rct;
 class trt;
 ** this model statement form model y/n=x automatically invokes /dist=binomial link=logit **;
 model mu/n=trt ;
 random intercept trt / subject=block;
 ** see the reference article by Stroup 2016 on how to make educated assumptions about the **;
 ** group and treatment covariances. /hold tells glimmix not to estimate but hold covariance **;
 ** parameters to the values given below **;
 parms (.08)(0.06)/hold=1,2;
 ** this tests for strong evidence of a difference at the precision and sample size simulated **;
 lsmeans trt/diff cl;
run;
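
The "more work" needed to get power out of the exemplar analysis can be sketched as below. It assumes the noncentral-F approximation that is commonly paired with exemplar data sets: the F value from the exemplar fit, multiplied by its numerator degrees of freedom, is treated as a noncentrality parameter. The data set names f_trt and power, the alpha of 0.05, and the choice of the overall treatment test are illustrative, and the sketch assumes the rct data set created above already exists.

```sas
** capture the Type III test of trt from the exemplar fit **;
ods output tests3=f_trt;
proc glimmix data=rct;
 class trt;
 model mu/n=trt;
 random intercept trt / subject=block;
 parms (.08)(0.06)/hold=1,2;
run;

data power;
 set f_trt;
 ** assumed two-sided significance level **;
 alpha = 0.05;
 ** noncentrality implied by the exemplar F value **;
 ncp   = numdf*fvalue;
 ** critical value of the central F distribution **;
 fcrit = finv(1-alpha, numdf, dendf, 0);
 ** approximate power of the overall treatment test **;
 power = 1 - probf(fcrit, numdf, dendf, ncp);
run;

proc print data=power noobs;
 var effect numdf dendf fvalue ncp power;
run;
```

Re-running this while changing the number of blocks in the datalines, or the cluster size range, shows how the approximate power responds, which is how a sample size can be chosen from a power target.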
Teketo
Calcite | Level 5
Hi Damien,

I really thank you for your unreserved and kind support. Let me ask about some more details:

1. Would anything below change if I sampled one clinic per sampled district, for example if I randomly allocated only one treatment per district? What would the entire simulation look like?
cards;
1 1 2 3
2 1 3 2
3 2 1 3
4 2 3 1
5 3 1 2
6 3 2 1
7 1 2 3
8 1 3 2
9 2 1 3
10 2 3 1
11 3 1 2
12 3 2 1


2. What would it look like if, let's say, I used a 95% CI, an effect size of 0.13, p1=0.31, p2=0.44, p3=0.50, ICC=0.03, coefficient of variation=0.5, variance=0.23, and an average cluster size of 60? Is it possible to enter these values? What would the resulting precision look like? And what would the total sample size be?



3. Does this number have to be eight digits? 09051958

4. How could I simulate the power for the above assumption in addition to the precision and sample size?



Kind regards
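
For question 2, a closed-form starting point can be sketched before any simulation. The DATA step below takes the usual two-proportion sample size for an individually randomized trial and inflates it by a widely used design effect for unequal cluster sizes, 1 + ((1 + CV**2)*m - 1)*ICC, which works out to 3.22 for ICC = 0.03, CV = 0.5 and m = 60. The 80% power target is an assumption, since no power was stated in the thread. Repeating the calculation at alpha = 0.025 is one way to apply a Bonferroni correction for the two comparisons against control. The variance of 0.23 is not used here, and all variable and data set names are illustrative.

```sas
data cluster_n;
  ** values taken from question 2 above **;
  p1 = 0.31; p2 = 0.44;
  rho = 0.03; cv = 0.5; m = 60;
  ** assumed power target, not stated in the thread **;
  power = 0.80;
  ** 0.05 is unadjusted, 0.025 is Bonferroni for 2 comparisons with control **;
  do alpha = 0.05, 0.025;
    z_alpha = quantile('normal', 1 - alpha/2);
    z_beta  = quantile('normal', power);
    ** per-arm size for an individually randomized two-proportion comparison **;
    n_ind = (z_alpha + z_beta)**2 * (p1*(1-p1) + p2*(1-p2)) / (p1-p2)**2;
    ** design effect allowing for unequal cluster sizes **;
    deff = 1 + ((1 + cv**2)*m - 1)*rho;
    ** inflate, then convert to whole participants and whole clusters per arm **;
    n_arm = ceil(n_ind*deff);
    clusters_arm = ceil(n_arm/m);
    output;
  end;
run;

proc print data=cluster_n noobs;
  var alpha n_ind deff n_arm clusters_arm;
run;
```

The result is only a planning figure for the smallest difference of interest (control versus treatment 1); the exemplar simulation can then be run at that size to check it against the mixed model.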
