Sample size calculation for multiple groups and a cluster randomized controlled trail

Teketo — Fri, 09 Dec 2016 01:01:12 GMT

Dear all at SAS,

I am trying to calculate the sample size for a cluster randomized controlled trial with two intervention groups and one control group (three groups in total). Does the sample size calculation for multiple groups rest on different assumptions from the usual two-population comparison of proportions or means? Is a Bonferroni correction the best approach, or should I simply use the two-population formula and allocate the result across the three groups?

I was using a formula for cluster randomized controlled trials with unequal cluster sizes, but I had difficulty obtaining the ICC (rho) and the coefficient of variation (CV). I could not find a paper reporting them, nor figures from which I could calculate them. I tried to calculate the average cluster size for a fixed number of clusters, but when I did a feasibility check the assumption was not satisfied. Do you have any advice or recommendations?

Different people say different things, and the published papers do not agree either. Some papers say I should use simulation to find a sample size with good power; others say otherwise. I have three outcome variables, with count and binary outcomes.

Could you help me use simulation to determine the required sample size for a cluster randomized controlled trial with three groups? What steps should I follow in SAS to calculate or simulate the sample size? Which program should I use? The installed application offers many options, such as SAS Enterprise Guide, SAS/IML Studio, etc.

With kind regards

Teketo Kassaw

Re: Sample size calculation for multiple groups and a cluster randomized controlled trail

Damien_Mather — Fri, 09 Dec 2016 02:19:30 GMT

See the paper suggested in this other thread:

https://communities.sas.com/t5/SAS-Statistical-Procedures/Sample-size-calculation-for-proportion-repeated-measures/m-p/317603#U317603

Re: Sample size calculation for multiple groups and a cluster randomized controlled trail

Teketo — Fri, 09 Dec 2016 02:56:23 GMT

Hi,
I read the thread and the paper too, but I did not find information related to my question. Could you give me a detailed description or a paper that could help me?
Regards
Teketo

Re: Sample size calculation for multiple groups and a cluster randomized controlled trail

Teketo — Fri, 09 Dec 2016 03:08:57 GMT

Hi Damien,

Thank you. I read the paper you mentioned and the responses, but I did not find information directly related to my question. My question is about sample size determination, using SAS, for a cluster randomized controlled trial with three groups.

I have two types of outcomes: binary and count. The sample size I want to determine should take the following into consideration:

1. cluster number

2. cluster size

3. coefficient of variation

4. intracluster correlation coefficient (rho) and

5. effect size

in addition to the usual inputs for an individually randomized controlled trial.

How can I determine the sample size for three groups? Is the Bonferroni correction appropriate, or is it possible to use the two-population formula and then allocate the result across the three groups, or is there some other correction that SAS will apply?

Regards

Teketo
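
As a rough starting point, the five ingredients listed above can be combined in closed form before any simulation is attempted. The following is only a minimal sketch and not a method proposed in this thread. It assumes a two-proportion normal approximation, a Bonferroni split of alpha over two comparisons with the control, and a commonly used design effect approximation for unequal cluster sizes, 1 + ((cv**2 + 1)*m - 1)*rho. Every input value and every data set or variable name is hypothetical.

```sas
/* Hypothetical inputs: replace each one with a value justified for your trial */
data crt_n;
  p1    = 0.30;     /* assumed control proportion                        */
  p2    = 0.40;     /* assumed intervention proportion                   */
  alpha = 0.05 / 2; /* Bonferroni split over two comparisons vs control  */
  power = 0.80;     /* target power                                      */
  rho   = 0.03;     /* assumed intracluster correlation coefficient      */
  cv    = 0.5;      /* assumed coefficient of variation of cluster size  */
  m     = 50;       /* assumed mean cluster size                         */

  /* n per arm for an individually randomized two-proportion comparison */
  za    = probit(1 - alpha/2);
  zb    = probit(power);
  n_ind = (za + zb)**2 * (p1*(1 - p1) + p2*(1 - p2)) / (p1 - p2)**2;

  /* design effect allowing for unequal cluster sizes (an approximation) */
  deff  = 1 + ((cv**2 + 1)*m - 1)*rho;

  n_arm        = ceil(n_ind * deff);   /* inflated n per arm        */
  clusters_arm = ceil(n_arm / m);      /* clusters needed per arm   */
  n_total      = 3 * clusters_arm * m; /* three arms, 1:1:1         */
run;

proc print data=crt_n noobs;
  var n_ind deff n_arm clusters_arm n_total;
run;
```

A closed-form figure like this is best treated as a first guess to be checked against a model-based or simulated analysis of the actual design.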

Re: Sample size calculation for multiple groups and a cluster randomized controlled trail

Ksharp — Fri, 09 Dec 2016 03:15:58 GMT

For statistical models that PROC POWER does not support, you can use simulated data instead.

http://blogs.sas.com/content/iml/2013/05/30/simulation-power.html

http://blogs.sas.com/content/iml/2013/06/05/simulation-power-curve.html

Re: Sample size calculation for multiple groups and a cluster randomized controlled trail

Teketo — Fri, 09 Dec 2016 03:31:23 GMT

I read it, but it is all about power calculation after setting a sample size. I think you did not get my concern: I am concerned about sample size determination, not power simulation. How can I calculate the sample size for a cluster randomized controlled trial with three groups? I am not concerned about power for the time being.

Regards

Teketo

Re: Sample size calculation for multiple groups and a cluster randomized controlled trail

Teketo — Fri, 09 Dec 2016 03:34:04 GMT

Hi Ksharp,

Thank you. I read it, but it is all about power calculation after setting a sample size. I think you did not get my concern: I am concerned about sample size determination, not power simulation. How can I calculate the sample size for a cluster randomized controlled trial with three groups? I am not concerned about power for the time being.

Regards

Teketo
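
Sample size determination and power calculation are the same computation solved for different unknowns, which is why the linked posts are relevant. A minimal sketch with PROC POWER, assuming a plain two-group comparison of proportions with no clustering and hypothetical proportions and alpha, shows both directions:

```sas
/* Solve for the sample size per group at a target power (hypothetical inputs) */
proc power;
  twosamplefreq test=pchi
    groupproportions = (0.30 0.40)
    alpha     = 0.025
    power     = 0.80
    npergroup = .;
run;

/* The same calculation the other way round: power at candidate sample sizes */
proc power;
  twosamplefreq test=pchi
    groupproportions = (0.30 0.40)
    alpha     = 0.025
    npergroup = 300 400 500
    power     = .;
run;
```

A simulation works like the second call: power is estimated for several candidate sizes, and the smallest size that reaches the target power is taken as the required sample size. Neither call accounts for clustering, so the results understate what a cluster randomized design needs.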

Re: Sample size calculation for multiple groups and a cluster randomized controlled trail

Damien_Mather — Sat, 10 Dec 2016 02:27:51 GMT

It might help community members help you better if you clarified your design some more. Currently, on the information given, I have these clarifying questions in my own mind:

Does your proposed design have just 3 treatments, one of which is a control, such as a currently used treatment, and the other two are new treatments of interest?

Does your proposed design have just 3 cluster samples chosen at random from a larger population, such as samples of patients from 3 primary health care centres chosen from a population of several hundred health care centres?

Do you propose that the 3 treatments, including the control, are randomly assigned to the 3 clusters?

Is it that simple?

If that is the case, I can't see how the variation in response amongst cluster means is not confounded with the variation in response amongst the treatment means, which is a design problem that was addressed by Yates and Fisher in agriculture about 100 years ago.

Surely you plan to allocate each treatment to more than one cluster sample group? As a bare minimum, should you not be considering allocating the 3 treatments to a further 3 cluster samples, making 6 groups in total?

Re: Sample size calculation for multiple groups and a cluster randomized controlled trail

Teketo — Sat, 10 Dec 2016 03:41:06 GMT

Hi Damien,
Thank you for your kind support.
I will have three groups:
Control group - will receive a currently used treatment
Intervention group 1 - will receive new treatment 1
Intervention group 2 - will receive new treatment 2
With regard to the nature of the clusters: it will be a two-stage cluster design.

Stage one = districts will be randomly selected

Stage two = primary health care facilities within the selected districts will be sampled

Thus, all clients who fulfill the inclusion criteria of the study and come for the specified health care service at a sampled health facility will be included.

The number of clusters, that is, districts and primary health care facilities, will not be three. It will certainly be more than 6. My question is how many clusters I need in order to have a sample size with good power.

Therefore, my questions are:
1. How many clusters, that is, primary health care facilities in total, do I need to achieve good power? How many clients should there be within each cluster (primary health care facility), that is, what is the average cluster size?
2. How can the sample size for a cluster randomized controlled trial be determined, taking into account the ICC, the coefficient of variation, the effect size and varying cluster sizes, since the number of clients per sampled health facility will not be equal?
3. How can I determine the sample size for these three groups? Is the sample size determination different from the two-group case? How does the Bonferroni correction work here? Is there a formula for multiple-group sample size determination?
4. How does SAS, or any other software, handle the questions raised above, that is, a cluster randomized controlled trial with three groups?

Regards
Teketo

Re: Sample size calculation for multiple groups and a cluster randomized controlled trail

Damien_Mather — Sat, 10 Dec 2016 04:31:12 GMT

That's better, but you do realise, don't you, that you have just now twice contradicted your earlier statement about only being interested in study design precision and not power?

It seems like you are saying that sample size control is not at all possible once a primary health care facility has been selected? That can't be right. If that were the case, why would you be asking about how to determine sample sizes?

I know from my own experience that it is near impossible to manage studies so that treatment and block group sizes come out equal, but that should not impact the experimental design stage, only the modelling stage. You should strive to obtain equal group sizes, and then do other things later on to deal with the group imbalance that you end up with, like eat ice cream (just joking) or use the proc glimmix model option ddfm=kr2 (not joking).

To be ethical for all stakeholders, best practice would be frequent reporting on the current effective cluster sample sizes, followed by timely advice to all primary health care participant recruiters when the quota is about to be met so they can stop recruiting, right? Do you plan to do this? This is not clear from your questions to date.

Alternatively, do you have some idea of the different cluster sample sizes that will eventuate from the different primary health care cluster sample groups? Maybe an expected range of sizes?

If that is the case you can adapt the code example I gave to include individual sample groups drawn from, say, a uniform distribution over a range.

The code can easily be adapted to extend to more treatments than groups in a block, or more groups than treatments in each block, if that is what you are asking about.

Does any of this address your concerns? Do you need any more specific advice?

Re: Sample size calculation for multiple groups and a cluster randomized controlled trail

Teketo — Sat, 10 Dec 2016 11:14:19 GMT

Hi Damien,
Thank you indeed.

Could you give some explanation, or point me to a guide, about what the code you used means? For example,
n=35+round(10*ranuni(09051958),1);mu=n*p;
What does 09051958 mean?
parms (.07)(0.04)/hold=1,2

What do 0.07, 0.04 and hold=1,2 mean?
Could you give me some more detailed advice about the following:

1. How can the sample size for equal treatment groups be determined? The three treatment groups will have an equal ratio, 1:1:1. However, under each treatment I will have more than three clusters, probably 12 clusters (primary health care facilities) under each treatment group, which will give me a total of 36 clusters, that is, 36 primary health care facilities, of varying cluster size. I do not know exactly, but perhaps an average of 50 or 60 participants per cluster, which would give a total of 36*50 = 1800 to 36*60 = 2160 participants. That is my assumption. I don't know whether this can be done in SAS or not.

2. Can the simulation give me the ICC, coefficient of variation, effect size, average cluster size and the like used in the sample size calculation, or must I enter them? For example, if I have nine clusters (primary health care facilities) in treatment 1, does it give me the average cluster size for those nine clusters, and likewise for the other treatment groups, that is, the current treatment and treatment 2?

3. I am not clear what blocking means. Is it about clusters? What makes it different from the treatment group? Does the simulation you did consider the clustered nature of the sampling? For example, what does “cluster sample sizes on 3 blocks of 3 treatment groups” mean? What does 3 blocks mean?

Regards

Teketo
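
The quoted line n=35+round(10*ranuni(09051958),1) can be run on its own to see what it produces. A minimal sketch, in which the data set name sizes and the variables clinic and u are made up for illustration: the number inside ranuni() is only a seed that fixes the pseudo-random sequence, and the 35 and the 10 set the smallest cluster size and the spread.

```sas
data sizes;
  do clinic = 1 to 6;
    u = ranuni(09051958);     /* uniform(0,1) draw, 09051958 is only the seed */
    n = 35 + round(10*u, 1);  /* integer cluster size between 35 and 45      */
    output;
  end;
run;

proc print data=sizes noobs;
run;
```

Running the step twice gives the same six sizes, because the seed is the same. Changing the seed changes the sizes but not their range.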

Re: Sample size calculation for multiple groups and a cluster randomized controlled trail

Damien_Mather — Sat, 10 Dec 2016 12:43:29 GMT

** example precision estimation for a random control trial after  **;
** Stroup (2016) **;
**  see http://support.sas.com/resources/papers/proceedings16/11663-2016.pdf **;
data rct;
** this is a SAS data step that creates a SAS data set that is called an 'exemplar'  **;
** (or representative) data set in the above article to be used in conjunction with **;
** the glimmix procedure below to simulate the impacts of your assumptions **;
** on the precision (and yes, with more work, power) of your experimental design **;
infile cards;
 input block @@;
 ** each block is an independent replication of the 3-treatment **;
 ** (control + 2 new treatments) group experimental design **;
 ** in your case if you make each block a randomly selected district **;
 ** then you get to estimate inter-district response variance for 'free' **;
 ** the double trailing @ in the input statement holds the current data line, so the next input **;
 ** statement keeps reading from the same line instead of skipping to a new one **;
 do eu=1 to 3;
 ** iterates over each of the 3 experimental units in each (district?) block **;
 ** each eu (1-3) is a new experimental unit, which is permuted in this example **;
 ** the double trailing @ keeps the data line held across data step iterations until it is used up **;
  input trt @@; 
  ** current assumptions for success probabilities of the control (p1) and two **;
  ** treatments (p2,p3) are set here. These treatments are not as effective as those assumed **;
  ** in previous simulations. By varying these and the eu size you can see what effect size difference **;
  **(here I use 7.5% or 0.075 diff) can be confidently detected at a given level. Do you use 95% or 99%? **;
  ** I found this treatment effect size difference of 0.075 could just be detected at a 95 % C.L with an **;
  ** overall sample size of 1625 by changing these probabilities and eu size and re-running the exemplar analysis **;
  p1=.2;p2=.275;p3=.35;
  ** p takes on the right value given the newly input treatment type. (trt=1) =1 if trt =1, else = 0  **;
  p=(trt=1)*p1+(trt=2)*p2+(trt=3)*p3;
  ** the ranuni(seed) function generates a uniform random number between 0 and 1 **;
  ** rounding 10 x this number to an integer and adding to 40 will uniformly randomly **;
  ** generate (and therefore facilitate simulation of the impact of) experimental unit **;
  ** sample sizes in the range 40 - 50. Using the same seed reproduces the same **;
  ** pseudo-random sequence of sample sizes every time. This is my 'lucky' number! **;
  ** to change the cluster size assumption change the 40 to something else. To change the size variation**;
  ** change the 10 to something else. To use your own 'lucky' random seed change 09051958 to your **;
  ** own birthday or any other easy to remember number, or use 0 for a new clock-based seed each time **;
  n=40+round(10*ranuni(09051958),1);
  ** mu is the expected number of positive outcomes from each experimental unit**;
  mu=n*p;
  ** and the simulated experimental outcome is output to the exemplar data set **;
  output;
  ** and this is done 3 times, one for each experimental unit **;
 end;
** in the datalines below I have simulated 12 districts (block) chosen at random each with **;
**  3 clinics chosen at random for the trial. The treatments are allocated in all possible permutation **;
** orders, twice over the 12 blocks (districts in your case?) **; 
cards;
1 1 2 3
2 1 3 2 
3 2 1 3
4 2 3 1
5 3 1 2 
6 3 2 1
7 1 2 3
8 1 3 2 
9 2 1 3
10 2 3 1
11 3 1 2 
12 3 2 1 
run; 
proc glimmix data=rct;
 class trt;
 ** this model statement form model y/n=x automatically invokes /dist=binomial link=logit **;
 model mu/n=trt ;
 random intercept trt / subject=block;
 ** see the reference article by Stroup 2016 on how to make educated assumptions about the **;
 ** group and treatment covariances. /hold tells glimmix not to estimate but hold covariance **;
 ** parameters to the values given below **;
 parms (.08)(0.06)/hold=1,2;
 ** this tests for strong evidence of a difference at the precision and sample size simulated **;
 lsmeans trt/diff cl;
run;

Re: Sample size calculation for multiple groups and a cluster randomized controlled trail

Teketo — Sat, 10 Dec 2016 21:22:06 GMT

Hi Damien,

I really thank you for your unreserved and kind support. Let me ask about some more details:

1. Would anything in the following change if I sampled one clinic per sampled district? For example, if I randomly allocated only one treatment per district? What would the entire simulation look like?
cards;
1 1 2 3
2 1 3 2
3 2 1 3
4 2 3 1
5 3 1 2
6 3 2 1
7 1 2 3
8 1 3 2
9 2 1 3
10 2 3 1
11 3 1 2
12 3 2 1

2. What would it look like if I used, say, a 95% CI, an effect size of 0.13, p1=0.31, p2=0.44, p3=0.50, ICC=0.03, coefficient of variation = 0.5, variance = 0.23 and an average cluster size of 60? Is it possible to enter these values? What will the resulting precision be? And what will the total sample size be?

3. Does this number necessarily have to be eight digits? 09051958

4. How could I simulate the power for the above assumptions, in addition to the precision and sample size?

Kind regards

Re: Sample size calculation for multiple groups and a cluster randomized controlled trail

Damien_Mather — Sun, 11 Dec 2016 00:08:09 GMT

Hi Teketo.

1. Yes, in theory you could allocate one treatment per block (district) but you would confound treatment effect with district effect more that way. Also, the same unknown, but finite, risk of a district failure to comply and/or report data has a much bigger impact on final design imbalance if you design that way. I would assume that you would have to have some extremely compelling reason to ask about this. Can you share that with us?

2. (a) I infer from your question that you are only interested in the sample size impact of an effect difference precision of 0.13 between the control and the two experimental treatments, and not between the two experimental treatments. Is that correct? (b) When you ask about a specific intraclass correlation, which class do you mean: block (district), experimental unit (clinic), or some class you have not yet mentioned, such as physician within (and sometimes between) clinics (and/or districts)? (c) You ask about simulating a particular coefficient of variation, but the appropriate related statistic for binomial and other count data models such as this is the index of dispersion. Is that what you mean? (d) When you ask about simulating the impact of a variance of 0.23, I assume you mean a constraint on the total variance of 0.23. With your design, you have to make separate assumptions about the block and treatment variance, so I guess you want to constrain both of those to add to 0.23. However, you and possibly your colleagues will be the ones who need to think about how to partition that into block and treatment variance, using the method given in the second paragraph of section 4 of Stroup's 2016 paper, as I suggested in an earlier post.

3. There is no effective restriction about random seed size in SAS random number functions. I use 09051958 as it is an easy number to remember for me but you could use any number.

4. I do not see the value in answering this until the questions above have been clarified.

Cheers.

Damien
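
On points 2(b) and 2(d) above, an ICC and a variance component can be translated into each other once the class is fixed. A minimal sketch, assuming the ICC refers to a single cluster-level random intercept in a logit-link model and using the latent-variable definition ICC = var / (var + pi**2/3). Other definitions of the ICC for binary outcomes exist, and the data set and variable names are hypothetical.

```sas
data icc_to_var;
  pi2_3 = constant('pi')**2 / 3;            /* variance of the standard logistic, about 3.29 */
  do icc = 0.03, 0.04, 0.05;
    var_cluster = icc * pi2_3 / (1 - icc);  /* between-cluster variance on the logit scale   */
    output;
  end;
run;

proc print data=icc_to_var noobs;
run;
```

Under that assumption, the resulting var_cluster is a candidate value for the corresponding covariance parameter held fixed in the parms statement. How the total is partitioned between block and treatment-by-block variance still has to be decided separately.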

Re: Sample size calculation for multiple groups and a cluster randomized controlled trail

Damien_Mather — Sun, 11 Dec 2016 02:03:43 GMT

** example precision estimation for a random control trial after  **;
** Stroup (2016) **;
**  see http://support.sas.com/resources/papers/proceedings16/11663-2016.pdf **;
data rct;
** this is a SAS data step that creates a SAS data set that is called an 'exemplar'  **;
** (or representative) data set in the above article to be used in conjunction with **;
** the glimmix procedure below to simulate the impacts of your assumptions **;
** on the precision (and yes, with more work, power) of your experimental design **;
infile cards;
 input block @@;
 ** each block is an independent replication of the 3-treatment **;
 ** (control + 2 new treatments) group experimental design **;
 ** in your case if you make each block a randomly selected district **;
 ** then you get to estimate inter-district response variance for 'free' **;
 ** the double trailing @ in the input statement holds the current data line, so the next input **;
 ** statement keeps reading from the same line instead of skipping to a new one **;
 do eu=1 to 3;
 ** iterates over each of the 3 experimental units in each (district?) block **;
 ** each eu (1-3) is a new experimental unit, which is permuted in this example **;
 ** the double trailing @ keeps the data line held across data step iterations until it is used up **;
  input trt @@; 
  ** current assumptions for success probabilities of the control (p1) and two **;
  ** treatments (p2,p3) are set here. These treatments are not as effective as those assumed **;
  ** in previous simulations. By varying these and the eu size you can see what effect size difference **;
  **(here I use 13% or 0.13 diff) can be confidently detected at a given 95% level. **;
  ** I found this treatment effect size difference of 0.13 could just be detected at a 95 % C.L with an **;
  ** overall sample size of 545 by changing these probabilities and eu size and re-running the exemplar analysis **;
  p1=.31;p2=.44;p3=.50;
  ** p takes on the right value given the newly input treatment type. (trt=1) =1 if trt =1, else = 0  **;
  p=(trt=1)*p1+(trt=2)*p2+(trt=3)*p3;
  ** the ranuni(seed) function generates a uniform random number between 0 and 1 **;
  ** rounding 10 x this number to an integer and adding to 10 will uniformly randomly **;
  ** generate (and therefore facilitate simulation of the impact of) experimental unit **;
  ** sample sizes in the range 10 - 20, mean=15. Using the same seed reproduces the same **;
  ** pseudo-random sequence of sample sizes every time. This is my 'lucky' number! **;
  ** to change the cluster size assumption change the 10 in n=10+ to something else. To change the size **;
  ** variation change the 10 in round(10* to something else. To use your own 'lucky' random seed change **;
  ** 09051958 to your own birthday or any other easy to remember number. There is no restriction on **;
  ** the number of digits. You can use 0 for a new clock-based seed each time, but if you do, you will get **;
  ** a different, but equally varied set of experimental unit (clinic) sample sizes each time you run it, **;
  ** and sometimes that is a real nuisance **;
  n=10+round(10*ranuni(09051958),1);
  ** mu is the expected number of positive outcomes from each experimental unit**;
  mu=round(n*p,1);
  ** and the simulated experimental outcome is output to the exemplar data set **;
  output;
  ** and this is done 3 times, one for each experimental unit **;
 end;
** in the datalines below I have simulated 12 districts (blocks) chosen at random each with **;
**  3 clinics chosen at random for the trial. The treatments are allocated in all possible permutation **;
** orders, twice over the 12 blocks (districts). It is vitally important to vary treatments within block **;
** other designs that do not include this principle fail to have any useful precision **; 
cards;
1 1 2 3
2 1 3 2 
3 2 1 3
4 2 3 1
5 3 1 2 
6 3 2 1
7 1 2 3
8 1 3 2 
9 2 1 3
10 2 3 1
11 3 1 2 
12 3 2 1 
run; 
proc glimmix data=rct;
 class trt;
 ** this model statement form model y/n=x automatically invokes /dist=binomial link=logit **;
 model mu/n=trt / ddfm=contain;
 random intercept trt / subject=block;
 ** see the reference article by Stroup 2016 on how to make educated assumptions about the **;
 ** group and treatment covariances. /hold tells glimmix not to estimate but hold covariance **;
 ** parameters to the values given below **;
 parms (0.13)(0.10)/hold=1,2;
 ** this tests for strong evidence of a difference at the precision and sample size simulated **;
 lsmeans trt/diff cl;
run;

Number of Observations Read	36
Number of Observations Used	36
Number of Events	230
Number of Trials	545

Covariance Parameter Estimates
Cov Parm	Subject	Estimate	Standard Error
Intercept	block	0.1300	.
trt	block	0.1000	.

Differences of trt Least Squares Means
trt	_trt	Estimate	Standard Error	DF	t Value	Pr > \|t\|	Alpha	Lower	Upper
1	2	-0.5722	0.2552	22	-2.24	0.0353	0.05	-1.1015	-0.04299
1	3	-0.8914	0.2559	22	-3.48	0.0021	0.05	-1.4221	-0.3607
2	3	-0.3192	0.2479	22	-1.29	0.2113	0.05	-0.8332	0.1949

In this simulation the covariances sum to 0.23 as desired and the control-experiment effect difference precision is 0.13. An average clinic sample size of 15 is sufficient with 3 treatments per district.
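
The t values in the table above can also be turned into approximate power figures. A minimal sketch, assuming the usual exemplar-data approach in which the squared t value from the exemplar analysis is treated as the noncentrality parameter of an F distribution with 1 numerator degree of freedom. The data set and variable names are made up, and alpha is left unadjusted for multiple comparisons.

```sas
data approx_power;
  alpha = 0.05;                          /* unadjusted two-sided test level             */
  df    = 22;                            /* denominator DF reported in the table above  */
  input comparison $ t_value;
  ncp   = t_value**2;                    /* noncentrality implied by the exemplar data  */
  fcrit = finv(1 - alpha, 1, df);        /* critical value of the central F             */
  power = 1 - probf(fcrit, 1, df, ncp);  /* approximate power for this comparison       */
  datalines;
1vs2 -2.24
1vs3 -3.48
2vs3 -1.29
;
run;

proc print data=approx_power noobs;
  var comparison t_value ncp power;
run;
```

Repeating the exemplar analysis for several cluster sizes and tabulating the resulting power is one way to read off the smallest design that reaches a target such as 80%.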

Re: Sample size calculation for multiple groups and a cluster randomized controlled trail

Damien_Mather — Sun, 11 Dec 2016 02:59:52 GMT

** example precision estimation for a random control trial after  **;
** Stroup (2016) **;
**  see http://support.sas.com/resources/papers/proceedings16/11663-2016.pdf **;
data rct;
** this is a SAS data step that creates a SAS data set that is called an 'exemplar'  **;
** (or representative) data set in the above article to be used in conjunction with **;
** the glimmix procedure below to simulate the impacts of your assumptions **;
** on the precision (and yes, with more work, power) of your experimental design **;
infile cards;
 input block @@;
 ** each block is an independent replication of the 3-treatment **;
 ** (control + 2 new treatments) group experimental design **;
 ** in your case if you make each block a randomly selected district **;
 ** then you get to estimate inter-district response variance for 'free' **;
 ** the double trailing @ in the input statement holds the current data line, so the next input **;
 ** statement keeps reading from the same line instead of skipping to a new one **;
 do eu=1 to 3;
 ** iterates over each of the 3 experimental units in each (district?) block **;
 ** each eu (1-3) is a new experimental unit, which is permuted in this example **;
 ** the double trailing @ keeps the data line held across data step iterations until it is used up **;
  input trt @@; 
  ** current assumptions for success probabilities of the control (p1) and two **;
  ** treatments (p2,p3) are set here. These treatments are not as effective as those assumed **;
  ** in previous simulations. By varying these and the eu size you can see what effect size difference **;
  **(here I use 13% or 0.13 diff) can be confidently detected at a given 95% level. **;
  ** I found this treatment effect size difference of 0.13 between the control treatment and the two **;
  ** experimental treatments could just be detected at a 95 % C.L with overall sample size of 1805 **;
  ** by changing these probabilities and eu size and re-running the exemplar analysis **;
  p1=.31;p2=.44;p3=.50;
  ** p takes on the right value given the newly input treatment type. (trt=1) =1 if trt =1, else = 0  **;
  p=(trt=1)*p1+(trt=2)*p2+(trt=3)*p3;
  ** the ranuni(seed) function generates a uniform random number between 0 and 1 **;
  ** rounding 10 x this number to an integer and adding to 45 will uniformly randomly **;
  ** generate (and therefore facilitate simulation of the impact of) experimental unit **;
  ** sample sizes in the range 45 - 55, mean=50. Using the same seed reproduces the same **;
  ** pseudo-random sequence of sample sizes every time. This is my 'lucky' number! **;
  ** to change the cluster size assumption change the n=N to something else. To change the size variation**;
  ** change the round(M* to something else. To use your own 'lucky' random seed change 09051958 to your **;
  ** own birthday or any other easy to remember number. There is no restriction on the number of digits.**;
  ** You can use 0 for a new clock-based seed each time, but if you do, you will get a different, but equally  **;
  ** varied set of experimental unit (clinic) sample sizes each time you run it, and sometimes that is a real **;
  ** nuisance **;
  n=45+round(10*ranuni(09051958),1);
  ** mu is the expected number of positive outcomes from each experimental unit, rounded to an integer **;
  mu=round(n*p,1);
  ** and the simulated experimental outcome is output to the exemplar data set **;
  output;
  ** and this is done 3 times, one for each experimental unit **;
 end;
** in the datalines below I have simulated 12 districts (blocks) chosen at random each with **;
**  3 clinics chosen at random for the trial. Here, treatments are allocated 2 to a district in **;
** all possible permutation orders, balanced over the 12 blocks (districts). It is vitally important **;
** to vary treatments within blocks. Other designs that do not include this principle fail to have **;
** any useful precision **; 
cards;
1 1 2 1
2 2 1 2 
3 1 3 1
4 3 1 3
5 2 3 2 
6 3 2 3
7 1 2 1
8 2 1 2 
9 1 3 1
10 3 1 3
11 2 3 2 
12 3 2 3 
run; 
proc glimmix data=rct;
 class trt;
 ** this model statement form model y/n=x automatically invokes /dist=binomial link=logit **;
 model mu/n=trt / ddfm=contain;
 random intercept trt / subject=block ;
 ** see the reference article by Stroup 2016 on how to make educated assumptions about the **;
 ** group and treatment covariances. /hold tells glimmix not to estimate but hold covariance **;
 ** parameters to the values given below. These sum to 0.23 as desired. **;
 parms (0.13)(0.10)/hold=1,2;
 ** this tests for strong evidence of a difference between control and 2 experimental treatments **;
 ** at the precision (0.13) and sample size (1805) simulated. The appropriate adjustment is Dunnett-Hsu **;
 ** not Bonferroni for multiple adjustments when only comparing multiple treatments to a control **;
 lsmeans trt/diff cl adjust=dunnett;
run;

Number of Observations Read	36
Number of Observations Used	36
Number of Events	754
Number of Trials	1805

Covariance Parameter Estimates
Cov Parm	Subject	Estimate	Standard Error
Intercept	block	0.1300	.
trt	block	0.1000	.

Differences of trt Least Squares Means Adjustment for Multiple Comparisons: Dunnett-Hsu
trt	_trt	Estimate	Standard Error	DF	t Value	Pr > \|t\|	Adj P	Alpha	Lower	Upper	Adj Lower	Adj Upper
2	1	0.5661	0.2201	10	2.57	0.0278	0.0495	0.05	0.07577	1.0565	0.001399	1.1309
3	1	0.8295	0.2198	10	3.77	0.0036	0.0067	0.05	0.3398	1.3193	0.2655	1.3935

This simulation only just meets your 0.13 precision criterion after the appropriate Dunnett-Hsu corrections for multiple comparisons to a single control are applied, assuming a design that allocates only 2 of the 3 treatments to each district of 3 clinics in a balanced design over 12 districts and a total linearised variance of 0.23. Mean adequate clinic sample size is 50. The Bonferroni adjustment to multiple comparisons is not appropriate; Dunnett-Hsu is the one to use when you are only comparing all other treatments to a control. 60 per clinic would give you a safety margin for unforeseen problems.

Re: Sample size calculation for multiple groups and a cluster randomized controlled trail

Teketo — Sun, 11 Dec 2016 04:31:06 GMT

Hi Damien,
Sorry, your last comment make me confused.

1. Could you clearly describe what 0.13, 0.10 and hold=1,2 mean separately in the command below? What 0.13? 0.10? hold=1,2? I hope it will be clear for any value given for the next time if you describe for what these values stand for. Where do you get these values?
parms (0.13)(0.10)/hold=1,2;

2. Is Bonferroni correction work if I randomly sampled three clinics with the three treatments per district/block?

3. The main reason I was planning to have one treatment clinic per district is information contamination; however, if it is not statistically feasible, I will randomly allocate the three treatments in three clinics which will are too far apart.

4. Should the effect size and variance be similar for the three treatments? Is it possible to use the least effect size and variance for all treatments in simulation?

5. Sorry, for the last time, what will be the sample size, precision and power if we assume 12 clusters per treatment (36 total clusters), 95%, average cluster size = 60, P1=30, P2=40, P3=50, effect size = 0.10, ICC=0.04, coefficient of variation = 0.5 and the variance = 0.4? These are under the assumption of three clinics/treatment unit per district/block and two clinics/treatment units per district/block. What will the simulation look like?

6. In which case does the Bonferroni correction work: three clinics/treatment units per district/block, or two clinics/treatment units per district/block?

Kind regards

Teketo

Re: Sample size calculation for multiple groups and a cluster randomized controlled trail

Damien_Mather — Sun, 11 Dec 2016 05:59:12 GMT

1. The 0.13 and the 0.10 are the split of your desired total variance of 0.23 between the district source of variation and the treatment source of variation: 0.13+0.10=0.23. You have to decide how to split your total variance between those two sources. Again I refer to the original paper and the excellent method that Stroup demonstrates. It is fundamental to the success of your experiment that you realise that there are two sources of variation, only one of which is the treatment factor.

/hold=1,2 tells proc glimmix to hold these two covariance parameter values fixed while estimating everything else. If you don't supply proc glimmix with some reasonable guesses for these covariance parameters, it cannot do the job of finding a sample size for a given precision.

2. I inferred from your earlier posts that the Bonferroni correction for multiple comparisons is not appropriate because you seem only interested in the differences between the control and the other 2 treatments, in which case the Dunnett adjustment is appropriate. Are you now saying that you are interested in the difference between 2 and 3 as well?

3. Great.

4. I don't know, you would be better placed than I to decide. I simply used the previous estimates you gave for the 3 treatments, which I see you have now changed again. Does this mean you are uncertain about the effectiveness of these treatments? Surely that is an issue of ethical concern for the clinics' patients who will be recruited for the trial. Just make them as realistic as you can. If you think p1=0.30, p2=0.40 and p3=0.50 is the most realistic, or at least as realistic as your last two suggestions, fine, run with that. The smallest difference will be the focus of the decision for sample size for precision, but that should not nudge you towards simulating equal treatment differences if you think another scenario is more likely. What do you really think?

5. (a) Apologies for not stating this before, but ICC and coefficient of variation are outside the scope of this type of simulation. However I doubt that assumptions about those would change the decision about sample size for precision much anyway. They will arise when you analyse the data at the individual patient level. (b) Are you seriously asking me to run those two simulations for you? If you don't understand enough about this method by now, maybe you should study the posts and original paper a bit more carefully and try and get some simulations running yourself. (c) Also, the second request involves deciding on a design that allocates 3 treatments over blocks of two experimental units. Do you have a particular balanced incomplete block design (or one that is nearly so?) in mind that you would like to examine for precision and efficiency? If so, what is it?

Is it this one?

blk  eu1  eu2
1    1    2
2    1    3
3    2    1
4    2    3
5    3    1
6    3    2
7    1    2
8    1    3
9    2    1
10   2    3
11   3    1
12   3    2

6. No, unless you are interested in all the differences in effect sizes not just those involving the control, as I suggested in 2 above.
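Going back to the candidate layout in 5(c): a quick way to see whether a layout like that is balanced is to count how often each treatment appears and how often each pair of treatments shares a block. The sketch below does that in plain Python (not SAS) for the 12 blocks exactly as listed; the variable names are only illustrative.

```python
from collections import Counter
from itertools import combinations

# Block layout as listed in the post: (block, treatment on unit 1, treatment on unit 2)
design = [
    (1, 1, 2), (2, 1, 3), (3, 2, 1), (4, 2, 3), (5, 3, 1), (6, 3, 2),
    (7, 1, 2), (8, 1, 3), (9, 2, 1), (10, 2, 3), (11, 3, 1), (12, 3, 2),
]

replications = Counter()   # how many units receive each treatment
concurrences = Counter()   # how many blocks contain each unordered pair
for _, eu1, eu2 in design:
    replications.update([eu1, eu2])
    concurrences[tuple(sorted((eu1, eu2)))] += 1

all_pairs = list(combinations(sorted(replications), 2))

# Balanced here means: equal replication, and every pair occurs equally often.
is_balanced = (
    len(set(replications.values())) == 1
    and all(pair in concurrences for pair in all_pairs)
    and len(set(concurrences.values())) == 1
)

print("replications:", dict(replications))
print("pair concurrences:", dict(concurrences))
print("balanced:", is_balanced)
```

For this layout each treatment is used on 8 units and each pair of treatments meets in 4 blocks, so every pairwise difference (including 2 versus 3) is estimated with the same within-block information. The same check can be rerun on any alternative layout by editing the `design` list.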

Re: Sample size calculation for multiple groups and a cluster randomized controlled trail

Teketo — Sun, 11 Dec 2016 06:22:06 GMT

Hi Damien,

Now I have a clear description; thank you indeed.

Yes, I will make comparisons between the control and the two treatments, and between treatments 2 and 3. Treatment 3 is treatment 2 plus a new component, so I want to see what adding that component to treatment 2 brings. I am wondering how the Bonferroni correction works here.

Yes, I was wondering if you could run the simulations for the two scenarios, so that I can base my actual simulation on them.
Thank you indeed for your unreserved support!!!
kind regards

Teketo