Sample size calculation for multiple groups and a ...

12-08-2016 08:01 PM

Dears at SAS,

I am trying to calculate the sample size for a cluster randomized controlled trial that has two intervention groups and one control group (three groups in total). Does the sample size calculation for multiple groups rest on different assumptions from the usual two-population formulas for proportions or means? Is a Bonferroni correction the best approach, or should I simply use the two-population formula and distribute the result across the three groups?

I was using a formula for a cluster randomized controlled trial with unequal cluster sizes. However, I had difficulty obtaining the ICC (rho) and the coefficient of variation (CV). I could not find a paper reporting the ICC and the coefficient of variation, nor figures that would let me calculate them. I tried to calculate the average cluster size using a fixed number of clusters, but when I did a feasibility check, the assumption was not satisfied. Do you have any advice or recommendations?
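For reference, the ICC and the coefficient of variation usually enter this kind of calculation through a design effect. A commonly used approximation for unequal cluster sizes (often attributed to Eldridge, Ashby and Kerry, 2006) is sketched below. The symbols are notation introduced here and not taken from the original post: $\bar{m}$ is the average cluster size, $cv$ the coefficient of variation of cluster size, $\rho$ the ICC, and $n_{\text{ind}}$ the sample size per arm under individual randomization.

```latex
\mathrm{DEFF} = 1 + \left[ (cv^{2} + 1)\,\bar{m} - 1 \right] \rho,
\qquad
n_{\text{cluster}} = n_{\text{ind}} \times \mathrm{DEFF}
```

With $cv = 0$ (equal cluster sizes) this reduces to the familiar $1 + (\bar{m} - 1)\rho$.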

Different people say different things, and the published papers do not reach a consensus either. Some papers say I should run a simulation to find a sample size with good power; others say otherwise. I have three outcome variables, with count and binary outcomes.

Could you help me with how to run a simulation to determine the required sample size for a cluster randomized controlled trial with three groups? What steps should I follow in SAS to calculate or simulate the sample size? Which program should I use? The installed application has many options, such as SAS Enterprise Guide, SAS IML Studio, etc.

With kind regards

Teketo Kassaw

Accepted Solutions

Solution

12-14-2016 05:44 AM

12-12-2016 04:15 AM

glimmix, not mixed.

All Replies

12-08-2016 09:19 PM

see the paper suggested in this other thread:

https://communities.sas.com/t5/SAS-Statistical-Procedures/Sample-size-calculation-for-proportion-rep...

12-08-2016 09:56 PM

Hi,

I read it and the paper too, but I didn't find information related to my question. Could you give me a detailed description or a paper that could help me?

Regards

Teketo

12-08-2016 10:08 PM

Hi Damien,

Thank you. I read the paper you mentioned and the responses, but I didn't find information directly related to my question. My question is about sample size determination in SAS for a cluster randomized controlled trial with three groups.

I have two types of outcomes: binary and count. The sample size I want to determine should take the following into consideration:

1. cluster number

2. cluster size

3. coefficient of variation

4. intracluster correlation coefficient / rho and

5. effect size

in addition to the usual considerations for an individually randomized controlled trial.

How can I determine the sample size for three groups? Is the Bonferroni correction appropriate, is it possible to use the two-population formula and then allocate the result across the three groups, or is there some other correction that SAS will apply?

Regards

Teketo

12-10-2016 09:03 PM

```
** example precision estimation for a random control trial after **;
** Stroup (2016) **;
** see http://support.sas.com/resources/papers/proceedings16/11663-2016.pdf **;
data rct;
** this is a SAS data step that creates a SAS data set that is called an 'exemplar' **;
** (or representative) data set in the above article to be used in conjunction with **;
** the glimmix procedure below to simulate the impacts of your assumptions **;
** on the precision (and yes, with more work, power) of your experimental design **;
infile cards;
input block @@;
** each block is an independent replication of the 3-treatment **;
** (control + 2 new treatments) group experimental design **;
** in your case if you make each block a randomly selected district **;
** then you get to estimate inter-district response variance for 'free' **;
** the double trailing @ holds the current data line so the next input statement keeps reading from it **;
do eu=1 to 3;
** iterates over each of the 3 experimental units in each (district?) block **;
** each eu (1-3) is a new experimental unit, which is permuted in this example **;
** trt is read from the same held data line, one value per experimental unit **;
input trt @@;
** current assumptions for success probabilities of the control (p1) and two **;
** treatments (p2,p3) are set here. These treatments are not as effective as those assumed **;
** in previous simulations. By varying these and the eu size you can see what effect size difference **;
** (here I use 13% or 0.13 diff) can be confidently detected at a given 95% level. **;
** I found this treatment effect size difference of 0.13 could just be detected at a 95% C.L. with an **;
** overall sample size of 545 by changing these probabilities and eu size and re-running the exemplar analysis **;
p1=.31;p2=.44;p3=.50;
** p takes on the right value given the newly input treatment type. (trt=1) = 1 if trt = 1, else = 0 **;
p=(trt=1)*p1+(trt=2)*p2+(trt=3)*p3;
** the ranuni(seed) function generates a uniform random number between 0 and 1 **;
** rounding 10 x this number to an integer and adding to 10 will uniformly randomly **;
** generate (and therefore facilitate simulation of the impact of) experimental unit **;
** sample sizes in the range 10 - 20, mean=15. Using the same seed reproduces the same **;
** pseudo-random sequence of sample sizes every time. This is my 'lucky' number! **;
** to change the cluster size assumption change the first 10 (the minimum size) to something else. **;
** To change the size variation change the second 10 (the multiplier) to something else. **;
** To use your own 'lucky' random seed change 09051958 to your own birthday or any other **;
** easy to remember positive number. It does not need to have eight digits. **;
** You can use a seed of 0 for a new (clock-based) seed each time, but if you do, you will get a different, **;
** but equally varied set of experimental unit (clinic) sample sizes each time you run it, and sometimes **;
** that is a real nuisance **;
n=10+round(10*ranuni(09051958),1);
** mu is the expected number of positive outcomes from each experimental unit**;
mu=round(n*p,1);
** and the simulated experimental outcome is output to the exemplar data set **;
output;
** and this is done 3 times, one for each experimental unit **;
end;
** in the datalines below I have simulated 12 districts (blocks) chosen at random each with **;
** 3 clinics chosen at random for the trial. The treatments are allocated in all possible permutation **;
** orders, twice over the 12 blocks (districts). It is vitally important to vary treatments within block. **;
** Other designs that do not include this principle fail to have any useful precision **;
cards;
1 1 2 3
2 1 3 2
3 2 1 3
4 2 3 1
5 3 1 2
6 3 2 1
7 1 2 3
8 1 3 2
9 2 1 3
10 2 3 1
11 3 1 2
12 3 2 1
run;
proc glimmix data=rct;
class trt;
** this model statement form model y/n=x automatically invokes /dist=binomial link=logit **;
model mu/n=trt / ddfm=contain;
random intercept trt / subject=block;
** see the reference article by Stroup 2016 on how to make educated assumptions about the **;
** group and treatment covariances. /hold tells glimmix not to estimate but hold covariance **;
** parameters to the values given below **;
parms (0.13)(0.10)/hold=1,2;
** this tests for strong evidence of a difference at the precision and sample size simulated **;
lsmeans trt/diff cl;
run;
```

Number of Observations Read | 36 |
---|---|
Number of Observations Used | 36 |
Number of Events | 230 |
Number of Trials | 545 |

Covariance Parameter Estimates

Cov Parm | Subject | Estimate | Standard Error |
---|---|---|---|
Intercept | block | 0.1300 | . |
trt | block | 0.1000 | . |

Differences of trt Least Squares Means

trt | _trt | Estimate | Standard Error | DF | t Value | Pr > \|t\| | Alpha | Lower | Upper |
---|---|---|---|---|---|---|---|---|---|
1 | 2 | -0.5722 | 0.2552 | 22 | -2.24 | 0.0353 | 0.05 | -1.1015 | -0.04299 |
1 | 3 | -0.8914 | 0.2559 | 22 | -3.48 | 0.0021 | 0.05 | -1.4221 | -0.3607 |
2 | 3 | -0.3192 | 0.2479 | 22 | -1.29 | 0.2113 | 0.05 | -0.8332 | 0.1949 |

In this simulation the covariance parameters sum to 0.23 as desired, and the control-versus-treatment effect difference that can just be detected at the 95% level is 0.13. An average clinic sample size of 15 is sufficient with 3 treatments per district across 12 districts (545 participants in total).
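To relate the held variance components to an ICC, one option is the latent-scale definition for a logit model, in which the residual variance is taken as pi squared over 3. The sketch below is an assumption-laden illustration, not part of the original analysis: it treats the sum of the two held components as the between-clinic variance, and the data set name `icc_check`, the variable names, and the target ICC of 0.03 are all hypothetical. Other ICC definitions for binary outcomes give different numbers.

```sas
/* Sketch: convert between logit-scale variance components and a latent-scale ICC. */
/* Assumption: residual variance on the logit scale is pi**2/3 (standard logistic). */
data icc_check;
  resid_var = constant('pi')**2 / 3;

  /* variance components held fixed in the parms statement above */
  var_block     = 0.13;   /* random intercept, subject=block */
  var_trt_block = 0.10;   /* random trt effect, subject=block */

  /* ICC for two participants in the same clinic (same block and treatment) */
  icc_clinic = (var_block + var_trt_block)
             / (var_block + var_trt_block + resid_var);

  /* reverse direction: total cluster variance implied by a target ICC */
  target_icc  = 0.03;     /* illustrative value only */
  implied_var = target_icc / (1 - target_icc) * resid_var;
run;

proc print data=icc_check noobs;
run;
```

Under these assumptions the held values correspond to an ICC of roughly 0.065, so a smaller target ICC would call for smaller values in the parms statement.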

12-08-2016 10:15 PM

For other statistical models that proc power does not support, you can simulate data to get the power or sample size.

**http://blogs.sas.com/content/iml/2013/05/30/simulation-power.html**

**http://blogs.sas.com/content/iml/2013/06/05/simulation-power-curve.html**

12-08-2016 10:31 PM

I read it, but it is all about power calculation after setting a sample size. I think you didn't get my concern: I am concerned about sample size determination, not power simulation. How can I calculate the sample size for a cluster randomized controlled trial with three groups? I am not concerned about power for the time being.

Regards

Teketo

12-08-2016 10:34 PM

Hi Ksharp,

Thank you. I read it, but it is all about power calculation after setting a sample size. I think you didn't get my concern: I am concerned about sample size determination, not power simulation. How can I calculate the sample size for a cluster randomized controlled trial with three groups? I am not concerned about power for the time being.

Regards

Teketo

12-09-2016 09:27 PM

It might help community members help you better if you clarified your design some more. On the information given so far, I have these clarifying questions in my own mind:

Does your proposed design have just 3 treatments, one of which is a control, such as a currently used treatment, and the other two are new treatments of interest?

Does your proposed design have just 3 cluster samples chosen at random from a larger population, such as samples of patients from 3 primary health care centres chosen from a population of several hundred health care centres?

Do you propose that the 3 treatments, including the control, are randomly assigned to the 3 clusters?

Is it that simple?

If that is the case, I can't see how the variations in response amongst cluster means is not confounded with the variations in response amongst the treatment means, which is a design problem that was addressed by Yates and Fisher in agriculture about 100 years ago.

Surely you plan to allocate each treatment to more than one cluster sample group? As a bare minimum, should you not be considering allocating the 3 treatments to a further 3 cluster samples, making 6 groups in total?

12-09-2016 10:41 PM

Hi Damien

Thank you for your kind support.

I will have three groups:

Control group - will receive the currently used treatment

Intervention group 1 - will receive new treatment 1

Intervention group 2 - will receive new treatment 2

With regard to the nature of the clusters, it will be a two-stage cluster design.

Stage one = districts will be randomly selected

Stage two = primary health care facilities within the selected districts will be sampled

Thus, all clients who fulfill the inclusion criteria of the study and come for the specified health care service at a sampled health facility will be included.

The number of clusters (districts and primary health care facilities) will not be three; it will certainly be more than 6. My question is how many clusters I need in order to have a sample size with good power.

Therefore, my questions are:

1. How many clusters (that is, primary health care facilities in total) do I need to achieve good power? How many clients should there be within each cluster (primary health care facility); that is, what should the average cluster size be?

2. How can the sample size for a cluster randomized controlled trial be determined, taking into account the ICC, the coefficient of variation, the effect size, and varying cluster sizes, since the number of clients per sampled health facility will not be equal?

3. How can I determine the sample size for these three groups? Is the sample size determination different from the two-group case? How does the Bonferroni correction work here? Is there a formula for sample size determination with multiple groups?

4. How does SAS, or any other software, handle the questions raised above: a cluster randomized controlled trial with three groups?

Regards

Teketo

12-09-2016 11:31 PM

That's better, but you do realise, don't you, that you've just now *twice* contradicted your earlier statement about only being interested in study design precision and *not* power?

It seems like you are saying that sample size control is not at all possible once a primary health care facility has been selected? That can't be right. If that were the case, why would you be asking about how to determine sample sizes?

I know from my own experience that it is near impossible to manage studies so that treatment and block group sizes come out equal, but that should not impact on the experimental design stage, only the modelling stage. You should *strive* to obtain equal group sizes, and then do other things later on to deal with the group imbalance that you end up with, like eat ice cream (just joking) or use the proc glimmix model option ddfm=kr2 (not joking).

To be ethical for all stakeholders, frequent reporting on the current effective cluster sample sizes, followed by timely advice to all primary health care participant recruiters when the quota is about to be met so they can stop recruiting, would be best practice, right? Do you plan to do this? This is not clear from your questions to date.

Alternatively, do you have some idea of the different cluster sample sizes that will eventuate from the different primary health care cluster sample groups? Maybe an expected range of sizes?

If that is the case you can adapt the code example I gave to include individual sample groups drawn from, say, a uniform distribution over a range.
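The uniform-range idea can be sketched on its own, before it is wired into an exemplar data set. The following is a minimal illustration, not the code referred to above: the range of 40 to 80, the 36 clusters, the seed, and the data set and variable names are all hypothetical. It draws integer cluster sizes uniformly over the range and reports their mean and coefficient of variation, which is the CV that unequal-cluster-size formulas ask for.

```sas
/* Sketch: draw cluster sizes from a uniform range and summarise them. */
/* All numbers below are illustrative assumptions.                     */
data sizes;
  call streaminit(12345);          /* fixed seed so the sizes are reproducible */
  lo = 40;                         /* smallest plausible cluster size */
  hi = 80;                         /* largest plausible cluster size  */
  do cluster = 1 to 36;
    /* integer sizes lo..hi, each equally likely */
    n = lo + floor((hi - lo + 1) * rand('uniform'));
    output;
  end;
  keep cluster n;
run;

/* the CV statistic is reported as a percentage: 100 * std / mean */
proc means data=sizes n mean std cv min max;
  var n;
run;
```

Dividing the reported CV by 100 gives the proportion scale used in design-effect formulas.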

The code can easily be adapted to extend to more treatments than groups in a block, or more groups than treatments in each block, if that is what you are asking about.

Does any of this address your concerns? Do you need any more specific advice?

12-10-2016 01:08 AM - edited 12-10-2016 09:19 PM

12-10-2016 06:14 AM

Hi Damien,

Thank you indeed.

Could you give some explanation, or point me to a guide, about what the code you used means? For example,

n=35+round(10*ranuni(09051958),1);mu=n*p;

What does 09051958 mean?

parms (.07)(0.04)/hold=1,2

What do 0.07, 0.04 and hold=1,2 mean?

Could you provide some more detailed advice about the following:

1. How can the sample size for equal treatment groups be determined? The three treatment groups will have an equal ratio of 1:1:1. However, under each treatment I will have more than three clusters, probably 12 clusters (primary health care facilities) per treatment group, which gives a total of 36 clusters (36 primary health care facilities) of varying size. I do not know the cluster size; maybe an average of 50 or 60 participants per cluster, which would give a total of 36*50 = 1800 to 36*60 = 2160 participants. That is my assumption. I don't know whether this can be done in SAS or not.

2. Can the simulation give me the ICC, coefficient of variation, effect size, average cluster size and the like used in the sample size calculation, or do I have to enter them? For example, if I have nine clusters (primary health care facilities) in treatment 1, does it give me the average cluster size across those nine clusters, and likewise for the other treatment groups, that is, the current treatment and treatment 2?

3. I am not clear about what blocking means; is it about clusters? What makes a block different from a treatment group? Does the simulation you did take the clustered nature of the sampling into account? For example, what does “cluster sample sizes on 3 blocks of 3 treatment groups” mean? What does 3 blocks mean?

Regards

Teketo

12-10-2016 07:43 AM

```
** example precision estimation for a random control trial after **;
** Stroup (2016) **;
** see http://support.sas.com/resources/papers/proceedings16/11663-2016.pdf **;
data rct;
** this is a SAS data step that creates a SAS data set that is called an 'exemplar' **;
** (or representative) data set in the above article to be used in conjunction with **;
** the glimmix procedure below to simulate the impacts of your assumptions **;
** on the precision (and yes, with more work, power) of your experimental design **;
infile cards;
input block @@;
** each block is an independent replication of the 3-treatment **;
** (control + 2 new treatments) group experimental design **;
** in your case if you make each block a randomly selected district **;
** then you get to estimate inter-district response variance for 'free' **;
** the double trailing @ holds the current data line so the next input statement keeps reading from it **;
do eu=1 to 3;
** iterates over each of the 3 experimental units in each (district?) block **;
** each eu (1-3) is a new experimental unit, which is permuted in this example **;
** trt is read from the same held data line, one value per experimental unit **;
input trt @@;
** current assumptions for success probabilities of the control (p1) and two **;
** treatments (p2,p3) are set here. These treatments are not as effective as those assumed **;
** in previous simulations. By varying these and the eu size you can see what effect size difference **;
** (here I use 7.5% or 0.075 diff) can be confidently detected at a given level. Do you use 95% or 99%? **;
** I found this treatment effect size difference of 0.075 could just be detected at a 95% C.L. with an **;
** overall sample size of 1625 by changing these probabilities and eu size and re-running the exemplar analysis **;
p1=.2;p2=.275;p3=.35;
** p takes on the right value given the newly input treatment type. (trt=1) = 1 if trt = 1, else = 0 **;
p=(trt=1)*p1+(trt=2)*p2+(trt=3)*p3;
** the ranuni(seed) function generates a uniform random number between 0 and 1 **;
** rounding 10 x this number to an integer and adding to 40 will uniformly randomly **;
** generate (and therefore facilitate simulation of the impact of) experimental unit **;
** sample sizes in the range 40 - 50. Using the same seed reproduces the same **;
** pseudo-random sequence of sample sizes every time. This is my 'lucky' number! **;
** to change the cluster size assumption change the 40 to something else. To change the size variation **;
** change the 10 to something else. To use your own 'lucky' random seed change 09051958 to your **;
** own birthday or any other easy to remember number, or use 0 for a new (clock-based) seed each time **;
n=40+round(10*ranuni(09051958),1);
** mu is the expected number of positive outcomes from each experimental unit**;
mu=n*p;
** and the simulated experimental outcome is output to the exemplar data set **;
output;
** and this is done 3 times, one for each experimental unit **;
end;
** in the datalines below I have simulated 12 districts (block) chosen at random each with **;
** 3 clinics chosen at random for the trial. The treatments are allocated in all possible permutation **;
** orders, twice over the 12 blocks (districts in your case?) **;
cards;
1 1 2 3
2 1 3 2
3 2 1 3
4 2 3 1
5 3 1 2
6 3 2 1
7 1 2 3
8 1 3 2
9 2 1 3
10 2 3 1
11 3 1 2
12 3 2 1
run;
proc glimmix data=rct;
class trt;
** this model statement form model y/n=x automatically invokes /dist=binomial link=logit **;
model mu/n=trt ;
random intercept trt / subject=block;
** see the reference article by Stroup 2016 on how to make educated assumptions about the **;
** group and treatment covariances. /hold tells glimmix not to estimate but hold covariance **;
** parameters to the values given below **;
parms (.08)(0.06)/hold=1,2;
** this tests for strong evidence of a difference at the precision and sample size simulated **;
lsmeans trt/diff cl;
run;
```
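The code comments mention that power can be obtained "with more work". One widely used way to do this with an exemplar data set is to treat the F statistic from the exemplar analysis as an approximate noncentrality parameter and evaluate the noncentral F distribution. The sketch below assumes the `rct` data set created above already exists; the data set names `f_trt` and `power_trt` and the choice of alpha = 0.05 are assumptions made here for illustration, and the result is an approximation that depends on the held variance components.

```sas
/* Sketch: approximate power for the overall trt test from the exemplar analysis. */
proc glimmix data=rct;
  class trt;
  model mu/n = trt;
  random intercept trt / subject=block;
  parms (.08)(0.06) / hold=1,2;
  ods output tests3=f_trt;               /* Type III tests of fixed effects */
run;

data power_trt;
  set f_trt;
  alpha = 0.05;                          /* assumed significance level */
  ncp   = NumDF * FValue;                /* approximate noncentrality parameter */
  fcrit = finv(1 - alpha, NumDF, DenDF, 0);        /* critical value under the null */
  power = 1 - probf(fcrit, NumDF, DenDF, ncp);     /* area beyond it under the alternative */
run;

proc print data=power_trt noobs;
  var Effect NumDF DenDF FValue ncp fcrit power;
run;
```

To turn this into a sample size calculation, rerun it while changing the number of blocks or the cluster size range until the reported power reaches the target.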

12-10-2016 04:22 PM

Hi Damien,

Thank you very much for your unreserved and kind support. Let me ask about a few more details.

1. Would anything below change if I sampled one clinic per sampled district, that is, if I randomly allocated only one treatment per district? What would the entire simulation look like?

cards;

1 1 2 3

2 1 3 2

3 2 1 3

4 2 3 1

5 3 1 2

6 3 2 1

7 1 2 3

8 1 3 2

9 2 1 3

10 2 3 1

11 3 1 2

12 3 2 1

2. What would it look like if, let's say, I used a 95% CI, an effect size of 0.13, p1=0.31, p2=0.44, p3=0.50, ICC=0.03, coefficient of variation=0.5, variance=0.23, and an average cluster size of 60? Is it possible to enter these values? What would the resulting precision look like? And what would the total sample size be?

3. Does this number have to be eight digits? 09051958

4. How could I simulate the power for the above assumptions, in addition to the precision and sample size?

Kind regards
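As a rough cross-check on the values listed in question 2, a formula-based calculation can be sketched in two steps: an individually randomized sample size from proc power with a Bonferroni-adjusted alpha, then inflation by a design effect for unequal cluster sizes. This is not the exemplar method used earlier in the thread. The 80% power target, the choice of two comparisons against control (alpha = 0.05/2), the design-effect approximation 1 + ((cv**2 + 1)*m - 1)*rho, and the data set names are assumptions made here. The column name `NPerGroup` in the ODS output table is also assumed, so check it with proc contents before relying on it.

```sas
/* Step 1: sample size per group for control (0.31) versus treatment 1 (0.44), */
/* ignoring clustering, with alpha split across two comparisons to control.    */
proc power;
  twosamplefreq test=pchi
    groupproportions = (0.31 0.44)
    alpha            = 0.025      /* 0.05 / 2, Bonferroni (assumed) */
    power            = 0.80       /* assumed target power */
    npergroup        = .;
  ods output output=pw;
run;

/* Step 2: inflate for clustering with unequal cluster sizes. */
data inflated;
  set pw;
  m   = 60;                       /* average cluster size from the question */
  cv  = 0.5;                      /* coefficient of variation of cluster size */
  rho = 0.03;                     /* ICC */
  deff = 1 + ((cv**2 + 1) * m - 1) * rho;     /* equals 3.22 for these inputs */
  n_per_arm        = ceil(NPerGroup * deff);  /* NPerGroup: assumed column name */
  clusters_per_arm = ceil(n_per_arm / m);
  n_total          = 3 * n_per_arm;           /* three equal arms */
run;

proc print data=inflated noobs;
  var deff n_per_arm clusters_per_arm n_total;
run;
```

The smallest difference of interest drives the result, so the 0.31 versus 0.44 comparison is used here; testing all three pairwise differences would suggest alpha = 0.05/3 instead.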
