Teketo Kassaw
glimmix, not mixed.
Hi Damien,
Thank you. I read the paper you mentioned and the responses, but I didn't find information directly related to my question. My question is about sample size determination for a cluster randomized controlled trial with three groups, using SAS.
I have two types of outcomes; binary and count. The sample size I want to determine should take into consideration the following issues;
1. cluster number
2. cluster size
3. coefficient of variation
4. intracluster correlation coefficient / rho and
5. effect size
in addition to the considerations for an individually randomized controlled trial.
How can I determine the sample size for three groups? Is the Bonferroni correction appropriate here, or is it possible to use the two-population formula and then allocate across the three groups, or is there some other correction that SAS will apply?
Regards
Teketo
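Before the simulation route below, a conventional first pass is to size an individually randomized two-group comparison and then inflate it by the design effect. Every number in this sketch (the success probabilities, power target, cluster size, CV, and ICC) is an illustrative assumption, not a value from your study:

```sas
** sketch only: all numbers are illustrative assumptions **;
** step 1: individually randomized n for one pairwise comparison, **;
** with a Bonferroni-adjusted alpha of 0.05/3 for 3 pairwise tests **;
proc power;
   twosamplefreq test=pchi
      groupproportions=(0.31 0.44) /* assumed control and treatment success probabilities */
      alpha=0.0167                 /* 0.05/3 Bonferroni adjustment */
      power=0.80
      npergroup=.;
run;
** step 2: inflate by the design effect, allowing for unequal cluster sizes **;
data deff;
   n_ind = 160;  ** substitute the per-group n that proc power reports **;
   m     = 15;   ** assumed average cluster size **;
   cv    = 0.3;  ** assumed coefficient of variation of cluster sizes **;
   rho   = 0.05; ** assumed intracluster correlation coefficient **;
   deff  = 1 + ((cv**2 + 1)*m - 1)*rho; ** design effect for variable cluster sizes **;
   n_clu = ceil(n_ind*deff);            ** inflated per-group sample size **;
   k     = ceil(n_clu/m);               ** clusters needed per group **;
   put deff= n_clu= k=;
run;
```

The design-effect formula 1 + ((cv² + 1)·m − 1)·ρ reduces to the familiar 1 + (m − 1)·ρ when all clusters are the same size (cv = 0).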
** example precision estimation for a randomized controlled trial after **;
** Stroup (2016) **;
** see http://support.sas.com/resources/papers/proceedings16/11663-2016.pdf **;
data rct;
** this is a SAS data step that creates a SAS data set that is called an 'exemplar' **;
** (or representative) data set in the above article to be used in conjunction with **;
** the glimmix procedure below to simulate the impacts of your assumptions **;
** on the precision (and yes, with more work, power) of your experimental design **;
infile cards;
input block @@;
** each block is an independent replication of the 3-treatment **;
** (control + 2 new treatments) group experimental design **;
** in your case if you make each block a randomly selected district **;
** then you get to estimate inter-district response variance for 'free' **;
** the double trailing @ in the input statement holds the input line so later input statements read from the same record **;
do eu=1 to 3;
** iterates over each of the 3 experimental units in each (district?) block **;
** each eu (1-3) is a new experimental unit, which is permuted in this example **;
** the double trailing @ holds the input line across loop iterations until the end of the data line **;
input trt @@;
** current assumptions for the success probabilities of the control (p1) and two **;
** treatments (p2,p3) are set here. These treatments are less effective than those assumed **;
** in previous simulations. By varying these and the eu size you can see what effect size difference **;
** (here I use 13% or 0.13 diff) can be confidently detected at a given confidence level (95% here). **;
** I found this treatment effect size difference of 0.13 could just be detected at a 95% C.L. with an **;
** overall sample size of 545 by changing these probabilities and eu size and re-running the exemplar analysis **;
p1=.31;p2=.44;p3=.50;
** p takes on the right value for the newly input treatment type: (trt=1) evaluates to 1 if trt=1, else 0 **;
p=(trt=1)*p1+(trt=2)*p2+(trt=3)*p3;
** the ranuni(seed) function generates a uniform random number between 0 and 1 **;
** rounding 10 x this number to an integer and adding to 10 will uniformly randomly **;
** generate (and therefore facilitate simulation of the impact of) experimental unit **;
** sample sizes in the range 10 - 20, mean=15. Using the same seed reproduces the same **;
** pseudo-random sequence of sample sizes every time. This is my 'lucky' number! **;
** to change the cluster size assumption change the 15 to something else. To change the size variation**;
** change the 10 to something else. To use your own 'lucky' random seed change 09051958 to your **;
** own birthday or any other easy to remember number. There is no restriction on the number of digits.**;
** You can leave it blank for a new seed each time, but if you do, you will get a different, but equally **;
** varied set of experimental unit (clinic) sample sizes each time you run it, and sometimes that is a real **;
** nuisance **;
n=10+round(10*ranuni(09051958),1);
** mu is the expected number of positive outcomes from each experimental unit**;
mu=round(n*p,1);
** and the simulated experimental outcome is output to the exemplar data set **;
output;
** and this is done 3 times, one for each experimental unit **;
end;
** in the datalines below I have simulated 12 districts (blocks) chosen at random each with **;
** 3 clinics chosen at random for the trial. The treatments are allocated in all possible permutation **;
** orders, twice over the 12 blocks (districts). It is vitally important to vary treatments within block **;
** other designs that do not include this principle fail to have any useful precision **;
cards;
1 1 2 3
2 1 3 2
3 2 1 3
4 2 3 1
5 3 1 2
6 3 2 1
7 1 2 3
8 1 3 2
9 2 1 3
10 2 3 1
11 3 1 2
12 3 2 1
run;
proc glimmix data=rct;
class trt;
** this model statement form model y/n=x automatically invokes /dist=binomial link=logit **;
model mu/n=trt / ddfm=contain;
random intercept trt / subject=block;
** see the reference article by Stroup 2016 on how to make educated assumptions about the **;
** group and treatment covariances. /hold tells glimmix not to estimate but hold covariance **;
** parameters to the values given below **;
parms (0.13)(0.10)/hold=1,2;
** this tests for strong evidence of a difference at the precision and sample size simulated **;
lsmeans trt/diff cl;
run;
| Number of Observations Read | 36 |
|---|---|
| Number of Observations Used | 36 |
| Number of Events | 230 |
| Number of Trials | 545 |

Covariance Parameter Estimates

| Cov Parm | Subject | Estimate | Standard Error |
|---|---|---|---|
| Intercept | block | 0.1300 | . |
| trt | block | 0.1000 | . |

Differences of trt Least Squares Means

| trt | _trt | Estimate | Standard Error | DF | t Value | Pr > \|t\| | Alpha | Lower | Upper |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | -0.5722 | 0.2552 | 22 | -2.24 | 0.0353 | 0.05 | -1.1015 | -0.04299 |
| 1 | 3 | -0.8914 | 0.2559 | 22 | -3.48 | 0.0021 | 0.05 | -1.4221 | -0.3607 |
| 2 | 3 | -0.3192 | 0.2479 | 22 | -1.29 | 0.2113 | 0.05 | -0.8332 | 0.1949 |
In this simulation the covariance parameters sum to 0.23 as desired, and the control-versus-treatment effect difference of 0.13 is detectable. An average clinic sample size of 15 is sufficient with 3 treatments per district.
For other statistical models that PROC POWER does not support, you can simulate data to estimate power:
http://blogs.sas.com/content/iml/2013/05/30/simulation-power.html
http://blogs.sas.com/content/iml/2013/06/05/simulation-power-curve.html
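For reference, the simulate-then-test idea in those posts looks roughly like this. The sketch below is a minimal two-group version without clustering, and every number in it (probabilities, group size, number of simulations) is an assumption to be replaced:

```sas
%let nsim = 1000; ** assumed number of simulated trials **;
data sim;
   call streaminit(12345);
   do sim = 1 to &nsim;
      do grp = 1 to 2;
         p = ifn(grp=1, 0.31, 0.44); ** assumed success probabilities **;
         do i = 1 to 100;            ** assumed subjects per group **;
            y = rand('bernoulli', p);
            output;
         end;
      end;
   end;
run;
ods exclude all;
proc freq data=sim;
   by sim;
   tables grp*y / chisq;
   ods output chisq=pvals(where=(statistic='Chi-Square'));
run;
ods exclude none;
** estimated power = proportion of simulated trials with p < 0.05 **;
data _null_;
   set pvals end=last;
   hits + (prob < 0.05);
   if last then do;
      power = hits/&nsim;
      put 'estimated power = ' power;
   end;
run;
```

To extend this to a cluster design, you would add a block loop and a random block effect to the data generation, then fit with proc glimmix instead of proc freq.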
Hi Ksharp,
Thank you. I read it, but it is all about power calculation after a sample size has been set. I think you didn't get my concern: I am asking about sample size determination, not power simulation. How can I calculate the sample size for a cluster randomized controlled trial with three groups? I am not concerned about power at this time.
Regards
Teketo
It might help community members help you better if you clarified your design some more. Currently, on the information given, I have these clarifying questions in my own mind:
Does your proposed design have just 3 treatments, one of which is a control, such as a currently used treatment, and the other two are new treatments of interest?
Does your proposed design have just 3 cluster samples chosen at random from a larger population, such as samples of patients from 3 primary health care centres chosen from a population of several hundred health care centres?
Do you propose that the 3 treatments, including the control, are randomly assigned to the 3 clusters?
Is it that simple?
If that is the case, I can't see how the variation in response amongst cluster means is not confounded with the variation amongst treatment means, a design problem that Yates and Fisher addressed in agriculture about 100 years ago.
Surely you plan to allocate each treatment to more than one cluster sample group? As a bare minimum, should you not be considering allocating the 3 treatments to a further 3 cluster samples, making 6 groups in total?
That's better, but you do realise, don't you, that you've now twice contradicted your earlier statement about only being interested in study design precision and not power?
It seems like you are saying that sample size control is not possible at all once a primary health care facility has been selected. That can't be right; if it were, why would you be asking how to determine sample sizes?
I know from my own experience that it is near impossible to manage studies so that treatment and block group sizes come out equal, but that should not affect the experimental design stage, only the modelling stage. You should strive for equal group sizes, and then deal with whatever group imbalance you end up with later on, for example by eating ice cream (just joking) or by using the proc glimmix model option ddfm=kr2 (not joking).
To be ethical for all stakeholders, best practice would be frequent reporting on the current effective cluster sample sizes, followed by timely advice to all primary healthcare participant recruiters when the quota is about to be met, so they can stop recruiting. Do you plan to do this? It is not clear from your questions to date.
Alternatively, do you have some idea of the different cluster sample sizes that will eventuate from the different primary health care cluster sample groups? Maybe an expected range of sizes?
If that is the case you can adapt the code example I gave to include individual sample groups drawn from, say, a uniform distribution over a range.
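Concretely, replacing the line that sets n in the data step with something like the following draws each clinic's size uniformly from an assumed range (30 to 60 here is purely illustrative):

```sas
** assumed cluster-size range; replace lo and hi with your own expectations **;
lo = 30;
hi = 60;
n  = lo + round((hi - lo)*ranuni(09051958), 1);
```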
The code can easily be adapted to more treatments than groups in a block, or more groups than treatments in each block, if that is what you are asking about.
Does any of this address your concerns? Do you need any more specific advice?
** example precision estimation for a randomized controlled trial after **;
** Stroup (2016) **;
** see http://support.sas.com/resources/papers/proceedings16/11663-2016.pdf **;
data rct;
** this is a SAS data step that creates a SAS data set that is called an 'exemplar' **;
** (or representative) data set in the above article to be used in conjunction with **;
** the glimmix procedure below to simulate the impacts of your assumptions **;
** on the precision (and yes, with more work, power) of your experimental design **;
infile cards;
input block @@;
** each block is an independent replication of the 3-treatment **;
** (control + 2 new treatments) group experimental design **;
** in your case if you make each block a randomly selected district **;
** then you get to estimate inter-district response variance for 'free' **;
** the double trailing @ in the input statement holds the input line so later input statements read from the same record **;
do eu=1 to 3;
** iterates over each of the 3 experimental units in each (district?) block **;
** each eu (1-3) is a new experimental unit, which is permuted in this example **;
** the double trailing @ holds the input line across loop iterations until the end of the data line **;
input trt @@;
** current assumptions for success probabilities of the control (p1) and two **;
** treatments (p2,p3) are set here. These treatments are not as effective as those assumed **;
** in previous simulations. By varying these and the eu size you can see what effect size difference **;
** (here I use 10% or 0.1 diff) can be confidently detected at a given level. Do you use 95% or 99%? **;
** I found this treatment effect size difference of 0.075 could just be detected at a 95% C.L. with an **;
** overall sample size of 1625 by changing these probabilities and eu size and re-running the exemplar analysis **;
p1=.2;p2=.275;p3=.35;
** p takes on the right value for the newly input treatment type: (trt=1) evaluates to 1 if trt=1, else 0 **;
p=(trt=1)*p1+(trt=2)*p2+(trt=3)*p3;
** the ranuni(seed) function generates a uniform random number between 0 and 1 **;
** rounding 10 x this number to an integer and adding to 40 will uniformly randomly **;
** generate (and therefore facilitate simulation of the impact of) experimental unit **;
** sample sizes in the range 40 - 50. Using the same seed reproduces the same **;
** pseudo-random sequence of sample sizes every time. This is my 'lucky' number! **;
** to change the cluster size assumption change the 40 to something else. To change the size variation**;
** change the 10 to something else. To use your own 'lucky' random seed change 09051958 to your **;
** own birthday or any other easy to remember number, or leave it blank for a new seed each time **;
n=40+round(10*ranuni(09051958),1);
** mu is the expected number of positive outcomes from each experimental unit**;
mu=n*p;
** and the simulated experimental outcome is output to the exemplar data set **;
output;
** and this is done 3 times, one for each experimental unit **;
end;
** in the datalines below I have simulated 12 districts (blocks) chosen at random each with **;
** 3 clinics chosen at random for the trial. The treatments are allocated in all possible permutation **;
** orders, twice over the 12 blocks (districts in your case?) **;
cards;
1 1 2 3
2 1 3 2
3 2 1 3
4 2 3 1
5 3 1 2
6 3 2 1
7 1 2 3
8 1 3 2
9 2 1 3
10 2 3 1
11 3 1 2
12 3 2 1
run;
proc glimmix data=rct;
class trt;
** this model statement form model y/n=x automatically invokes /dist=binomial link=logit **;
model mu/n=trt ;
random intercept trt / subject=block;
** see the reference article by Stroup 2016 on how to make educated assumptions about the **;
** group and treatment covariances. /hold tells glimmix not to estimate but hold covariance **;
** parameters to the values given below **;
parms (.08)(0.06)/hold=1,2;
** this tests for strong evidence of a difference at the precision and sample size simulated **;
lsmeans trt/diff cl;
run;