What is the best way to randomize existing participants at different sites in a 1:1 ratio to 2 intervention groups (group T or P) under the following conditions:
2. Enrollment is ongoing, so I'd like to account for an extra 10 participants at each site.
3. The sites have different sample sizes, including odd-numbered sample sizes (some sites have only 1 participant so far, a few sites have no participants yet, others have 7, 68, 4 participants, etc.).
4. The group assignment has to be in a 1:1 ratio, but I am not sure how that would even be possible for sites that have an odd number of participants.
Ideally, I would like to do something like a biased coin/adaptive kind of randomization. For example, if site has 3 participants enrolled (2 to group T, 1 to group P) and then a 4th participant later enrolls, I'd like this 4th one to get assigned to group P, since this is underrepresented group. And so on for any additional enrolled participants.
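A minimal sketch of the biased-coin idea (Efron's biased coin design), assuming participants are processed one at a time in enrollment order within each site. The dataset ARRIVALS, the variable ORDER, the seed and the bias &P are hypothetical. With &P=1 the under-represented group always gets the next participant, which is the rule described above; a value such as 2/3 (Efron's suggestion) still favours the smaller group but keeps the next assignment less predictable.

```sas
/* Bias toward the under-represented arm (hypothetical value; Efron suggested 2/3). */
/* Setting p=1 always assigns the next participant to the smaller arm.              */
%let p = 0.667;

/* Hypothetical enrollment log: one row per participant, in order of arrival */
data arrivals;
  length site $8;
  input site $ order;
  datalines;
Site1 1
Site1 2
Site1 3
Site1 4
Site2 1
;
run;

proc sort data=arrivals;
  by site order;
run;

data biased_coin;
  if _n_ = 1 then call streaminit(2024);
  set arrivals;
  by site;
  length trt $1;
  retain nT nP;
  /* Start the running arm counts afresh at each site */
  if first.site then do;
    nT = 0;
    nP = 0;
  end;
  /* Fair coin when the arms are tied, otherwise favour the smaller arm */
  if nT = nP then probT = 0.5;
  else if nT < nP then probT = &p;
  else probT = 1 - &p;
  if rand('uniform') < probT then do;
    trt = 'T';
    nT = nT + 1;
  end;
  else do;
    trt = 'P';
    nP = nP + 1;
  end;
run;

proc print data=biased_coin;
run;
```

Because NT and NP are retained and reset at FIRST.SITE, each site keeps its own running balance, so sites of any size (including 1 or an odd number) are handled the same way.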
Here is what I have done so far, with a sample of my code: I randomized the existing participants first (the RAND1 dataset), then created a dataset of 10 additional participants, randomized this list (RAND2), and finally combined RAND1 and RAND2. But I don't think this is right, since it does not do the adaptive part, which I don't know how to implement.
*Subset participants at each site;
data site1;
set four;
where site="S1";
unit=_n_;
run;
proc print data=site1; run; *3 participants;
proc sort data=site1; by unit; run;
*Randomize existing participants at site by using random numbers from uniform distribution;
data rand1;
set site1;
ran=int(ranuni(1234)*1000);
run;
proc sort data=rand1; by ran; RUN;
proc print data=rand1; run;
proc format;
value trtfmt 1 = 'P'
2 = 'T';
run;
*Get current sample size/number of participants per site;
proc sql; select max(unit) into:maxnr trimmed from site1; quit;
%put &maxnr;
*Assign random treatment sequence;
data rand1;
set rand1;
*for even sample size, allocate equal numbers of participants to each of the 2 treatment groups;
if mod(&maxnr, 2)=0 then do;
if _N_ le (&maxnr)/2 then trt=1;
else trt=2;
end;
*for odd sample size, two groups are in balance if sample size of one group is within 1 of the other group;
else if mod(&maxnr, 2) ne 0 then do;
x=round((&maxnr)/2);
y=x-1;
if _N_ le y then trt=1;
else trt=2;
end;
format trt trtfmt.;
run;
proc sort data=rand1; by unit; run;
proc print data=rand1; run;
*Want to allocate an additional 10 participants per site;
*Unrandom set of 10 additional participants at site;
data unrand;
if &maxnr =. then do;
do Unit = 1 to 10;
output;
end;
end;
else if &maxnr ne . then do;
do Unit = (&maxnr+1) to (&maxnr+10);
output;
end;
end;
run;
proc print data=unrand; run;
*Randomize the list;
data rand2;
set unrand;
ran=int(ranuni(9822)*1000);
run;
proc sort data=rand2; by ran; run;
proc print data=rand2; run;
*Assign 5 participants to one group, and the remaining 5 to the other group;
data rand2;
set rand2;
if _N_ le 5 then trt=1;
else trt=2;
format trt trtfmt.;
run;
proc sort data=rand2; by unit; run;
proc print data=rand2; run;
*Combine randomized list of existing participants with randomized list of additional 10 participants;
data rand_list;
set rand1 rand2;
if site="" then site="S1";
if patid="" then patid="S10XX";
keep site patid trt;
run;
You do not have to process each site separately. If you sort your input, it can be done with BY group processing as well.
Here is an example using SASHELP.CLASS as the input, and SEX as the BY group (instead of your SITE):
proc sort data=sashelp.class out=start;
where age<14;
by sex;
run;
data initial;
do count=1 by 1 until(last.sex);
set start;
by sex;
end;
N1=int(count/2);
if mod(count,2)=1 then
N1=N1+round(rand('uniform'));
N2=count-N1;
do until(last.sex);
set start;
by sex;
if N1>0 and rand('uniform')<=N1/(N1+N2) then do;
tp=1;
N1=N1-1;
end;
else do;
tp=2;
N2=N2-1;
end;
output;
end;
drop count N1 N2;
run;
What it does: the first DO loop simply counts the number of participants in each BY group.
After that, the target is half of them in each type (the TP variable); if the count is odd, the odd one out is randomly given to one of the groups.
So, before the next loop, the number of participants we want in each type is held in the variables N1 and N2.
In the second loop, the participants in the BY group are randomly assigned to the two types. We keep counting N1 and N2 down, so that exactly the right total number is assigned to each group.
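The countdown in the second loop works like drawing without replacement from an urn holding N1 tickets marked 1 and N2 tickets marked 2: the probability of a 1 at each step is the share of 1-tickets still left, so every ordering is equally likely and the totals always land exactly on N1 and N2. A small self-contained check for a single BY group of 7 participants (the count, the seed and the TARGET1/TARGET2 variables are illustrative):

```sas
data countdown;
  call streaminit(4321);
  count = 7;
  /* Target group sizes: half each, the odd one goes to a random group */
  N1 = int(count/2);
  if mod(count,2) = 1 then N1 = N1 + round(rand('uniform'));
  N2 = count - N1;
  target1 = N1;
  target2 = N2;
  /* Sequential draws without replacement: P(tp=1) = tickets of type 1 left / tickets left */
  do id = 1 to count;
    if N1 > 0 and rand('uniform') <= N1/(N1+N2) then do;
      tp = 1;
      N1 = N1 - 1;
    end;
    else do;
      tp = 2;
      N2 = N2 - 1;
    end;
    output;
  end;
  drop N1 N2;
run;

proc freq data=countdown;
  tables tp;
run;
```

Whatever the seed, PROC FREQ should show 3 and 4 (in some order), matching TARGET1 and TARGET2.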
Now, when you get more data, we can simulate that by adding the rest of SASHELP.CLASS:
data all;
set initial sashelp.class(where=(age>=14));
run;
You could also use PROC APPEND, or a MERGE after first sorting both datasets by the BY group (here: SEX) and the unique ID (here: NAME).
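A sketch of the PROC APPEND route, using hypothetical SITE/PATID/TP data shaped like the question rather than SASHELP.CLASS (the dataset names INITIAL, NEW_PATIENTS and ALL are assumptions). Because the new rows have no TP variable, they come in with TP missing, which is exactly what the second step looks for; SAS writes a warning that TP is not on the DATA= dataset, but the append still runs without the FORCE option.

```sas
/* Hypothetical list of participants that were already randomized (TP = 1 or 2) */
data initial;
  length site $8 patid $8;
  input site $ patid $ tp;
  datalines;
Site1 S1001 1
Site1 S1002 2
Site1 S1003 1
;
run;

/* Hypothetical newly enrolled participants, not randomized yet */
data new_patients;
  length site $8 patid $8;
  input site $ patid $;
  datalines;
Site1 S1004
Site2 S2001
;
run;

/* Work on a copy so the original randomization list is left untouched */
data all;
  set initial;
run;

/* Append the new rows; TP is missing for them because NEW_PATIENTS has no TP */
proc append base=all data=new_patients;
run;
```

Keeping SITE and PATID at the same length in both datasets avoids the length mismatches that would otherwise require FORCE.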
Sort your new dataset:
proc sort data=all;
by sex;
run;
And then do just about the same thing as before:
data want;
count1=0;
count2=0;
do count=1 by 1 until(last.sex);
set all;
by sex;
select(tp);
when(1) count1+1;
when(2) count2+1;
otherwise;
end;
end;
N1=int(count/2);
if mod(count,2)=1 then
N1=N1+round(rand('uniform'));
N2=count-N1;
N1=N1-count1;
N2=N2-count2;
do until(last.sex);
set all;
by sex;
if missing(tp) then do;
if N1>0 and rand('uniform')<=N1/(N1+N2) then do;
tp=1;
N1=N1-1;
end;
else do;
tp=2;
N2=N2-1;
end;
end;
output;
end;
drop count: N1 N2;
run;
The difference this time is that we also count the number of participants already assigned, and subtract that from the calculated N1 and N2.
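Applied to the situation from the question (a site with 2 participants in T and 1 in P, then a 4th enrolls), this step gives the newcomer the under-represented group. No randomness is involved here: the count is 4 (even), so the targets are fixed at 2 and 2. The sketch reuses the step with SITE as the BY variable and the question's coding 1='P', 2='T'; the data and the seed are hypothetical.

```sas
proc format;
  value trtfmt 1 = 'P'
               2 = 'T';
run;

/* Hypothetical site: two participants already in T, one in P, one new (TP missing) */
data all;
  length site $8 patid $8;
  input site $ patid $ tp;
  datalines;
Site1 S1001 2
Site1 S1002 2
Site1 S1003 1
Site1 S1004 .
;
run;

proc sort data=all;
  by site;
run;

data want;
  if _n_ = 1 then call streaminit(9822);
  count1 = 0;
  count2 = 0;
  /* Count everyone at the site and how many are already in each group */
  do count = 1 by 1 until(last.site);
    set all;
    by site;
    select(tp);
      when(1) count1 + 1;
      when(2) count2 + 1;
      otherwise;
    end;
  end;
  /* Target sizes for the whole site, minus those already assigned */
  N1 = int(count/2);
  if mod(count,2) = 1 then N1 = N1 + round(rand('uniform'));
  N2 = count - N1;
  N1 = N1 - count1;
  N2 = N2 - count2;
  /* Assign only the participants who do not have a group yet */
  do until(last.site);
    set all;
    by site;
    if missing(tp) then do;
      if N1 > 0 and rand('uniform') <= N1/(N1+N2) then do;
        tp = 1;
        N1 = N1 - 1;
      end;
      else do;
        tp = 2;
        N2 = N2 - 1;
      end;
    end;
    output;
  end;
  format tp trtfmt.;
  drop count: N1 N2;
run;
```

After the count, N1 = 2 - 1 = 1 and N2 = 2 - 2 = 0, so S1004 is assigned TP=1 ('P').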
I did this using the RAND function, which is a better random number generator than RANUNI. If you want to be able to replicate your results (getting the same distribution in two separate runs), you should use CALL STREAMINIT at the beginning of each DATA step.
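A quick way to see what CALL STREAMINIT gives you: two DATA steps seeded with the same value produce the same stream of RAND('uniform') values (the seed 9822 and the dataset names are arbitrary). In the DOW-loop steps above, which iterate once per BY group, writing it as `if _n_ = 1 then call streaminit(seed);` calls it once, before the first RAND call.

```sas
/* First run, seeded */
data run1;
  call streaminit(9822);
  do i = 1 to 5;
    u = rand('uniform');
    output;
  end;
run;

/* Second run with the same seed */
data run2;
  call streaminit(9822);
  do i = 1 to 5;
    u = rand('uniform');
    output;
  end;
run;

/* Reports no unequal values when the two streams match */
proc compare base=run1 compare=run2;
run;
```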
EDIT: I added N1>0 to the condition in the second loop, just in case the RAND function can return 0 (I am not quite sure whether it can).
Hi s_lassen,
Thank you so much for your helpful reply and code, it works perfectly.
I have one additional question, if you don't mind: based on the data I have (sample dataset below), do you think it would be possible to adapt your code so that it also performs block randomization within each site? I have found many examples of block randomization, but they all seem to be for cases where each site has the same sample size and, furthermore, that sample size is an even number. I have not been able to figure out how, or whether it's even possible, to do this in my situation, where sample sizes vary by site and some sites have an odd number of participants (the largest site has 68 participants, another has 59, another 30, etc.).
data initial_data;
input site $ patid $;
datalines;
Site1 S1001
Site1 S1002
Site1 S1003
Site1 S1004
Site1 S1005
Site1 S1006
Site2 S2001
Site2 S2002
Site2 S2003
Site3 S3001
Site3 S3002
Site3 S3003
Site3 S3004
Site3 S3005
Site3 S3006
Site3 S3007
Site3 S3008
Site3 S3009
Site3 S3010
Site3 S3011
Site3 S3012
Site3 S3013
Site3 S3014
Site3 S3015
Site3 S3016
Site3 S3017
Site3 S3018
Site3 S3019
Site3 S3020
Site3 S3021
Site3 S3022
Site3 S3023
;
run;
*Sort data by site;
proc sort data=initial_data out=start;
by site;
run;
I am not quite sure what you mean by "block randomization" here. What you are doing is already a block randomization: instead of just assigning all the participants randomly to two groups, you assign them by SITE. If you mean that you also have another variable (could be sex/gender) that should be taken into consideration here, it would just be a matter of using two (or more) BY variables instead of just one. If that is not the answer, please be more specific.
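As a sketch of the two-BY-variable version: the first step stays the same, except that the DOW loops stop at LAST. of the innermost BY variable, which is also set whenever the outer variable changes. Here SEX and AGE from SASHELP.CLASS stand in for SITE and a second, hypothetical stratification variable; the seed is arbitrary.

```sas
proc sort data=sashelp.class out=start;
  by sex age;
run;

data initial;
  if _n_ = 1 then call streaminit(42);
  /* Count participants in each SEX*AGE stratum */
  do count = 1 by 1 until(last.age);
    set start;
    by sex age;
  end;
  N1 = int(count/2);
  if mod(count,2) = 1 then N1 = N1 + round(rand('uniform'));
  N2 = count - N1;
  /* Assign within the stratum, counting the targets down */
  do until(last.age);
    set start;
    by sex age;
    if N1 > 0 and rand('uniform') <= N1/(N1+N2) then do;
      tp = 1;
      N1 = N1 - 1;
    end;
    else do;
      tp = 2;
      N2 = N2 - 1;
    end;
    output;
  end;
  drop count N1 N2;
run;
```

Every SEX*AGE stratum ends up with group sizes that differ by at most 1.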
Oh I'm sorry, I should have been more specific. So, the code does stratified randomization by site but what I meant by block randomization is: within each site, can the participants be randomized within blocks such that an equal number are assigned to each group? The basic idea would be to divide participants into m blocks of size 2n, randomize each block such that n patients are allocated to group T and n to group P and then choose the blocks randomly.
Example: Two treatments of A, B and Block size = 4
Possible treatment allocations within each block are:
(1) AABB, (2) BBAA, (3) ABAB, (4) BABA, (5) ABBA, (6) BAAB
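The six orders are the ways of choosing 2 of the 4 positions for A, that is C(4,2) = 6. A small DATA step that lists them, as a check of the enumeration (the dataset name BLOCKS4 is arbitrary):

```sas
/* List every block of size 4 that contains exactly two A's and two B's */
data blocks4;
  length seq $4;
  do i = 1 to 3;
    do j = i + 1 to 4;
      seq = 'BBBB';
      /* Put A at the two chosen positions */
      substr(seq, i, 1) = 'A';
      substr(seq, j, 1) = 'A';
      output;
    end;
  end;
  keep seq;
run;

proc print data=blocks4;
run;
```

The rows come out as AABB, ABAB, ABBA, BAAB, BABA, BBAA; picking one of them at random for each block is the permuted-block scheme described above.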
I was reading and found some code for how to do this (see below), but all the examples I've found assume that the sample size is the same at each site (60 in the example below) and that it is an even number. So I am not sure whether it would be possible to do something similar in my case, with varying (sometimes odd) sample sizes per site.
%let nsite=3;
%let N = 60;
%let blocksize = 4;
%let seed = 42;
data blocks;
call streaminit(&seed);
do site=1 to &nsite;
do block = 1 to ceil(&N/&blocksize);
do item = 1 to &blocksize;
if item le &blocksize/2 then arm="Arm 1";
else arm="Arm 2";
rand = rand('UNIFORM');
output;
end;
end;
end;
run;
proc sort data = blocks; by site block rand; run;
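One way the same idea might carry over to sites of different (and odd) sizes, sketched under assumptions: build as many complete blocks as each site needs for its current participants plus the 10 planned extra ones, shuffle within each block, and hand the slots out in enrollment order. Balance is then exact at the end of every complete block and off by at most blocksize/2 while a block is partly filled, so an odd count is not a problem. The input mimics the posted INITIAL_DATA (6, 3 and 23 participants per site) with generated PATIDs; &EXTRA, the arm labels, the seed and the assumption that PATID order equals enrollment order are all hypothetical. Sites with no participants yet would have to be added to SITECOUNTS from a separate list of sites.

```sas
%let blocksize = 4;
%let extra = 10;
%let seed = 42;

/* Hypothetical input shaped like INITIAL_DATA: 6, 3 and 23 participants */
data start;
  length site $8 patid $8;
  input site $ n;
  do i = 1 to n;
    patid = cats('S', substr(site, 5), put(i, z3.));
    output;
  end;
  keep site patid;
  datalines;
Site1 6
Site2 3
Site3 23
;
run;

proc sort data=start;
  by site patid;
run;

/* Slots needed per site: current participants plus the planned extra ones */
proc sql;
  create table sitecounts as
  select site, count(*) + &extra as nslots
  from start
  group by site;
quit;

/* Enough complete blocks per site; each block holds blocksize/2 of each arm */
data slots;
  if _n_ = 1 then call streaminit(&seed);
  set sitecounts;
  do block = 1 to ceil(nslots / &blocksize);
    do item = 1 to &blocksize;
      if item <= &blocksize / 2 then arm = 'T';
      else arm = 'P';
      r = rand('uniform');
      output;
    end;
  end;
  drop item;
run;

/* Random order within each block */
proc sort data=slots;
  by site block r;
run;

/* Number the slots 1, 2, ... in allocation order within each site */
data slots;
  set slots;
  by site;
  if first.site then slot = 0;
  slot + 1;
  drop r;
run;

/* Number the enrolled participants the same way (PATID order assumed = enrollment order) */
data enrolled;
  set start;
  by site;
  if first.site then slot = 0;
  slot + 1;
run;

/* Existing participants take the first slots; the rest stay open for new enrollees */
data assigned;
  merge slots enrolled;
  by site slot;
run;
```

Rows of ASSIGNED with a missing PATID are the pre-randomized slots for future participants at that site, taken in SLOT order as they enroll.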