Hi all,
I am having trouble simulating the data for a powercurve. At this point I only now how to simulate data for power having a fixed number of boxes and a fixed number of samples in a box (nested design). I am pretty sure the solution is very easy, but how to change the code in order to let the simulation provide me the power for number of boxes varying from 4 to 8 and number of samples varying from 5 to 10?? I do not have acces to PROC IML so please keep it in the DATA STEP field. Much appreciated!!
%let ErrorVariance=3.0202; %let BoxVariance=0.1511; %let nsim=100; %let Box=10; %let Sample=10; DATA SIM; call streaminit(123); do isim = 1 to ≁ do trt=1 to 6; if trt=1 then mu=2.8994; if trt=2 then mu=2.6219; if trt=3 then mu=2.3222; if trt=4 then mu=2.7140; if trt=5 then mu=2.2615; if trt=6 then mu=3.0288; do Box=1 to &Box; rndBox=rand("Normal",0,sqrt(&BoxVariance)); do sample=1 to &Sample; rndSample=rand("Normal",0,sqrt(&ErrorVariance)); y=mu+rndBox+rndSample; output; end; end; end; end;
Realize that what you are doing is really an entire FAMILY of simulations that are parameterized by the number of boxes and the number of chicks. When I program a simulation like this, I like to start with a single simulation, like this:
/* My design consists of 6 treatments. Each treatment has 4 boxes with 5 replicates.
Hence, the sample size is 6*4*5. */
DATA SIM1;
call streaminit(123);
array t[6] _temporary_ (2.8994,2.6219,2.3222,2.7140,2.2615,3.0288);
boxMax = 4;
chickMax = 5;
/* do isim = 1 to ≁ */
do trt=1 to 6;
mu= t[trt];
do Box = 1 to boxMax;
rndBox=rand("Normal",0, sqrt(&BoxVariance));
do Chicks = 1 to chickMax;
rndChick=rand("Normal",0,sqrt(&ErrorVariance));
y= mu + rndBox + rndChick;
output;
end;
end;
end;
/* end; */
run;
After I get that working, the parameterized family of simulations is accomplished by adding two outer loops over the parameters. In your case:
DATA SIM(keep=boxMax chickMax iSim Box Chicks y);
call streaminit(123);
array t[6] _temporary_ (2.8994,2.6219,2.3222,2.7140,2.2615,3.0288);
array numBoxes[3] _temporary_ (4, 6, 10);
array numChicks[4] _temporary_ (5, 7, 8, 10);
/* for outer loops, if the increment is 1 you can use
do boxMax = 4 to 10;
do chickMax = 5 to 10 */
do b = 1 to dim(numBoxes);
boxMax = numBoxes[b];
do c = 1 to dim(numChicks);
chickMax = numChicks[c];
/* --- the following is the same as SIM1 --- */
do isim = 1 to &NumSimulations;
do trt=1 to 6;
mu= t[trt];
do Box = 1 to boxMax;
rndBox=rand("Normal",0, sqrt(&BoxVariance));
do Chicks = 1 to chickMax;
rndChick=rand("Normal",0,sqrt(&ErrorVariance));
y= mu + rndBox + rndChick;
output;
end;
end;
end;
end;
/* --- the previous is the same as SIM1 --- */
end;
end;
run;
In this program, I implemented the general case where the number of boxes and the number of chicks are not necessarily increasing by 1. If your increment is 1, you can get rid of the arrays and simplify the outer loops, as indicated in the code comments.
I discuss this in Chapter 11 of Simulating Data with SAS. The easiest way is to use an ARRAY instead of IF-THEN statements. Here is the unbalanced ANOVA simulation from the free code from my book (see the web site)
data Unbalanced(keep=Treatment Y);
call streaminit(1);
grandmean = 20;
array size{6} _temporary_ (5 10 10 12 6 8);
array effect{6} _temporary_ (9 -6 -6 4 0 0);
array std{6} _temporary_ (6 2 4 4 1 2);
do i = 1 to dim(effect); /* number of treatment levels */
Treatment = i;
do j = 1 to size{i}; /* number of obs per treatment level */
Y = grandmean + effect{i} + rand("Normal", 0, std{i});
output;
end;
end;
run;
proc glm data=Unbalanced;
class Treatment;
model Y = Treatment;
quit;
. You can easily add the outer loop over the number of samples to generate multiple samples.
Hi Rick, I already had the feeling that using arrays would prove the solution but for some reason I cannot get my head around the right coding to generate the ouput (I have your book and exactly your example is the one i have been trying to use )
%let ErrorVariance=3.0202; %let BoxVariance=0.1511; %let nsim=1; %let Box=4; %let Sample=5; DATA SIM; call streaminit(123); array numBoxes{5} (4,5,6,7,8); array numSamples{6} (5,6,7,8,9,10); do isim = 1 to ≁ do trt=1 to 6; if trt=1 then mu=2.8994; if trt=2 then mu=2.6219; if trt=3 then mu=2.3222; if trt=4 then mu=2.7140; if trt=5 then mu=2.2615; if trt=6 then mu=3.0288; do Box=1 to &Box; rndBox=rand("Normal",0,sqrt(&BoxVariance)); /* create block specific deviates */ do sample=1 to &Sample; rndSample=rand("Normal",0,sqrt(&ErrorVariance)); y=mu+rndBox+rndSample; output; end; end; end; end;
Theoretically, I would like the arrays to change the do loops, so parallel to the array values, a particular number of boxes and samples is created. However, I have no idea where to indicate this in the code as the do loops require a fixed range in itself and the nested structure already determines the sample size.
You may get the MU using another array as:
%let ErrorVariance=3.0202;
%let BoxVariance=0.1511;
%let nsim=1;
%let Box=4;
%let Sample=5;
DATA SIM;
call streaminit(123);
array numBoxes[5] _temporary_ (4,5,6,7,8);
array numSamples[6] _temporary_ (5,6,7,8,9,10);
array t[6] _temporary_ (2.8994,2.6219,2.3222,2.7140,2.2615,3.0288);
do isim = 1 to ≁
do trt=1 to 6;
mu= t[trt];
do Box=1 to &Box;
rndBox=rand("Normal",0,sqrt(&BoxVariance)); /* create block specific deviates */
do sample=1 to &Sample;
rndSample=rand("Normal",0,sqrt(&ErrorVariance));
y=mu+rndBox+rndSample;
output;
end;
end;
end;
end;
run;
By the way, in writing and speaking about simulations, I've found that it is easy to get confused about two related sizes:
sample size (N) versus the number of samples (NSim in your code; ), which is the number of Monte Carlo simulations. Your program will be clearer to others if you do not use "sample" for the name of the iterator in the innermost loop. I use "SampleID" for the outermost iterator and some generic name such as "i" for the innermost loop that generates the observations. Just a thought, which you can use or ignore.
I am sorry, but at this point i still do not see how i can tell SAS to run 100 simulations of a dataset with varying sample sizes;
So in one single code come up with 100 datasets for 4 boxes with 5 samples each, 4 boxes with 6 samples each, 5 boxes with 5 samples each, 5 boxes with 6 samples each etc...
it must probably be very simple, but unlike R, where you can tell the programm to create 100 random variables in a single line of code (and thus vary by modifying that line of code), SAS requires a do loop in the datastep to create a nested design.... so i do not see how to create in a single code several datasets of different sample sizes each....
or perhaps i need to modify the way I create the nested design..?
As I discuss on p. 200 ("Changing the Order of Loops"), you are free to change the order of loops if that makes the simulation easier. I provide an example. The discussion on p. 199 ("How you generate the x variable depends on what you are trying to do") also seems relevant.
I cannot recommend any code because I do not understand the design of the simulation study. I do not know what a "box" is and you seem to be using "samples" in a way that I do not understand. To me, a "sample" is what you seem to be calling a "dataset."
If this is a power analysis that is studying the effect of sample size, see Chapter 5 (p. 87), which varies the sample size as part of a power analysis for the t test.
Do you want 100 samples for each distinct combination of (Box x SampleSize)? If so, that is the case, then put the loops for "box" and "SampleSize" on the outside. Put the loop to generate each sample inside that.
Hi Rick,
I understand and i have actually been looking at Chapter 5 and 11 a lot but I guess I am stuck in a certain mindset.
My design consists of 6 treatments. Each treatment has 4 boxes with 5 replicates. Hence, the sample size is 6*4*5. Now, I would like to simulate varying sizes of boxes and replicates. Hence, one simulation is one sample in my book, and I would like a 1000 samples (simulations) for each design (Boxes*Replicates)
at this point I have this, and it seems to produce varying estimates, but i need to recheck.
%let ChickVariance=3.0202;
%let BoxVariance=0.1511;
%let NumBoxes=10;
%let NumChicks=10;
%let NumSimulations=50;
data Sim;
call streaminit(321);
array t[6] _temporary_ (2.8994,2.6219,2.3222,2.7140,2.2615,3.0288);
do trt=1 to 6;
mu= t[trt];
do NumBoxes = 4 to &NumBoxes by 1; /* sample size */
do NumChicks = 5 to &NumChicks by 1;
do Simulation = 1 to 5; /* this way box is just a number and does not determine anything further */
do Box=1 to numBoxes;
rndBox=rand("Normal", 0 , &BoxVariance);
do Chick=1 to NumChicks;
rndChick=rand("Normal", 0, &ChickVariance);
y=rndBox+rndChick;
output;
end;
end;
end;
end;
end;
end;
run;
Realize that what you are doing is really an entire FAMILY of simulations that are parameterized by the number of boxes and the number of chicks. When I program a simulation like this, I like to start with a single simulation, like this:
/* My design consists of 6 treatments. Each treatment has 4 boxes with 5 replicates.
Hence, the sample size is 6*4*5. */
DATA SIM1;
call streaminit(123);
array t[6] _temporary_ (2.8994,2.6219,2.3222,2.7140,2.2615,3.0288);
boxMax = 4;
chickMax = 5;
/* do isim = 1 to ≁ */
do trt=1 to 6;
mu= t[trt];
do Box = 1 to boxMax;
rndBox=rand("Normal",0, sqrt(&BoxVariance));
do Chicks = 1 to chickMax;
rndChick=rand("Normal",0,sqrt(&ErrorVariance));
y= mu + rndBox + rndChick;
output;
end;
end;
end;
/* end; */
run;
After I get that working, the parameterized family of simulations is accomplished by adding two outer loops over the parameters. In your case:
DATA SIM(keep=boxMax chickMax iSim Box Chicks y);
call streaminit(123);
array t[6] _temporary_ (2.8994,2.6219,2.3222,2.7140,2.2615,3.0288);
array numBoxes[3] _temporary_ (4, 6, 10);
array numChicks[4] _temporary_ (5, 7, 8, 10);
/* for outer loops, if the increment is 1 you can use
do boxMax = 4 to 10;
do chickMax = 5 to 10 */
do b = 1 to dim(numBoxes);
boxMax = numBoxes[b];
do c = 1 to dim(numChicks);
chickMax = numChicks[c];
/* --- the following is the same as SIM1 --- */
do isim = 1 to &NumSimulations;
do trt=1 to 6;
mu= t[trt];
do Box = 1 to boxMax;
rndBox=rand("Normal",0, sqrt(&BoxVariance));
do Chicks = 1 to chickMax;
rndChick=rand("Normal",0,sqrt(&ErrorVariance));
y= mu + rndBox + rndChick;
output;
end;
end;
end;
end;
/* --- the previous is the same as SIM1 --- */
end;
end;
run;
In this program, I implemented the general case where the number of boxes and the number of chicks are not necessarily increasing by 1. If your increment is 1, you can get rid of the arrays and simplify the outer loops, as indicated in the code comments.
Thank you Rick!! Of course, you just beat me to the solution, as I think I posted a quite similar code in terms of setup. Still great to see additional modifications that I can use liek increments. Examples such as this show that reading a book is great, but you really learn once you have a live example you need to master!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.