BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
MJ1985
Fluorite | Level 6

Hi all, 

 

I am having trouble simulating the data for a powercurve. At this point I only now how to simulate data for power having a fixed number of boxes and a fixed number of samples in a box (nested design). I am pretty sure the solution is very easy, but how to change the code in order to let the simulation provide me the power for number of boxes varying from 4 to 8 and number of samples varying from 5 to 10?? I do not have acces to PROC IML so please keep it in the DATA STEP field. Much appreciated!!

 

%let ErrorVariance=3.0202;
%let BoxVariance=0.1511;
%let nsim=100;
%let Box=10;
%let Sample=10;
DATA SIM;
 call streaminit(123);
 do isim = 1 to ≁
 	do trt=1 to 6;
 		if trt=1 then mu=2.8994;
		if trt=2 then mu=2.6219; 
		if trt=3 then mu=2.3222; 
		if trt=4 then mu=2.7140; 
		if trt=5 then mu=2.2615; 
		if trt=6 then mu=3.0288; 
 			do Box=1 to &Box; 
			rndBox=rand("Normal",0,sqrt(&BoxVariance));
				do sample=1 to &Sample;
					rndSample=rand("Normal",0,sqrt(&ErrorVariance));
					y=mu+rndBox+rndSample;
output;
end;
end;
end;
end;
1 ACCEPTED SOLUTION

Accepted Solutions
Rick_SAS
SAS Super FREQ

Realize that what you are doing is really an entire FAMILY of simulations that are parameterized by the number of boxes and the number of chicks.  When I program a simulation like this, I like to start with a single simulation, like this:

 


/* My design consists of 6 treatments. Each treatment has 4 boxes with 5 replicates. 
   Hence, the sample size is 6*4*5. */
DATA SIM1;
call streaminit(123);
array t[6] _temporary_ (2.8994,2.6219,2.3222,2.7140,2.2615,3.0288);

boxMax = 4;
chickMax = 5;

/* do isim = 1 to ≁ */
   do trt=1 to 6;
      mu= t[trt];
      do Box = 1 to boxMax;
         rndBox=rand("Normal",0, sqrt(&BoxVariance));
         do Chicks = 1 to chickMax;
            rndChick=rand("Normal",0,sqrt(&ErrorVariance));
            y= mu + rndBox + rndChick;
            output;
         end;
      end;
   end;
/* end; */
run;

After I get that working, the parameterized family of simulations is accomplished by adding two outer loops over the parameters. In your case:

 


DATA SIM(keep=boxMax chickMax iSim Box Chicks y);
call streaminit(123);
array t[6] _temporary_ (2.8994,2.6219,2.3222,2.7140,2.2615,3.0288);
array numBoxes[3] _temporary_ (4, 6, 10);
array numChicks[4] _temporary_ (5, 7, 8, 10);

/* for outer loops, if the increment is 1 you can use 
   do boxMax = 4 to 10;  
      do chickMax = 5 to 10 */
do b = 1 to dim(numBoxes);
   boxMax = numBoxes[b];    
   do c = 1 to dim(numChicks);
      chickMax = numChicks[c];

      /* --- the following is the same as SIM1 --- */
      do isim = 1 to &NumSimulations;
         do trt=1 to 6;
            mu= t[trt];
            do Box = 1 to boxMax;
               rndBox=rand("Normal",0, sqrt(&BoxVariance));
               do Chicks = 1 to chickMax;
                  rndChick=rand("Normal",0,sqrt(&ErrorVariance));
                  y= mu + rndBox + rndChick;
                  output;
               end;
            end;
         end;
      end;
      /* --- the previous is the same as SIM1 --- */

   end;
end;
run;

In this program, I implemented the general case where the number of boxes and the number of chicks are not necessarily increasing by 1.  If your increment is 1, you can get rid of the arrays and simplify the outer loops, as indicated in the code comments.

View solution in original post

9 REPLIES 9
Rick_SAS
SAS Super FREQ

I discuss this in Chapter 11 of Simulating Data with SAS. The easiest way is to use an ARRAY instead of IF-THEN statements. Here is the unbalanced ANOVA simulation from the free code from my book (see the web site)

data Unbalanced(keep=Treatment Y);
call streaminit(1);
grandmean = 20;
array size{6}   _temporary_ (5 10 10 12 6 8);
array effect{6} _temporary_ (9 -6 -6 4 0 0);
array std{6}    _temporary_ (6  2  4 4 1 2);
do i = 1 to dim(effect);       /* number of treatment levels        */
   Treatment = i;
   do j = 1 to size{i};        /* number of obs per treatment level */
      Y = grandmean + effect{i} + rand("Normal", 0, std{i});
      output;
   end;
end;
run;


proc glm data=Unbalanced;
   class Treatment;
   model Y = Treatment;
quit;

. You can easily add the outer loop over the number of samples to generate multiple samples.

 

 

MJ1985
Fluorite | Level 6

Hi Rick, I already had the feeling that using arrays would prove the solution but for some reason I cannot get my head around the right coding to generate the ouput (I have your book and exactly your example is the one i have been trying to use ) 

 

%let ErrorVariance=3.0202;
%let BoxVariance=0.1511;
%let nsim=1;
%let Box=4;
%let Sample=5;
DATA SIM;
call streaminit(123);
array numBoxes{5} (4,5,6,7,8);
array numSamples{6} (5,6,7,8,9,10);
 		do isim = 1 to ≁
 			do trt=1 to 6;
 				if trt=1 then mu=2.8994; 
				if trt=2 then mu=2.6219; 
				if trt=3 then mu=2.3222; 
		if trt=4 then mu=2.7140;
		if trt=5 then mu=2.2615; 
		if trt=6 then mu=3.0288; 
 			do Box=1 to &Box; 
			rndBox=rand("Normal",0,sqrt(&BoxVariance)); 	/* create block specific deviates */
				do sample=1 to &Sample;
					rndSample=rand("Normal",0,sqrt(&ErrorVariance));
					y=mu+rndBox+rndSample;
output;
end;
end;
end;
end;

Theoretically, I would like the arrays to change the do loops, so parallel to the array values, a particular number of boxes and samples is created. However, I have no idea where to indicate this in the code as the do loops require a fixed range in itself and the nested structure already determines the sample size. 

KachiM
Rhodochrosite | Level 12

You may get the MU using another array as:

%let ErrorVariance=3.0202;
%let BoxVariance=0.1511;
%let nsim=1;
%let Box=4;
%let Sample=5;
DATA SIM;
call streaminit(123);
array numBoxes[5] _temporary_ (4,5,6,7,8);
array numSamples[6] _temporary_ (5,6,7,8,9,10);
array t[6] _temporary_ (2.8994,2.6219,2.3222,2.7140,2.2615,3.0288);
   do isim = 1 to ≁
      do trt=1 to 6;
         mu= t[trt]; 
         do Box=1 to &Box; 
            rndBox=rand("Normal",0,sqrt(&BoxVariance)); 	/* create block specific deviates */
            do sample=1 to &Sample;
               rndSample=rand("Normal",0,sqrt(&ErrorVariance));
               y=mu+rndBox+rndSample;
               output;
            end;
         end;
      end;
   end;
run;
Rick_SAS
SAS Super FREQ

By the way, in writing and speaking about simulations, I've found that it is easy to get confused about two related sizes:

sample size (N) versus the number of samples (NSim in your code; ), which is the number of Monte Carlo simulations.  Your program will be clearer to others if you do not use "sample" for the name of the iterator in the innermost loop. I use "SampleID" for the outermost iterator and some generic name such as "i" for the innermost loop that generates the observations. Just a thought, which you can use or ignore.

MJ1985
Fluorite | Level 6

I am sorry, but at this point i still do not see how i can tell SAS to run 100 simulations of a dataset with varying sample sizes;

 

So in one single code come up with 100 datasets for 4 boxes with 5 samples each, 4 boxes with 6 samples each, 5 boxes with 5 samples each, 5 boxes with 6 samples each etc...

 

it must probably be very simple, but unlike R, where you can tell the programm to create 100 random variables in a single line of code (and thus vary by modifying that line of code), SAS requires a do loop in the datastep to create a nested design.... so i do not see how to create in a single code several datasets of different sample sizes each.... 

 

or perhaps i need to modify the way I create the nested design..?

Rick_SAS
SAS Super FREQ

As I discuss on p. 200 ("Changing the Order of Loops"), you are free to change the order of loops if that makes the simulation easier. I provide an example. The discussion on p. 199 ("How you generate the x variable depends on what you are trying to do") also seems relevant.

 

I cannot recommend any code because I do not understand the design of the simulation study.  I do not know what a "box" is and you seem to be using "samples" in a way that I do not understand. To me, a "sample" is what you seem to be calling a "dataset."

 

If this is a power analysis that is studying the effect of sample size, see Chapter 5 (p. 87), which varies the sample size as part of a power analysis for the t test.

 

Do you want 100 samples for each distinct combination of (Box x SampleSize)?  If so, that is the case, then put the loops for "box" and "SampleSize" on the outside. Put the loop to generate each sample inside that.  

 

 

MJ1985
Fluorite | Level 6

Hi Rick, 

 

I understand and i have actually been looking at Chapter 5 and 11 a lot but I guess I am stuck in a certain mindset.

 

My design consists of 6 treatments. Each treatment has 4 boxes with 5 replicates. Hence, the sample size is 6*4*5. Now, I would like to simulate varying sizes of boxes and replicates. Hence, one simulation is one sample in my book, and I would like a 1000 samples (simulations) for each design (Boxes*Replicates)

 

at this point I have this, and it seems to produce varying estimates, but i need to recheck. 

 

%let ChickVariance=3.0202;
%let BoxVariance=0.1511;
%let NumBoxes=10;
%let NumChicks=10;
%let NumSimulations=50;
data Sim;
call streaminit(321);
array t[6] _temporary_ (2.8994,2.6219,2.3222,2.7140,2.2615,3.0288);
do trt=1 to 6;
mu= t[trt];
do NumBoxes = 4 to &NumBoxes by 1; /* sample size */
do NumChicks = 5 to &NumChicks by 1;
do Simulation = 1 to 5; /* this way box is just a number and does not determine anything further */
do Box=1 to numBoxes;
rndBox=rand("Normal", 0 , &BoxVariance);
do Chick=1 to NumChicks;
rndChick=rand("Normal", 0, &ChickVariance);
y=rndBox+rndChick;

output;
end;
end;
end;
end;
end;
end;
run;

 

 

 

 

Rick_SAS
SAS Super FREQ

Realize that what you are doing is really an entire FAMILY of simulations that are parameterized by the number of boxes and the number of chicks.  When I program a simulation like this, I like to start with a single simulation, like this:

 


/* My design consists of 6 treatments. Each treatment has 4 boxes with 5 replicates. 
   Hence, the sample size is 6*4*5. */
DATA SIM1;
call streaminit(123);
array t[6] _temporary_ (2.8994,2.6219,2.3222,2.7140,2.2615,3.0288);

boxMax = 4;
chickMax = 5;

/* do isim = 1 to ≁ */
   do trt=1 to 6;
      mu= t[trt];
      do Box = 1 to boxMax;
         rndBox=rand("Normal",0, sqrt(&BoxVariance));
         do Chicks = 1 to chickMax;
            rndChick=rand("Normal",0,sqrt(&ErrorVariance));
            y= mu + rndBox + rndChick;
            output;
         end;
      end;
   end;
/* end; */
run;

After I get that working, the parameterized family of simulations is accomplished by adding two outer loops over the parameters. In your case:

 


DATA SIM(keep=boxMax chickMax iSim Box Chicks y);
call streaminit(123);
array t[6] _temporary_ (2.8994,2.6219,2.3222,2.7140,2.2615,3.0288);
array numBoxes[3] _temporary_ (4, 6, 10);
array numChicks[4] _temporary_ (5, 7, 8, 10);

/* for outer loops, if the increment is 1 you can use 
   do boxMax = 4 to 10;  
      do chickMax = 5 to 10 */
do b = 1 to dim(numBoxes);
   boxMax = numBoxes[b];    
   do c = 1 to dim(numChicks);
      chickMax = numChicks[c];

      /* --- the following is the same as SIM1 --- */
      do isim = 1 to &NumSimulations;
         do trt=1 to 6;
            mu= t[trt];
            do Box = 1 to boxMax;
               rndBox=rand("Normal",0, sqrt(&BoxVariance));
               do Chicks = 1 to chickMax;
                  rndChick=rand("Normal",0,sqrt(&ErrorVariance));
                  y= mu + rndBox + rndChick;
                  output;
               end;
            end;
         end;
      end;
      /* --- the previous is the same as SIM1 --- */

   end;
end;
run;

In this program, I implemented the general case where the number of boxes and the number of chicks are not necessarily increasing by 1.  If your increment is 1, you can get rid of the arrays and simplify the outer loops, as indicated in the code comments.

MJ1985
Fluorite | Level 6

Thank you Rick!! Of course, you just beat me to the solution, as I think I posted a quite similar code in terms of setup. Still great to see additional modifications that I can use liek increments. Examples such as this show that reading a book is great, but you really learn once you have a live example you need to master!

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 3034 views
  • 2 likes
  • 3 in conversation