Amanda_Lemon
Quartz | Level 8

I am trying to simulate nested data, but so far I haven't figured out how. I want to simulate two-level data, e.g., students (level 1) clustered within schools (level 2). The model is a random intercept model in which a level-2 variable (W) predicts the level-1 outcome (Y).

I would like to be able to specify the intercept variance, the residual variance, the number of level-2 units (schools), and the number of level-1 units per level-2 unit (students per school).
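
The attached image, which presumably showed the model equations, did not survive. Based on the program posted later in the thread (Y = mu + b*W + rand_int + rand_e), the model is presumably the standard two-level random intercept model below. The notation is an assumption in common multilevel style, not a copy of the original image:

```latex
\begin{aligned}
\text{Level 1:}\quad & Y_{ij} = \beta_{0j} + r_{ij}, \qquad r_{ij} \sim N(0, \sigma^2) \\
\text{Level 2:}\quad & \beta_{0j} = \gamma_{00} + \gamma_{01} W_j + u_{0j}, \qquad u_{0j} \sim N(0, \tau_{00}) \\
\text{Combined:}\quad & Y_{ij} = \gamma_{00} + \gamma_{01} W_j + u_{0j} + r_{ij}
\end{aligned}
```

Here gamma00 corresponds to mu, gamma01 to b, tau00 to tau (intercept variance), and sigma^2 to sigma (residual variance), with i = 1, ..., N1 students in each of j = 1, ..., N2 schools.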

Any help would be highly appreciated! Thank you in advance. 

13 replies
PGStats
Opal | Level 21

What have you tried so far? What would the set of Wj values be in this simulation?

PG
Amanda_Lemon
Quartz | Level 8

I found code in "SAS for Mixed Models" by Littell et al. (2006, Section 12.5). Is it allowed to copy and paste code from a book? In any case, that code has more variance parameters than I need and does not let me set the numbers of units, so I wasn't sure how to modify it for my purpose.

W would just be a normally distributed variable, with one value drawn from a normal distribution for each level-2 unit j.

Amanda_Lemon
Quartz | Level 8
@Rick_SAS, maybe you have some hints for me about how to simulate this model? Thank you in advance!
Rick_SAS
SAS Super FREQ

SAS Press books provide their code for free, so you should be able to go to the website for Littell et al. (2006) or Wicklin (2013) and use the code, provided you can follow it.

My hint is to simulate the beta0[j] first, then simulate the Y[i,j] for each cluster. 

If you are using the DATA step, put the beta0[j] in an array and simulate them first. Then for each cluster and subject in the cluster, simulate Y[i,j] and OUTPUT the result.
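
A minimal DATA step sketch of this hint for a single sample, assuming the parameter values used in the program later in the thread (tau = 15, sigma = 45, 200 schools, 30 students per school, b = 0.2). The data set name ONESAMPLE and the temporary array names are illustrative, not from the book:

```sas
%let mu = 0;      /* overall intercept */
%let tau = 15;    /* random intercept variance */
%let sigma = 45;  /* level-1 residual variance */
%let N2 = 200;    /* number of level-2 units (schools) */
%let N1 = 30;     /* number of level-1 units per school */
%let b = 0.2;     /* effect of W */

data OneSample;
   call streaminit(123456);
   array beta0[&N2] _temporary_;   /* school-specific intercepts beta0[j] */
   array Wj[&N2]    _temporary_;   /* level-2 covariate, one value per school */

   /* Step 1: simulate the cluster-level quantities first */
   do j = 1 to &N2;
      Wj[j]    = rand('Normal', 0, 1);
      beta0[j] = &mu + &b*Wj[j] + rand('Normal', 0, sqrt(&tau));
   end;

   /* Step 2: for each cluster, simulate and OUTPUT each subject */
   do j = 1 to &N2;
      W = Wj[j];
      do i = 1 to &N1;
         Y = beta0[j] + rand('Normal', 0, sqrt(&sigma));
         output;
      end;
   end;
   keep j i W Y;
run;
```

Because temporary arrays are not written to the output data set, only J, I, W, and Y appear in ONESAMPLE.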

If you are using IML, you can vectorize the process: generate the beta0 vector in one statement, then loop over the clusters and generate all subjects in each cluster with a single vector operation.
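
A hedged SAS/IML sketch of the vectorized approach, assuming SAS/IML is licensed and reusing the same illustrative parameter values. The loop variable is named JJ to avoid clashing with the IML J function; the names SIMIML and WREP are hypothetical:

```sas
proc iml;
call randseed(123456);
mu = 0;  tau = 15;  sigma = 45;  b = 0.2;   /* illustrative parameter values */
N2 = 200;  N1 = 30;                        /* schools, students per school */

W  = j(N2, 1);  call randgen(W,  "Normal", 0, 1);           /* level-2 covariate */
b0 = j(N2, 1);  call randgen(b0, "Normal", 0, sqrt(tau));   /* all random intercepts in one call */

cluster = j(N1*N2, 1);  Wrep = j(N1*N2, 1);  Y = j(N1*N2, 1);
e = j(N1, 1);
do jj = 1 to N2;
   idx = ((jj-1)*N1 + 1) : (jj*N1);           /* rows that belong to cluster jj */
   call randgen(e, "Normal", 0, sqrt(sigma)); /* all N1 residuals for this cluster */
   cluster[idx] = jj;
   Wrep[idx]    = W[jj];
   Y[idx]       = mu + b*W[jj] + b0[jj] + e;
end;

create SimIML var {"cluster" "Wrep" "Y"};
append;
close SimIML;
quit;
```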

Amanda_Lemon
Quartz | Level 8

@Rick_SAS, thank you! I tried to adapt the code from your book. Did I do it right? Does this code simulate the model I described in my first post? I think it does, but I want to double-check. The end goal is to run a power analysis. Thanks again!

%let mu = 0; /* intercept, or overall mean */ 
%let tau = 15; /* random intercept variance */
%let sigma = 45; /* error variance */ 
%let N2 = 200; /* number of level 2 units */ 
%let N1 = 30; /* number of level 1 units per level 2 unit */ 
%let b = 0.2; /* effect of W */ 
%let nsamples = 1000; /* number of simulated samples */ 

data power;
call streaminit(123456);
do sampleID = 1 to &nsamples; 
 do j = 1 to &N2;
   rand_int = rand('Normal', 0, sqrt(&tau)); 
   W = rand('Normal', 0, 1);
   do i = 1 to &N1;
      rand_e = rand('Normal', 0, sqrt(&sigma));
      Y = &mu + &b * W + rand_int + rand_e;
      output;
   end; 
 end; 
end; 
run; 
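
Since the stated end goal is a power analysis, here is a hedged sketch of the analysis step. It regenerates the POWER data set with the same structure as the program above (100 samples instead of 1000, only so the sketch runs quickly), fits the random intercept model to every sample with PROC MIXED and a BY statement, and estimates power as the proportion of samples in which the test of W is significant at alpha = 0.05. The data set names FIXEDEFFECTS and REJECT are illustrative:

```sas
%let mu = 0;  %let tau = 15;  %let sigma = 45;
%let N2 = 200;  %let N1 = 30;  %let b = 0.2;
%let nsamples = 100;   /* use 1000 or more for the real study */

/* Same structure as the program above; rows are generated in order of sampleID */
data power;
   call streaminit(123456);
   do sampleID = 1 to &nsamples;
      do j = 1 to &N2;
         rand_int = rand('Normal', 0, sqrt(&tau));
         W = rand('Normal', 0, 1);
         do i = 1 to &N1;
            Y = &mu + &b*W + rand_int + rand('Normal', 0, sqrt(&sigma));
            output;
         end;
      end;
   end;
run;

/* Fit the random intercept model once per sample; suppress printed output */
ods exclude all;
options nonotes;
proc mixed data=power;
   by sampleID;
   class j;
   model Y = W / solution;
   random intercept / subject=j;
   ods output SolutionF=FixedEffects;
run;
options notes;
ods exclude none;

/* Power estimate: proportion of samples where the test of W rejects at 0.05 */
data Reject;
   set FixedEffects;
   where Effect = 'W';
   Reject = (Probt < 0.05);
run;

proc means data=Reject mean n;
   var Reject;
run;
```

The mean of REJECT is the estimated power for the chosen values of b, tau, sigma, N1, and N2.
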
Rick_SAS
SAS Super FREQ

I'm sure you understand that I cannot double check the programs for everyone who reads my books or blogs. 

You need to make sure to specify the meaning of the W variable. In your simulation, it looks like W is a random effect because it takes different values for each subject in each sample. This is one (valid) model. Another valid model treats W as a fixed effect, in which case the W[j] would be the same in every sample. To get that, you would move the SAMPLE loop to just outside the loop over i (individual units). If you do that, you probably want to sort by SAMPLE before you perform the BY-group analysis of the simulated data. These issues are discussed on pp. 199-200 and pp. 230-231.
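
A sketch of the fixed-effect version described above, reusing the parameter values from the program in the previous post. Each school's W[j] is drawn once, the SAMPLE loop sits just outside the loop over i, and the data are then sorted by sample for the BY-group analysis. Drawing rand_int inside the SAMPLE loop is an assumption about the intended design, so that the random intercepts vary from sample to sample while W does not. The data set name FIXEDW is illustrative:

```sas
%let mu = 0;  %let tau = 15;  %let sigma = 45;
%let N2 = 200;  %let N1 = 30;  %let b = 0.2;
%let nsamples = 1000;

data FixedW;
   call streaminit(123456);
   do j = 1 to &N2;
      W = rand('Normal', 0, 1);                        /* drawn once: same W[j] in every sample */
      do sampleID = 1 to &nsamples;
         rand_int = rand('Normal', 0, sqrt(&tau));     /* new random intercept in every sample */
         do i = 1 to &N1;
            Y = &mu + &b*W + rand_int + rand('Normal', 0, sqrt(&sigma));
            output;
         end;
      end;
   end;
run;

/* Rows come out grouped by school, so sort before a BY sampleID analysis */
proc sort data=FixedW;
   by sampleID j i;
run;
```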

I suspect you intend W to be a fixed effect because you use b as the size of the effect as opposed to assuming W ~ N(0, s) for some variance component.

Amanda_Lemon
Quartz | Level 8

@Rick_SAS, thank you so much! Understood. Yes, W should be a fixed effect. 

I've run into a different problem now. Instead of simulating the level-2 variable W, I want to read its values from an existing data set, but the program won't run. Why is that? I also tried putting the W values into an array first, but that didn't work either.

%let mu = 0; /* intercept, or overall mean */ 
%let tau = 15; /* variance of random intercept */
%let sigma = 45; /* error variance */ 
%let N1 = 300; /* number of level 1 units per level 2 unit */ 
%let b = 0.3; /* effect of W */ 
%let nsamples = 2; /* number of simulated samples */ 

data power;
call streaminit(123456); 
set my_set; /* this set has W (200 values) */ 
do j = 1 to 200;
   rand_int = rand('Normal', 0, sqrt(&tau)); 
   do sampleID = 1 to &nsamples; 
     do i = 1 to &N1;
       rand_e = rand('Normal', 0, sqrt(&sigma));
       Y = &mu + &b * W + rand_int + rand_e;
       output;
     end; 
   end; 
end; 
run; 

Rick_SAS
SAS Super FREQ

> But the program won't run... Why is that?

If a program does not run, please show the error that you get in the log. 

This kind of simulation is discussed on p. 203 of Simulating Data with SAS (Wicklin, 2013). I think you might have too many loops. Remember that the SET statement implicitly loops over the data.
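
To make the "too many loops" point concrete: with SET at the top of the posted program, each of the 200 implicit iterations reads one W and then the DO J loop writes 200 × 2 × 300 = 120,000 rows (all with that same W), so the step tries to write 200 × 120,000 = 24 million observations, which can look like a hang. One fix in the spirit of this hint is to drop the DO J loop and let the implicit loop supply one school per iteration. This is a sketch using the values from the posted program; MY_SET is replaced by a hypothetical stand-in so the code runs on its own:

```sas
%let mu = 0;  %let tau = 15;  %let sigma = 45;
%let N1 = 300;  %let b = 0.3;  %let nsamples = 2;

/* Hypothetical stand-in for MY_SET: 200 schools, one W value each */
data my_set;
   call streaminit(2024);
   do school = 1 to 200;
      W = rand('Normal', 0, 1);
      output;
   end;
   keep W;
run;

data power;
   if _N_ = 1 then call streaminit(123456);
   set my_set;        /* implicit loop: one observation (one school) per iteration */
   j = _N_;           /* school ID */
   do sampleID = 1 to &nsamples;
      rand_int = rand('Normal', 0, sqrt(&tau));   /* new random intercept per sample */
      do i = 1 to &N1;
         Y = &mu + &b*W + rand_int + rand('Normal', 0, sqrt(&sigma));
         output;
      end;
   end;
run;

/* Rows are grouped by school; sort before a BY sampleID analysis */
proc sort data=power;
   by sampleID j i;
run;
```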

Amanda_Lemon
Quartz | Level 8

Well, that's the thing: there is no error, there is literally nothing. It seems to just keep running without producing any result, so I have to force SAS to close and then reopen it.

The online version of the book, provided by my university's library, doesn't have page numbers. Would you mind telling me the section or subsection number you were referring to?

Thank you again for your help. 

Rick_SAS
SAS Super FREQ

Section 11.3.2 "A Linear Model Based on Real Data" starts on p. 202

Amanda_Lemon
Quartz | Level 8

Thank you! As far as I understand, the examples in this section use parameters estimated from real data. But I want to use the actual data values in the simulation rather than parameters: I want to read in the level-2 variable values and simulate the level-1 values. You mentioned that the problem might be the SET statement. Is there another way to read in the values (especially if there are many observations, e.g., 2000)?

Rick_SAS
SAS Super FREQ

> So, as far as I understood, examples in this section use parameters obtained from the analysis of real data. But I have actual data that I want to use in the simulation rather than parameters

See the examples in section 11.3.2.2 "Simulations That Use the Sample Data," which use the SET statement and read in actual data.
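
Another option for the question above about reading many W values is to load them once into a temporary array sized from the data, then run the sample loop outermost so the output is already ordered by sample and no sort is needed. This is a sketch under assumptions: the macro variable NW, the array WJ, and the stand-in MY_SET are hypothetical, and the parameter values are those from the posted program:

```sas
%let mu = 0;  %let tau = 15;  %let sigma = 45;
%let N1 = 300;  %let b = 0.3;  %let nsamples = 2;

/* Hypothetical stand-in for MY_SET: 200 schools, one W value each */
data my_set;
   call streaminit(2024);
   do school = 1 to 200;
      W = rand('Normal', 0, 1);
      output;
   end;
   keep W;
run;

/* Number of schools, used to size the array (works for 200 or 2000 rows) */
proc sql noprint;
   select count(*) into :nW trimmed from my_set;
quit;

data power;
   call streaminit(123456);
   array Wj[&nW] _temporary_;
   do j = 1 to &nW;                  /* read all W values once */
      set my_set;
      Wj[j] = W;
   end;
   do sampleID = 1 to &nsamples;     /* sample loop outermost: output ordered by sample */
      do j = 1 to &nW;
         W = Wj[j];
         rand_int = rand('Normal', 0, sqrt(&tau));
         do i = 1 to &N1;
            Y = &mu + &b*W + rand_int + rand('Normal', 0, sqrt(&sigma));
            output;
         end;
      end;
   end;
   stop;                             /* all data read; end the DATA step explicitly */
run;
```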

Amanda_Lemon
Quartz | Level 8

Got it! Thank you. The problem is solved: I just needed to move the SET statement inside the first loop.
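
For reference, a sketch of the fix described in this reply, with the SET statement moved inside the DO J loop so that each pass reads the next school's W. Two details are assumptions beyond the posted program: NOBS= supplies the number of schools so the loop bound matches the data, and rand_int is drawn inside the sampleID loop. In the posted program it is drawn once per school, which would give every sample the same random intercepts. MY_SET is replaced by a hypothetical stand-in so the code runs on its own:

```sas
%let mu = 0;  %let tau = 15;  %let sigma = 45;
%let N1 = 300;  %let b = 0.3;  %let nsamples = 2;

/* Hypothetical stand-in for MY_SET: 200 schools, one W value each */
data my_set;
   call streaminit(2024);
   do school = 1 to 200;
      W = rand('Normal', 0, 1);
      output;
   end;
   keep W;
run;

data power;
   call streaminit(123456);
   do j = 1 to nSchools;
      set my_set nobs=nSchools;      /* reads the j-th observation: school j's W */
      do sampleID = 1 to &nsamples;
         rand_int = rand('Normal', 0, sqrt(&tau));   /* new random intercept per sample */
         do i = 1 to &N1;
            Y = &mu + &b*W + rand_int + rand('Normal', 0, sqrt(&sigma));
            output;
         end;
      end;
   end;
   stop;                             /* every school processed; end the DATA step */
run;

/* Rows are grouped by school; sort before a BY sampleID analysis */
proc sort data=power;
   by sampleID j i;
run;
```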