buhl2752
Fluorite | Level 6

Hello,

I am new to proc iml and am trying to simulate multivariate data to run through statistical models and compute power. Below is very simplified code that is similar to what I am doing. In my real program, the middle part (where here I simulate one variable from a normal distribution) would be more complicated: it would simulate multivariate data with a specified correlation. The part I cannot figure out is how to incorporate the do loops into IML and get the same type of output that I get from the code below (that is, columns for run, n, trt, plot, and y) so that I can run my statistical model by run and n and compute power. Any help would be greatly appreciated.

 

Also, I am able to simulate multivariate normal data without IML. I found code to simulate multivariate binomial data both with and without IML, and for a multivariate beta distribution I found IML code that uses copulas. However, I also need to simulate multivariate count data (both Poisson and negative binomial), but I have not found any code for this and am not sure whether copulas can be used for these two distributions. If anyone knows of sample code for a multivariate Poisson or negative binomial distribution, that would be very helpful. I am running SAS 9.4 and SAS/IML 15.2.

 

Thank you.  Deb

 

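data test;
   do run=1 to 1000;              /* simulated sample (replicate) */
      do n=2 to 50 by 2;          /* number of plots per treatment */
         do trt=1 to 4;           /* treatment */
            do plot=1 to n;       /* plot within treatment */
               y=rand("Normal",0,1);
               output;
            end;
         end;
      end;
   end;
run;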

Rick_SAS
SAS Super FREQ

That's a lot of questions. Most of them are addressed in Chapters 11 and 12 of Simulating Data with SAS.

 

The Poisson and negative binomial models are two examples of a generalized linear model. See the article "Simulate many samples from a logistic regression model," which shows how to simulate data from a logistic model. You can get other generalized linear models by modifying the statement

call randgen(y, "Bernoulli", mu);    /* 4. simulate binary response */

to instead sample Y from the Poisson or negative binomial distribution.
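A minimal SAS/IML sketch of that modification follows. It is not code from the article: it assumes a single standard-normal covariate, made-up coefficients (`beta0`, `beta1`), a hypothetical negative binomial size parameter `k`, and a hypothetical output data set `CountSim`. A log link replaces the logit link, so the mean is `exp(eta)`. For the negative binomial, it assumes the RANDGEN parameterization (p, k) in which Y counts the failures before the k-th success, so choosing p = k/(k+mu) gives E[Y] = mu. Passing a vector of parameters to RANDGEN, as the logistic example does, requires a reasonably recent SAS/IML release.

```sas
proc iml;
call randseed(12345);
N = 100;                                 /* sample size */
beta0 = 0.5;  beta1 = 0.8;               /* hypothetical regression coefficients */
k = 5;                                   /* hypothetical NB size parameter (integer) */

x = j(N, 1);
call randgen(x, "Normal");               /* 1. simulate a covariate */
eta = beta0 + beta1*x;                   /* 2. linear predictor */
mu  = exp(eta);                          /* 3. inverse of the log link */

yPois = j(N, 1);
call randgen(yPois, "Poisson", mu);      /* 4a. Poisson response, mean mu */

p = k / (k + mu);                        /* success probability so that E[Y] = mu */
yNB = j(N, 1);
call randgen(yNB, "NegBinomial", p, k);  /* 4b. negative binomial response */

create CountSim var {"x" "yPois" "yNB"};
append;
close CountSim;
quit;
```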

 

I've written about how to compute a power curve by using the DATA step. To use PROC IML, you can study the example at the end of the article "Use simulation to estimate the power of a statistical test."

You can then add the loops. For an example, study the code in "Estimate a power curve in parallel in SAS Viya." That article uses parallel computations in SAS Viya, but you can use the general framework of the program in PROC IML in SAS 9 and compute the curve by using serial computations.

 

Here is one specific suggestion: Put the 'n' loop on the outside. The inner loops are then responsible for generating 1000 samples of your data for a given sample size.
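A rough sketch of that structure in SAS/IML, applied to the design in the original question (4 treatments, 2 to 50 plots per treatment, 1000 samples). For each sample size, all 1000 samples are stored as the columns of one matrix, a one-way ANOVA F statistic is computed for every column at once, and power is the proportion of samples that reject. The treatment means in `trtMean`, the choice of a one-way ANOVA, and the data set name `Power` are assumptions for illustration, not part of the thread.

```sas
proc iml;
call randseed(2752);
NRuns = 1000;                            /* samples per sample size */
k     = 4;                               /* number of treatments */
alpha = 0.05;
trtMean = {0 0 0 0.5};                   /* hypothetical treatment means (sd = 1) */
sizes = do(2, 50, 2);                    /* plots per treatment: 2, 4, ..., 50 */
power = j(ncol(sizes), 1, .);

do i = 1 to ncol(sizes);                 /* OUTER loop: sample size */
   m = sizes[i];
   Y = j(k*m, NRuns, .);                 /* each column is one simulated sample */
   do g = 1 to k;                        /* fill the rows for treatment g */
      rows = ((g-1)*m + 1) : (g*m);
      Yg = j(m, NRuns);
      call randgen(Yg, "Normal", trtMean[g], 1);
      Y[rows, ] = Yg;
   end;

   /* one-way ANOVA F statistic for all NRuns columns at once */
   grand = Y[:, ];                       /* 1 x NRuns grand means */
   SSB = j(1, NRuns, 0);
   SSW = j(1, NRuns, 0);
   do g = 1 to k;
      rows = ((g-1)*m + 1) : (g*m);
      Yg = Y[rows, ];
      gm = Yg[:, ];                      /* group means for every sample */
      D  = Yg - gm;                      /* deviations from the group means */
      SSB = SSB + m * (gm - grand)##2;
      SSW = SSW + D[##, ];               /* column sums of squares */
   end;
   df1 = k - 1;
   df2 = k*(m - 1);
   F = (SSB / df1) / (SSW / df2);
   reject = (F > quantile("F", 1 - alpha, df1, df2));
   power[i] = reject[:];                 /* proportion of samples that reject */
end;

n = T(sizes);
create Power var {"n" "power"};
append;
close Power;
quit;
```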

 

I hope this helps. If you have specific questions, start a new thread in which you ask ONE question. Include the IML code that you have written so far.

buhl2752
Fluorite | Level 6

Thank you for your reply. Sorry, my question obviously was not clear. My question is the title of this thread: how do I incorporate do loops into proc iml? I am simulating data from a particular distribution given a mean and variance, not from a model. I am simulating correlated data for 2 years from various distributions. The code for simulating data from a beta distribution is attached. All of my do loops are written as macro do loops, and I use proc append to append the output of each run through IML to the base data set. As usual with macro do loops and proc append, this takes a long time to run. If the 2 variables did not have to be correlated, I could do this easily without proc iml. But for a beta distribution (for example), the code I found for simulating 2 correlated variables uses proc iml (see attached code). Is there a way to incorporate those macro do loops into IML so that it runs much faster? [Or alternatively, is there code to simulate correlated data from a beta distribution (or Poisson or negative binomial) without using proc iml?]

Rick_SAS
SAS Super FREQ

For tips on efficient simulation in SAS, read the paper (or watch the video) "Ten Tips for Simulating Data with SAS," especially the section on avoiding macro loops and PROC APPEND: "Simulation in SAS: The slow way or the BY way."
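To illustrate the BY-group idea with the DATA step from the original question: generate every sample in one DATA step (with the n loop moved outside so that the data are sorted by n and run), analyze all samples with one procedure call, and summarize. This sketch assumes a one-way ANOVA on `trt` is the analysis of interest and adds a hypothetical shift of 0.5 to treatment 4 so that the rejection rate estimates power rather than the Type I error rate. `Sim`, `Stats`, `Reject`, and `Power` are hypothetical data set names.

```sas
/* 1. Generate all samples in one data set (no macro loop, no PROC APPEND) */
data Sim;
   call streaminit(2752);
   do n = 2 to 50 by 2;
      do run = 1 to 1000;
         do trt = 1 to 4;
            do plot = 1 to n;
               y = rand("Normal", 0.5*(trt=4), 1);  /* hypothetical effect for trt 4 */
               output;
            end;
         end;
      end;
   end;
run;

/* 2. Analyze every sample with one procedure call and BY-group processing */
proc glm data=Sim noprint outstat=Stats;
   by n run;
   class trt;
   model y = trt;
run; quit;

/* 3. Power = proportion of samples whose F test rejects at alpha = 0.05 */
data Reject;
   set Stats;
   where _TYPE_ = "SS3";          /* keep the Type III test for trt */
   Reject = (PROB < 0.05);
run;

proc means data=Reject noprint;
   by n;
   var Reject;
   output out=Power mean=Power;
run;
```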

 

The SAS/IML language supports the iterative DO statement. The syntax is the same as for the DATA step. There is no need to use macro loops.

 

As I said, the article "Simulate many samples from a logistic regression model" provides simulation code similar to what you want to do. You can open the data set for output, run the loops, and output (APPEND) the simulated data for each sample. 
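Here is a sketch of that pattern for the DATA step in the original question, assuming normal data as in that example. The loops use the iterative DO syntax; CREATE opens the data set once, and each pass through the inner loop APPENDs one complete sample (run). The n loop is on the outside, so the output is sorted by n and then run. Because IML variable names are not case-sensitive, the loop index is called `size` (a hypothetical name) so that it does not collide with the output column `n`.

```sas
proc iml;
call randseed(2752);
NTrt  = 4;
NRuns = 1000;
/* give every output variable a numeric value so CREATE knows its type */
Run = 0;  n = 0;  Trt = 0;  Plot = 0;  y = 0;
create Sim var {"Run" "n" "Trt" "Plot" "y"};  /* open the data set for output */

do size = 2 to 50 by 2;                  /* iterative DO: same syntax as the DATA step */
   NObs = NTrt*size;                     /* observations in one sample */
   Trt  = T(1:NTrt) @ j(size, 1, 1);     /* 1,...,1, 2,...,2, ... (size copies each) */
   Plot = j(NTrt, 1, 1) @ T(1:size);     /* 1,...,size, repeated for each treatment */
   n    = j(NObs, 1, size);
   y    = j(NObs, 1);
   do r = 1 to NRuns;
      Run = j(NObs, 1, r);
      call randgen(y, "Normal", 0, 1);   /* a new sample of y */
      append;                            /* write one sample per iteration */
   end;
end;
close Sim;
quit;
```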

 

My suggestion: Write a module called RandMVBeta that takes the following input parameters, which are the design parameters for your simulation study:

  • runs (number of samples)
  • n (sample size)
  • diff (difference between means in population)
  • rho (correlation of MVN data in population)

The function should return a (runs*n) x 2 matrix of random variates from your correlated bivariate beta distribution. You can then call that function in a loop that runs over the design parameters.
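One possible sketch of such a module uses a Gaussian copula: simulate correlated normals, map them to uniforms with the normal CDF, and map each margin to a beta distribution with QUANTILE. The thread does not say how `diff` maps to the beta parameters, so this sketch assumes hypothetical values: a year-1 mean of 0.3, a year-2 mean of 0.3 + diff, and a common precision phi = 10 (a = mu*phi, b = (1-mu)*phi). As in the bullet list, `rho` is the correlation of the underlying MVN data; the correlation of the beta variates is usually a little smaller. The data set name `MVBeta` is also hypothetical.

```sas
proc iml;
/* Hypothetical sketch of RandMVBeta via a Gaussian copula.
   Assumed (not from the thread): mu1 = 0.3, mu2 = mu1 + diff, phi = 10 */
start RandMVBeta(runs, n, diff, rho);
   mu1 = 0.3;  phi = 10;                      /* hypothetical constants */
   mu  = mu1 || (mu1 + diff);                 /* population means of the two betas */
   a   = mu # phi;
   b   = (1 - mu) # phi;
   Sigma = (1 || rho) // (rho || 1);          /* correlation matrix of the MVN */
   Z = RandNormal(runs*n, {0 0}, Sigma);      /* (runs*n) x 2 correlated normals */
   U = cdf("Normal", Z);                      /* correlated uniforms */
   X = j(runs*n, 2, .);
   do k = 1 to 2;                             /* map each margin to its beta */
      X[, k] = quantile("Beta", U[, k], a[k], b[k]);
   end;
   return X;
finish;

call randseed(2752);
X = RandMVBeta(1000, 20, 0.1, 0.6);           /* 20000 x 2 matrix */
print (X[:,])[label="Sample means"] (corr(X))[label="Sample correlation"];
create MVBeta from X[colname={"Y1" "Y2"}];
append from X;
close MVBeta;
quit;
```

The same copula idea may help with the count-data question at the top of the thread: replacing the QUANTILE("Beta", ...) call with QUANTILE("Poisson", ...) or QUANTILE("NegBinomial", ...) yields correlated counts, although with discrete margins the correlation of the counts is smaller than rho and depends on the means.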

 

You then have two choices:

1. For each call, you output (append) the random values, along with the values of the parameters. At the end of your program, you have written a SAS data set that you can analyze using PROC CORR, PROC MEANS, etc.

2. For each sample, use PROC IML to compute the results by calling the CORR function, the MEAN function, etc. You would then only write (append) the statistics for each sample, not the simulated values.
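A minimal sketch of the second choice: loop over the design parameters and the samples, compute statistics in PROC IML, and APPEND only the statistics. To keep it self-contained, bivariate normal data from RandNormal stand in for the correlated two-year data (in practice you would call your own RandMVBeta module), and the design values in `rhoVals` and `sizes` and the data set name `SimStats` are hypothetical. Because IML names are not case-sensitive, the correlation matrix is stored in `C` so that it does not overwrite the statistic `r`.

```sas
proc iml;
call randseed(2752);
NRuns   = 1000;
rhoVals = {0.2 0.5 0.8};                 /* hypothetical design values */
sizes   = {10 20 50};
/* initialize output variables so CREATE can determine their types */
rho = 0;  n = 0;  Run = 0;  r = 0;  MeanDiff = 0;
create SimStats var {"rho" "n" "Run" "r" "MeanDiff"};
do i = 1 to ncol(rhoVals);
   rho = rhoVals[i];
   Sigma = (1 || rho) // (rho || 1);
   do jj = 1 to ncol(sizes);
      n = sizes[jj];
      do Run = 1 to NRuns;
         X  = RandNormal(n, {0 0}, Sigma);  /* stand-in for Year-1/Year-2 data */
         C  = corr(X);
         r  = C[1,2];                       /* sample correlation */
         mv = X[:,];                        /* sample means of the two columns */
         MeanDiff = mv[2] - mv[1];          /* difference of sample means */
         append;                            /* write statistics, not raw data */
      end;
   end;
end;
close SimStats;
quit;
```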

 

The second option is optimal for speed and efficiency, but which option you choose depends on your proficiency as a SAS/IML programmer.
