🔒 This topic is **solved** and **locked**.

Posted 05-06-2016 10:20 AM
(1085 views)

Hi Folks,

I'm very new to PROC IML but am attempting to simulate a dataset based on a preexisting correlation matrix. I found a nice macro for this (corr2data) at http://www.ats.ucla.edu/stat/sas/macros/corr2data_demo.htm. I've simplified this slightly for my use, as provided below.

This works well for me, but I need to adapt it to simulate hundreds of datasets from the same correlation matrix and assign a sample ID to each simulated dataset. I used a %DO loop for this, and in combination with a DATA step it works, but I'm failing at creating a sampleID variable to append to each row of the output dataset (e.g., so that all rows generated in the first iteration are labeled "1", all rows generated in the 13th iteration are labeled "13", and so on).

Here's what I've got so far. Help much appreciated.

The macro call: %SIMUDATA(simu.Fakenorm, simu.Rmat, 1051);

The macro:

```
%macro SIMUDATA(outdata, corrmat, n);
%Do index=1 %to 20 %by 1;
   proc iml;
   use &corrmat;
   read all var _num_ into C;
   rn = nrow(C);
   cn = ncol(C);
   p = root(C);
   dim = nrow(C);
   myvar = rannor(J(&n, dim, 0));
   do i = 1 to dim;
      myvar[, i] = myvar[,i]-(sum(myvar[,i])/&n);
   end;
   XX = (t(myvar)*myvar)/(&n-1);
   U = root(inv(XX));
   Y = myvar*T(U);
   T = Y*p;
   /* S=J(dim, 1, &index);  <- my attempt to create a vector of nrow length labeled with the index number */
   /* V=insert(T,S,0,35);   <- my attempt to append that vector to the simulated dataset */
   create &outdata from V;
   append from V;
   quit;

   data simu.cumu;
   set simu.cumu &outdata;
   run;
%end;
%mend;
```
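For readers who want to see what the macro's linear algebra accomplishes, here is a minimal Python sketch (standard library only, not SAS) of the same idea: draw independent normals, center each column, whiten so the sample covariance is exactly the identity, then multiply by a Cholesky factor of the target matrix. One swap to flag: the macro whitens with `root(inv(XX))`, whereas this sketch solves against the lower Cholesky factor of `XX` instead; both make the whitened sample covariance the identity. The helper names (`cholesky_lower`, `corr2data_sketch`, etc.) and the 3×3 example matrix are illustrative assumptions, not part of the UCLA macro.

```python
import math
import random


def cholesky_lower(a):
    """Return lower-triangular L with a = L * L^T (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def sample_cov(rows):
    """Sample covariance matrix (divisor n - 1) of a list of equal-length rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def forward_solve(M, x):
    """Solve M y = x for a lower-triangular matrix M."""
    y = []
    for i in range(len(x)):
        s = sum(M[i][k] * y[k] for k in range(i))
        y.append((x[i] - s) / M[i][i])
    return y


def corr2data_sketch(C, n, rng):
    """Generate n rows whose sample covariance equals C (up to rounding)."""
    p = len(C)
    # 1. independent standard normal draws (the macro uses RANNOR)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    # 2. center each column so every column mean is 0
    means = [sum(r[j] for r in X) / n for j in range(p)]
    X = [[r[j] - means[j] for j in range(p)] for r in X]
    # 3. whiten: afterwards the sample covariance is exactly the identity
    M = cholesky_lower(sample_cov(X))
    Y = [forward_solve(M, r) for r in X]
    # 4. color: t = L y with C = L L^T imposes the target matrix
    L = cholesky_lower(C)
    return [[sum(L[i][k] * y[k] for k in range(i + 1)) for i in range(p)] for y in Y]


# Usage with a made-up 3-variable correlation matrix and n = 1051 rows
C = [[1.0, 0.5, 0.2], [0.5, 1.0, 0.3], [0.2, 0.3, 1.0]]
data = corr2data_sketch(C, 1051, random.Random(2016))
```

Because of the whitening step, the sample correlation of each generated data set matches the target matrix up to rounding error, not just on average.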

This macro is unnecessary. The macro merely generates random multivariate normal samples, which you can do directly in SAS/IML by using the RANDNORMAL function. See the article "Sampling from the multivariate normal distribution."
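As a rough illustration of what sampling from a multivariate normal involves (this is the standard Cholesky construction; RANDNORMAL's internals are not shown in this thread), the Python sketch below factors the covariance as L * L^T and maps independent standard normal draws z to mean + L * z. It reuses the 3×3 covariance matrix from the program in this reply; the function names are hypothetical.

```python
import math
import random


def cholesky_lower(a):
    """Return lower-triangular L with a = L * L^T (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def rand_normal(n, mean, cov, rng):
    """Draw n rows from MVN(mean, cov) by transforming independent N(0, 1) draws."""
    L = cholesky_lower(cov)
    p = len(mean)
    rows = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        # x = mean + L z (L is lower triangular, so only k <= i contributes)
        rows.append([mean[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(p)])
    return rows


rng = random.Random(4321)
cov = [[3.0, 2.0, 1.0], [2.0, 4.0, 0.0], [1.0, 0.0, 2.0]]
X = rand_normal(5, [0.0, 0.0, 0.0], cov, rng)  # N = 5 rows, 3 variables
```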

To produce many samples from the same correlation matrix, see the article "How to generate multiple samples from the multivariate normal distribution in SAS."

Since you say you are new to SAS/IML, here is the program that incorporates what you've asked for, but PLEASE read the article for background:

```
data corrmat;
input c1 c2 c3;
datalines;
3 2 1
2 4 0
1 0 2
;
proc iml;
call randseed(4321);
/* specify population mean and covariance */
Mean = {0, 0, 0};
use corrmat;
read all var _num_ into Cov;
close corrmat;
N = 5; /* sample size */
NumSamples = 10; /* number of samples/replicates */
X = RandNormal(N*NumSamples, Mean, Cov);
ID = colvec(repeat(T(1:NumSamples), 1, N)); /* 1,1,1,...,2,2,2,...,3,3,3,... */
Z = ID || X;
create MVN from Z[c={"ID" "x1" "x2" "x3"}];
append from Z;
close MVN;
quit;
```
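The `ID` line is the part that answers the original question. As a sketch in Python rather than SAS/IML (`stack_with_ids` is a hypothetical helper, and the rows of `X` are placeholders standing in for RandNormal output), the same pattern labels the first N rows 1, the next N rows 2, and so on, then prepends that column the way `ID || X` does:

```python
def stack_with_ids(X, n, num_samples):
    """Prepend a sample ID: rows 0..n-1 get 1, rows n..2n-1 get 2, and so on."""
    # same pattern as colvec(repeat(T(1:NumSamples), 1, N)): 1,1,...,2,2,...
    ids = [k for k in range(1, num_samples + 1) for _ in range(n)]
    if len(ids) != len(X):
        raise ValueError("X must have n * num_samples rows")
    # horizontal concatenation, like Z = ID || X
    return [[i] + list(row) for i, row in zip(ids, X)]


N, num_samples = 5, 10
X = [[float(r), 0.0, 0.0] for r in range(N * num_samples)]  # placeholder rows
Z = stack_with_ids(X, N, num_samples)
print([z[0] for z in Z[:7]])  # [1, 1, 1, 1, 1, 2, 2]
```

Generating all N*NumSamples rows in one call and labeling them afterwards avoids the per-iteration CREATE/APPEND and DATA step concatenation in the original macro loop.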

When you analyze these samples, be sure to use the BY statement in procedures rather than writing a macro loop, as detailed in the article "Simulation in SAS: The slow way or the BY way."
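The BY-group idea can be sketched outside SAS as well. In this hypothetical Python helper, rows are assumed to be sorted by ID (as BY processing generally expects), and one pass over the data produces a result for every sample instead of one program run per sample:

```python
from itertools import groupby
from operator import itemgetter


def by_group_means(Z):
    """Mean of each variable within each sample ID; Z rows are [ID, x1, x2, ...] sorted by ID."""
    results = {}
    for sample_id, group in groupby(Z, key=itemgetter(0)):
        rows = [r[1:] for r in group]
        p = len(rows[0])
        results[sample_id] = [sum(r[j] for r in rows) / len(rows) for j in range(p)]
    return results


Z = [[1, 1.0, 2.0], [1, 3.0, 4.0], [2, 5.0, 6.0], [2, 7.0, 10.0]]
print(by_group_means(Z))  # {1: [2.0, 3.0], 2: [6.0, 8.0]}
```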

Hi Rick,

Thanks for your reply!

One thing I'm not following in your script is how the original correlation matrix is preserved in the generated samples. The sample correlation matrix I'm generating from has 34 variables, and I need the correlation pattern among those variables preserved in the generated data.

I showed you a 3-variable example. The only change you need to make is to give the population mean (the zero vector) the same dimension as your data. You might also want to capture the original variable names and reuse them in the output data set.

```
proc iml;
call randseed(4321);
/* specify population mean and covariance */
use corrmat;
read all var _num_ into Cov[c=varNames]; /* save var names */
close corrmat;
Mean = j(nrow(Cov),1,0); /* zero vector */
N = 5; /* sample size */
NumSamples = 10; /* number of samples/replicates */
X = RandNormal(N*NumSamples, Mean, Cov);
ID = colvec(repeat(T(1:NumSamples), 1, N)); /* 1,1,1,...,2,2,2,...,3,3,3,... */
Z = ID || X;
varNames = "ID" || varNames; /* concatenate "ID" to var names */
create MVN from Z[c=varNames];
append from Z;
close MVN;
quit;
```
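To check the concern raised above for a given replicate, one option is to compute that replicate's sample correlation matrix and compare it with the target. Below is a minimal Python sketch with hypothetical helper names; in SAS you would more likely use the CORR function in SAS/IML or PROC CORR with a BY ID statement. With RANDNORMAL the data come from a population with the specified covariance, so each replicate's sample correlation scatters around the target and tightens as N grows; with N = 5 it will be rough.

```python
import math


def sample_corr(rows):
    """Sample correlation matrix of a list of equal-length rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
            for j in range(p)] for i in range(p)]
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(p)] for i in range(p)]


def max_corr_gap(rows, target):
    """Largest absolute difference between the sample correlation and the target matrix."""
    R = sample_corr(rows)
    p = len(target)
    return max(abs(R[i][j] - target[i][j]) for i in range(p) for j in range(p))


# Illustrative only: a made-up 2-variable target and a made-up N = 5 replicate
target = [[1.0, 0.5], [0.5, 1.0]]
replicate = [[0.1, 0.3], [-1.2, -0.4], [0.7, 1.1], [0.4, -0.2], [-0.5, -0.9]]
gap = max_corr_gap(replicate, target)
```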

Much appreciated!