05-06-2016 10:20 AM

Hi Folks,

I'm very new to PROC IML but am attempting to simulate a dataset based on a preexisting correlation matrix. I found a nice macro for this (corr2data) at http://www.ats.ucla.edu/stat/sas/macros/corr2data_demo.htm. I've simplified this slightly for my use, as provided below.

This works well for me, but I need to adapt it to simulate hundreds of datasets from the same correlation matrix and assign a sample ID to each simulated dataset. I used a %DO loop to achieve this, and in combination with a DATA step this works, but I'm failing at making a sampleID variable to append to each row of the output dataset (e.g., so that all rows generated in the first iteration are labeled "1" and all rows generated in the 13th iteration are labeled "13").

Here's what I've got so far. Help much appreciated.

The macro call: %SIMUDATA(simu.Fakenorm, simu.Rmat, 1051);

The macro:

%macro SIMUDATA(outdata, corrmat, n);

%Do index=1 %to 20 %by 1;

proc iml;

use &corrmat;

read all var _num_ into C;

rn = nrow(C);

cn = ncol(C);

p = root(C);

dim = nrow(C);

myvar = rannor(J(&n, dim, 0));

do i = 1 to dim;

myvar[, i] = myvar[,i]-(sum(myvar[,i])/&n);

end;

XX = (t(myvar)*myvar)/(&n-1);

U = root(inv(XX));

Y = myvar*T(U);

T = Y*p;

* S=J(dim, 1, &index); <----I made this in an attempt to create a vector of nrow length labeled with the index number

* V=insert(T,S,0,35); <-----I made this to append that to the simulated dataset

create &outdata from V;

append from V;

quit;

data simu.cumu;

set simu.cumu &outdata;

run;

%end;

%mend;

All Replies

Posted in reply to pcur

05-06-2016 10:41 AM

This macro is unnecessary. The macro merely generates random multivariate normal samples, which you can do directly in SAS/IML by using the RANDNORMAL function. See the article "Sampling from the multivariate normal distribution."

To produce many samples from the same correlation matrix, see the article "How to generate multiple samples from the multivariate normal distribution in SAS."

Since you say you are new to SAS/IML, here is the program that incorporates what you've asked for, but PLEASE read the article for background:

```
data corrmat;
input c1 c2 c3;
datalines;
3 2 1
2 4 0
1 0 2
;
proc iml;
call randseed(4321);
/* specify population mean and covariance */
Mean = {0, 0, 0};
use corrmat;
read all var _num_ into Cov;
close corrmat;
N = 5; /* sample size */
NumSamples = 10; /* number of samples/replicates */
X = RandNormal(N*NumSamples, Mean, Cov);
ID = colvec(repeat(T(1:NumSamples), 1, N)); /* 1,1,1,...,2,2,2,...,3,3,3,... */
Z = ID || X;
create MVN from Z[c={"ID" "x1" "x2" "x3"}];
append from Z;
close MVN;
quit;
```

When you analyze these samples, be sure to use the BY statement in procedures rather than writing a macro loop, as detailed in the article "Simulation in SAS: The slow way or the BY way."
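A minimal sketch of the BY-group approach, assuming the `MVN` data set created by the program above (columns `ID`, `x1`, `x2`, `x3`). The output data set name `CorrOut` is only illustrative. It computes one correlation matrix per sample in a single procedure call:

```sas
/* Analyze every simulated sample in one call.                 */
/* The BY statement splits the analysis by the ID variable,    */
/* so no macro loop is needed.                                 */
proc corr data=MVN noprint outp=CorrOut;  /* CorrOut is a hypothetical name */
   by ID;                 /* one set of statistics per sample */
   var x1 x2 x3;          /* the simulated variables          */
run;

/* Inspect the per-sample correlation rows */
proc print data=CorrOut;
   where _TYPE_ = "CORR";
run;
```

Because `MVN` is written with the ID values already in ascending order, the BY statement should not need a preceding PROC SORT here.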

Posted in reply to Rick_SAS

05-06-2016 10:48 AM

Hi Rick,

Thanks for your reply!

One thing I'm not following in your script is how the original correlation matrix is preserved in the generated sample. The sample correlation matrix I'm generating from has 34 variables, and I need the correlation pattern among those variables preserved in the generated data.

Posted in reply to pcur

05-06-2016 11:37 AM

I showed you a 3-variable example. The only change you need to make is to set the length of the population mean (the zero vector) equal to the dimension of your data. You might also want to capture the original names of the variables and re-use them in the output data set.

```
proc iml;
call randseed(4321);
/* specify population mean and covariance */
use corrmat;
read all var _num_ into Cov[c=varNames]; /* save var names */
close corrmat;
Mean = j(nrow(Cov),1,0); /* zero vector */
N = 5; /* sample size */
NumSamples = 10; /* number of samples/replicates */
X = RandNormal(N*NumSamples, Mean, Cov);
ID = colvec(repeat(T(1:NumSamples), 1, N)); /* 1,1,1,...,2,2,2,...,3,3,3,... */
Z = ID || X;
varNames = "ID" || varNames; /* concatenate "ID" to var names */
create MVN from Z[c=varNames];
append from Z;
close MVN;
quit;
```
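On the question of whether the correlation pattern is preserved: RandNormal draws from a population whose covariance is the matrix you supply, so the covariance of any one sample should match it only up to sampling error. (The original macro, by contrast, appears to rescale the data so that the sample covariance equals the input matrix exactly.) A rough check, assuming the 3-variable `corrmat` data set from the earlier example and a sample size of 1051 as in the original macro call; the names `SampleCov` and `MaxAbsDiff` are illustrative:

```sas
proc iml;
call randseed(4321);
use corrmat;
read all var _num_ into Cov;       /* population covariance */
close corrmat;

Mean = j(1, ncol(Cov), 0);         /* zero mean vector, one entry per variable */
N = 1051;                          /* sample size (assumed, from the macro call) */
X = RandNormal(N, Mean, Cov);      /* one simulated sample */

SampleCov  = cov(X);                        /* covariance of the simulated data */
MaxAbsDiff = max(abs(SampleCov - Cov));     /* largest elementwise discrepancy  */
print Cov, SampleCov, MaxAbsDiff;
quit;
```

The discrepancy is expected to be small but nonzero, and to shrink as N grows.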

Solution

Posted in reply to Rick_SAS

05-06-2016 01:17 PM

Much appreciated!