pcur
Fluorite | Level 6

Hi Folks,

 

I'm very new to PROC IML but am attempting to simulate a dataset based on a preexisting correlation matrix. I found a nice macro for this (corr2data) at http://www.ats.ucla.edu/stat/sas/macros/corr2data_demo.htm. I've simplified this slightly for my use, as provided below. 

 

This works well for me, but I need to adapt it to simulate hundreds of datasets from the same correlation matrix and assign a sample ID to each simulated dataset. I used a %DO loop to achieve this, and in combination with a DATA step this works, but I'm failing to create a sampleID variable to append to each row of the output dataset (e.g., so that all rows generated in the first iteration are labeled "1" and all rows generated in the 13th iteration are labeled "13").

 

Here's what I've got so far. Help much appreciated. 

The macro call: %SIMUDATA(simu.Fakenorm, simu.Rmat, 1051); 

 

the macro:

%macro SIMUDATA(outdata, corrmat, n);
%Do index=1 %to 20 %by 1;

proc iml;
use &corrmat;
read all var _num_ into C;
rn = nrow(C);
cn = ncol(C);
p = root(C);
dim = nrow(C);
myvar = rannor(J(&n, dim, 0));
do i = 1 to dim;
myvar[, i] = myvar[,i]-(sum(myvar[,i])/&n);
end;
XX = (t(myvar)*myvar)/(&n-1);
U = root(inv(XX));
Y = myvar*T(U);
T = Y*p;
/* My attempt to create a vector of nrow length labeled with the index number: */
* S=J(dim, 1, &index);
/* My attempt to append that vector to the simulated dataset: */
* V=insert(T,S,0,35);
create &outdata from V;
append from V;
quit;

data simu.cumu;
set simu.cumu &outdata;
run;

%end;
%mend;
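
For reference, the ID column that the two commented-out lines aim for can be sketched as follows. This is only a minimal illustration, not a tested fix for the macro: `n`, `index`, and `Sim` are hypothetical stand-ins for `&n`, `&index`, and the simulated matrix. The key point is that the ID vector needs one entry per row (`n`), not one per variable (`dim`).

```sas
proc iml;
n = 4;                /* hypothetical rows per simulated data set (&n in the macro) */
index = 13;           /* hypothetical iteration number (&index in the macro) */
Sim = j(n, 3, 0);     /* placeholder for the simulated data matrix */

S = j(n, 1, index);   /* ID vector with n rows, one per observation */
V = S || Sim;         /* horizontal concatenation: ID becomes the first column */
print V;
quit;
```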

Rick_SAS
SAS Super FREQ

This macro is unnecessary. The macro merely generates random multivariate normal samples, which you can do directly in SAS/IML by using the RANDNORMAL function. See the article "Sampling from the multivariate normal distribution."

 

To produce many samples from the same correlation matrix, see the article "How to generate multiple samples from the multivariate normal distribution in SAS."

 

Since you say you are new to SAS/IML, here is the program that incorporates what you've asked for, but PLEASE read the article for background:

data corrmat;
input c1 c2 c3;
datalines;
3 2 1
2 4 0
1 0 2
;

proc iml;
call randseed(4321);               
/* specify population mean and covariance */
Mean = {0, 0, 0};
use corrmat;
read all var _num_ into Cov;
close corrmat;

N = 5;                 /* sample size */
NumSamples = 10;       /* number of samples/replicates */  
 
X = RandNormal(N*NumSamples, Mean, Cov);
ID = colvec(repeat(T(1:NumSamples), 1, N)); /* 1,1,1,...,2,2,2,...,3,3,3,... */
Z = ID || X;
create MVN from Z[c={"ID" "x1" "x2" "x3"}];
append from Z;
close MVN;
quit;

When you analyze these samples, be sure to use the BY statement in procedures rather than writing a macro loop, as detailed in the article "Simulation in SAS: The slow way or the BY way."
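
As a rough sketch of what BY-group processing looks like here, the step below computes a correlation matrix for each simulated sample in one procedure call. It assumes the MVN data set (variables ID, x1, x2, x3) created by the PROC IML step above; the output data set name CorrOut is only an illustrative choice.

```sas
/* Assumes the MVN data set created by the PROC IML step above.       */
/* The rows are already grouped by ID, so no PROC SORT is needed.     */
proc corr data=MVN noprint outp=CorrOut;  /* OUTP= saves Pearson statistics */
   by ID;                                 /* one analysis per simulated sample */
   var x1 x2 x3;
run;

/* Look at the statistics saved for the first two samples */
proc print data=CorrOut(obs=12);
run;
```

Because every sample is handled in a single procedure call, no macro loop or repeated appending of data sets is involved.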

pcur
Fluorite | Level 6

Hi Rick, 

 

Thanks for your reply!

 

One thing I'm not following in your script is how the original correlation matrix is preserved in the generated sample. The sample correlation matrix I'm generating from has 34 variables, and I need the correlation pattern in those variables preserved in the generated data.

 

 

Rick_SAS
SAS Super FREQ

I showed you a 3-variable example. The only change you need to make is to make the length of the population mean (the zero vector) equal to the dimension of your data. You might also want to capture the original names of the variables and reuse them in the output data set.

 

proc iml;
call randseed(4321);               
/* specify population mean and covariance */
use corrmat;
read all var _num_ into Cov[c=varNames]; /* save var names */
close corrmat;
Mean = j(nrow(Cov),1,0); /* zero vector */

N = 5;                 /* sample size */
NumSamples = 10;       /* number of samples/replicates */  
 
X = RandNormal(N*NumSamples, Mean, Cov);
ID = colvec(repeat(T(1:NumSamples), 1, N)); /* 1,1,1,...,2,2,2,...,3,3,3,... */
Z = ID || X;
varNames = "ID" || varNames; /* concatenate "ID" to var names */
create MVN from Z[c=varNames];
append from Z;
close MVN;
quit;
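
One way to see how the population matrix carries over into the simulated data is to draw one large sample and compare its sample covariance with the matrix that was read in. The sketch below reuses the 3-variable corrmat example; the sample size of 100000 and the names SampleCov and MaxAbsDiff are arbitrary illustrative choices. Because RANDNORMAL draws random samples from the population, the sample matrix should be close to the input matrix but not exactly equal to it, and small samples will deviate more.

```sas
data corrmat;
input c1 c2 c3;
datalines;
3 2 1
2 4 0
1 0 2
;

proc iml;
call randseed(4321);
use corrmat;
read all var _num_ into Cov;
close corrmat;
Mean = j(1, ncol(Cov), 0);               /* zero mean vector, one entry per variable */

X = RandNormal(100000, Mean, Cov);       /* one large sample (size chosen arbitrarily) */
SampleCov = cov(X);                      /* sample covariance matrix of the simulated data */
MaxAbsDiff = max(abs(SampleCov - Cov));  /* largest elementwise discrepancy */
print Cov, SampleCov, MaxAbsDiff;
quit;
```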
pcur
Fluorite | Level 6

Much appreciated!
