Hi, I have been trying to simulate some correlated data in SAS Proc IML but I am having some troubles. Any help is appreciated. I saw the post by Rick Wicklin (Rick_SAS). I tried to modify the code provided here: https://communities.sas.com/t5/StatisticalProcedures/Generatecorrelatedrandomvariablesthatfoll...
but I have only had little success. Any help would be greatly appreciated. Here's my code below. By I am having some error messages.
I am trying to simulate the variable Age (continuous) and Male (binary) using their means, standard deviation, and covariance as seen in below
SAS Output
0.0689125085  0.0015515212 
0.0015515212  0.2501159183 
SAS Output
530  11.23208  0.26251  5953  10.50000  12.00000 
530  0.48113  0.50012  255.00000  0  1.00000 
SAS Output

 


Here's the code I was trying to write but there some errors
proc iml;
call randseed(1);
mu_age=11.23208;
sigma_age=0.26251;
p_male=0.48113;
N = 1000;
Mean = {11.23208, 0.48113};
Cov = {0.0689125085 0.0015515212,
0.0015515212 0.2501159183};
Z = RandNormal(N, Mean, Cov);
U = cdf("Normal", Z);
Age= quantile("normal", U[,1], mu_age, sigma_age);
Male= quantile("binomial", U[,2], p_male,1);
mean_Age=mean(Age);
std_Age=std(Age);
mean_Male=mean(Male);
covZ = cov(Z);
covZ1 =cov(Age, Male);
print mean_Age std_Age, mean_Male covZ covz1;
quit;
I am planning on getting the book Simulating data with by Rick Wicklin, but I am hoping someone could help me in the meantime
Thank you
In your example, the variables are uncorrelated (p=0.7861), so you can use independent normal and Bernoulli variables.
I assume, however, you want to see how to simulate the data if the target correlation is larger, such as 0.3.
See Chapter 9 in Simulating Data with SAS (2013), the section on copulas. Although in the chapter I show how to use copulas for continuous distributions, the idea can apply to continuousdiscrete distributions. The tricky part (as I discuss in the book) is that if your TARGET correlation is 0.3, you need to generate normal data from an "intermediate" distribution that has a larger correlation, such as 0.37. Here's some code you can play around with:
proc iml;
call randseed(1);
mu_age=11.23208;
sigma_age=0.26251;
p_male=0.48113;
N = 1000;
/* let's try to simulate data that has correlation {1 0.3, 0.3 1} */
R = {1 0.37, /* the intermediate correlation matrix must be inflated if you use Pearson corr */
0.37 1};
Z = randnormal(N, {0 0}, R);
X1 = mu_age + sigma_age*Z[,1]; /* scale to N(mu, sigma) */
U2 = cdf("normal", Z[,2]); /* ~ U(0,1) */
B2 = quantile("Bernoulli", U2, 0.48113); /* ~ Bern(p=0.48) */
X = X1  B2;
mean = mean(X);
corr = corr(X);
cov = cov(X);
print mean, corr, cov;
Thank you, Reeza. But I couldn't find how to simulate the variables from the quantile distribution
@Rick_SAS wrote a blog about simulating data from its quantiles.
Here is my code .Also check Rick's blog.
%let n=10000;
proc iml;
/* quantiles of total cholesterol from NHANES study
http://blogs.sas.com/content/iml/2014/06/18/distributionfromquantiles.html
*/
Quantile = {0 , .01, .05, .10, .25, .50, .75, .90, .95, .99, 1};
Estimate = {80, 128, 151, 162, 184, 209, 236, 265, 284, 333, 727};
n=nrow(Quantile);
bin_value=j(n1,1);
bin_width=j(n1,1);
p=j(n1,1);
do i=1 to n1;
bin_value[i]=Estimate[i];
bin_width[i]=Estimate[i+1]Estimate[i];
p[i]=Quantile[i+1]Quantile[i];
end;
print Quantile Estimate bin_value bin_width p;
call randseed(123456789);
_bin_value=sample(bin_value,&n,'replace',p);
_bin_width=j(&n,1);
do j=1 to &n;
_bin_width[j]=bin_width[loc(bin_value=_bin_value[j])];
end;
eps=randfun(&n,'uniform');
value=_bin_value`+ceil(_bin_width#eps);
call histogram(value) label="Simulation" ;
call qntl(SimEst, value, Quantile);
print Quantile[F=percent6.] Estimate SimEst[F=5.1];
quit;
In your example, the variables are uncorrelated (p=0.7861), so you can use independent normal and Bernoulli variables.
I assume, however, you want to see how to simulate the data if the target correlation is larger, such as 0.3.
See Chapter 9 in Simulating Data with SAS (2013), the section on copulas. Although in the chapter I show how to use copulas for continuous distributions, the idea can apply to continuousdiscrete distributions. The tricky part (as I discuss in the book) is that if your TARGET correlation is 0.3, you need to generate normal data from an "intermediate" distribution that has a larger correlation, such as 0.37. Here's some code you can play around with:
proc iml;
call randseed(1);
mu_age=11.23208;
sigma_age=0.26251;
p_male=0.48113;
N = 1000;
/* let's try to simulate data that has correlation {1 0.3, 0.3 1} */
R = {1 0.37, /* the intermediate correlation matrix must be inflated if you use Pearson corr */
0.37 1};
Z = randnormal(N, {0 0}, R);
X1 = mu_age + sigma_age*Z[,1]; /* scale to N(mu, sigma) */
U2 = cdf("normal", Z[,2]); /* ~ U(0,1) */
B2 = quantile("Bernoulli", U2, 0.48113); /* ~ Bern(p=0.48) */
X = X1  B2;
mean = mean(X);
corr = corr(X);
cov = cov(X);
print mean, corr, cov;
Thank you so much, Rick. It worked like a charm.
Two followup questions:
1) Is there a particular reason why we should use the correlation matrix instead of the covariance matrix for these types of copula simulations especially since the argument for RANDNORMAL
(N, Mean, Cov ) asks for a covariance? Any clarification would be great
2) In the subsequent simulations after generating the randnormal, could we interchange the Z[,1] and Z[,2] or does SAS know that the first Z[,1] is for Age and Z[,2] for the other variable Male?
Thank you so much
1. No reason, other than my preference.
2. You can use the first column to simulate age, if you prefer.
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.
Find more tutorials on the SAS Users YouTube channel.