Obsidian | Level 7

## Simulating normal and Bernoulli multivariate distribution from correlated data in Proc IML

Hi, I have been trying to simulate some correlated data in SAS Proc IML but I am having some troubles. Any help is appreciated. I saw the post by Rick Wicklin (

Covariance Matrix, DF = 529  AGE MALEAGEMALE
 0.0689125 0.00155152 0.00155152 0.250116

Simple StatisticsVariable N Mean Std Dev Sum Minimum MaximumAGEMALE
 530 11.2321 0.26251 5953 10.5 12 530 0.48113 0.50012 255 0 1

SAS Output

Pearson Correlation Coefficients, N = 530Prob > |r| under H0: Rho=0  AGE MALEAGEMALE
 1.00000
 0.01182 0.7861
 0.01182 0.7861
 1

Here's the code I was trying to write but there some errors

``````proc iml;
call randseed(1);
mu_age=11.23208;
sigma_age=0.26251;
p_male=0.48113;
N = 1000;
Mean = {11.23208,	0.48113};
Cov = 	{0.0689125085	0.0015515212,
0.0015515212	0.2501159183};

Z = RandNormal(N, Mean, Cov);

U = cdf("Normal", Z);

Age= quantile("normal", U[,1], mu_age, sigma_age);

Male= quantile("binomial", U[,2], p_male,1);

mean_Age=mean(Age);
std_Age=std(Age);
mean_Male=mean(Male);
covZ = cov(Z);
covZ1 =cov(Age, Male);

print mean_Age std_Age, mean_Male covZ covz1;
quit;``````

I am planning on getting the book Simulating data with by Rick Wicklin, but I am hoping someone could help me in the meantime

Thank you

1 ACCEPTED SOLUTION

Accepted Solutions
SAS Super FREQ

## Re: Simulating normal and Bernoulli multivariate distribution from correlated data in Proc IML

In your example, the variables are uncorrelated (p=0.7861), so you can use independent normal and Bernoulli variables.

I assume, however, you want to see how to simulate the data if the target correlation is larger, such as 0.3.

See Chapter 9 in Simulating Data with SAS (2013), the section on copulas. Although in the chapter I show how to use copulas for continuous distributions, the idea can apply to continuous-discrete distributions. The tricky part (as I discuss in the book) is that if your TARGET correlation is 0.3, you need to generate normal data from an "intermediate" distribution that has a larger correlation, such as 0.37. Here's some code you can play around with:

``````proc iml;
call randseed(1);
mu_age=11.23208;
sigma_age=0.26251;
p_male=0.48113;
N = 1000;

/* let's try to simulate data that has correlation {1 0.3, 0.3 1} */
R = {1    0.37,  /* the intermediate correlation matrix must be inflated if you use Pearson corr */
0.37 1};
Z = randnormal(N, {0 0}, R);

X1 = mu_age + sigma_age*Z[,1];           /* scale to N(mu, sigma) */
U2 = cdf("normal", Z[,2]);               /* ~ U(0,1) */
B2 = quantile("Bernoulli", U2, 0.48113); /* ~ Bern(p=0.48) */
X = X1 || B2;

mean = mean(X);
corr = corr(X);
cov = cov(X);
print mean, corr, cov;
``````

8 REPLIES 8
Super User

## Re: Simulating normal and Bernoulli multivariate distribution from correlated data in Proc IML

Obsidian | Level 7

## Re: Simulating normal and Bernoulli multivariate distribution from correlated data in Proc IML

Thank you, Reeza. But I couldn't find how to simulate the variables from the quantile distribution

Super User

## Re: Simulating normal and Bernoulli multivariate distribution from correlated data in Proc IML

@Rick_SAS wrote a blog about simulating data from its quantiles.

Here is my code .Also check Rick's blog.

``````
%let n=10000;
proc iml;
/* quantiles of total cholesterol from NHANES study
http://blogs.sas.com/content/iml/2014/06/18/distribution-from-quantiles.html
*/
Quantile = {0 , .01, .05, .10, .25, .50, .75, .90, .95, .99, 1};
Estimate = {80, 128, 151, 162, 184, 209, 236, 265, 284, 333, 727};
n=nrow(Quantile);
bin_value=j(n-1,1);
bin_width=j(n-1,1);
p=j(n-1,1);
do i=1 to n-1;
bin_value[i]=Estimate[i];
bin_width[i]=Estimate[i+1]-Estimate[i];
p[i]=Quantile[i+1]-Quantile[i];
end;
print Quantile Estimate bin_value bin_width p;
call randseed(123456789);
_bin_value=sample(bin_value,&n,'replace',p);
_bin_width=j(&n,1);
do j=1 to &n;
_bin_width[j]=bin_width[loc(bin_value=_bin_value[j])];
end;
eps=randfun(&n,'uniform');
value=_bin_value`+ceil(_bin_width#eps);
call histogram(value) label="Simulation" ;
call qntl(SimEst, value, Quantile);
print Quantile[F=percent6.] Estimate SimEst[F=5.1];
quit;``````

Obsidian | Level 7

Thank you Ksharp
SAS Super FREQ

## Re: Simulating normal and Bernoulli multivariate distribution from correlated data in Proc IML

In your example, the variables are uncorrelated (p=0.7861), so you can use independent normal and Bernoulli variables.

I assume, however, you want to see how to simulate the data if the target correlation is larger, such as 0.3.

See Chapter 9 in Simulating Data with SAS (2013), the section on copulas. Although in the chapter I show how to use copulas for continuous distributions, the idea can apply to continuous-discrete distributions. The tricky part (as I discuss in the book) is that if your TARGET correlation is 0.3, you need to generate normal data from an "intermediate" distribution that has a larger correlation, such as 0.37. Here's some code you can play around with:

``````proc iml;
call randseed(1);
mu_age=11.23208;
sigma_age=0.26251;
p_male=0.48113;
N = 1000;

/* let's try to simulate data that has correlation {1 0.3, 0.3 1} */
R = {1    0.37,  /* the intermediate correlation matrix must be inflated if you use Pearson corr */
0.37 1};
Z = randnormal(N, {0 0}, R);

X1 = mu_age + sigma_age*Z[,1];           /* scale to N(mu, sigma) */
U2 = cdf("normal", Z[,2]);               /* ~ U(0,1) */
B2 = quantile("Bernoulli", U2, 0.48113); /* ~ Bern(p=0.48) */
X = X1 || B2;

mean = mean(X);
corr = corr(X);
cov = cov(X);
print mean, corr, cov;
``````

Obsidian | Level 7

## Re: Simulating normal and Bernoulli multivariate distribution from correlated data in Proc IML

Thank you so much, Rick. It worked like a charm.

Two follow-up questions:

1) Is there a particular reason why we should use the correlation matrix instead of the covariance matrix for these types of copula simulations especially since the argument for `RANDNORMAL` (N, Mean, Cov )  asks for a covariance? Any clarification would be great

2) In the subsequent simulations after generating the randnormal, could we interchange the Z[,1] and Z[,2] or does SAS know that the first Z[,1] is for Age and Z[,2] for the other variable Male?

Thank you so much

SAS Super FREQ

## Re: Simulating normal and Bernoulli multivariate distribution from correlated data in Proc IML

1. No reason, other than my preference.

2. You can use the first column to simulate age, if you prefer.

Obsidian | Level 7

## Re: Simulating normal and Bernoulli multivariate distribution from correlated data in Proc IML

Thank you very much, Rick
From The DO Loop