Solved: Simulating normal and Bernoulli multivariate distribution from correla...

SimRock · Posted 11-27-2018 10:31 PM

Hi, I have been trying to simulate some correlated data in SAS Proc IML but I am having some troubles. Any help is appreciated. I saw the post by Rick Wicklin (Rick_SAS). I tried to modify the code provided here: https://communities.sas.com/t5/Statistical-Procedures/Generate-correlated-random-variables-that-foll...

but I have only had little success. Any help would be greatly appreciated. Here's my code below. By I am having some error messages.

I am trying to simulate the variable Age (continuous) and Male (binary) using their means, standard deviation, and covariance as seen in below

SAS Output

Covariance Matrix, DF = 529 AGE MALEAGEMALE

0.0689125085	0.0015515212
0.0015515212	0.2501159183

SAS Output

Simple StatisticsVariable N Mean Std Dev Sum Minimum MaximumAGEMALE

530	11.23208	0.26251	5953	10.50000	12.00000
530	0.48113	0.50012	255.00000	0	1.00000

SAS Output

Pearson Correlation Coefficients, N = 530Prob > |r| under H0: Rho=0 AGE MALEAGEMALE

1.00000

0.01182

0.7861

0.01182

0.7861

1.00000

Here's the code I was trying to write but there some errors

proc iml;
	call randseed(1);
	mu_age=11.23208;
  	sigma_age=0.26251;
 	p_male=0.48113;
	N = 1000;
	Mean = {11.23208,	0.48113};
	Cov = 	{0.0689125085	0.0015515212, 
			0.0015515212	0.2501159183};


  Z = RandNormal(N, Mean, Cov);
  

  U = cdf("Normal", Z);  

  Age= quantile("normal", U[,1], mu_age, sigma_age);

  Male= quantile("binomial", U[,2], p_male,1);

  mean_Age=mean(Age);
  std_Age=std(Age);
  mean_Male=mean(Male);
  covZ = cov(Z);
  covZ1 =cov(Age, Male);
  
 print mean_Age std_Age, mean_Male covZ covz1;
quit;

I am planning on getting the book Simulating data with by Rick Wicklin, but I am hoping someone could help me in the meantime

Thank you

Rick_SAS · Posted 11-29-2018 10:08 PM

In your example, the variables are uncorrelated (p=0.7861), so you can use independent normal and Bernoulli variables.

I assume, however, you want to see how to simulate the data if the target correlation is larger, such as 0.3.

See Chapter 9 in Simulating Data with SAS (2013), the section on copulas. Although in the chapter I show how to use copulas for continuous distributions, the idea can apply to continuous-discrete distributions. The tricky part (as I discuss in the book) is that if your TARGET correlation is 0.3, you need to generate normal data from an "intermediate" distribution that has a larger correlation, such as 0.37. Here's some code you can play around with:

proc iml;
	call randseed(1);
	mu_age=11.23208;
  	sigma_age=0.26251;
 	p_male=0.48113;
	N = 1000;

/* let's try to simulate data that has correlation {1 0.3, 0.3 1} */
R = {1    0.37,  /* the intermediate correlation matrix must be inflated if you use Pearson corr */
     0.37 1};
Z = randnormal(N, {0 0}, R);

X1 = mu_age + sigma_age*Z[,1];           /* scale to N(mu, sigma) */
U2 = cdf("normal", Z[,2]);               /* ~ U(0,1) */
B2 = quantile("Bernoulli", U2, 0.48113); /* ~ Bern(p=0.48) */
X = X1 || B2;

mean = mean(X);
corr = corr(X);
cov = cov(X);
print mean, corr, cov;

View solution in original post

Reeza · Posted 11-27-2018 10:42 PM

Try this reference instead: https://blogs.sas.com/content/iml/2013/04/10/generate-multiple-mvn-samples.html

SimRock · Posted 11-28-2018 12:35 PM

Thank you, Reeza. But I couldn't find how to simulate the variables from the quantile distribution

Ksharp · Posted 11-29-2018 09:23 AM

@Rick_SAS wrote a blog about simulating data from its quantiles.

Here is my code .Also check Rick's blog.


%let n=10000;
proc iml;
/* quantiles of total cholesterol from NHANES study
http://blogs.sas.com/content/iml/2014/06/18/distribution-from-quantiles.html
 */
Quantile = {0 , .01, .05, .10, .25, .50, .75, .90, .95, .99, 1};
Estimate = {80, 128, 151, 162, 184, 209, 236, 265, 284, 333, 727};
n=nrow(Quantile);
bin_value=j(n-1,1);
bin_width=j(n-1,1);
p=j(n-1,1);
do i=1 to n-1;
 bin_value[i]=Estimate[i];
 bin_width[i]=Estimate[i+1]-Estimate[i];
 p[i]=Quantile[i+1]-Quantile[i];
end;
print Quantile Estimate bin_value bin_width p;
call randseed(123456789);
_bin_value=sample(bin_value,&n,'replace',p);
_bin_width=j(&n,1);
do j=1 to &n;
 _bin_width[j]=bin_width[loc(bin_value=_bin_value[j])];
end;
eps=randfun(&n,'uniform');
value=_bin_value`+ceil(_bin_width#eps);
call histogram(value) label="Simulation" ;
call qntl(SimEst, value, Quantile);
print Quantile[F=percent6.] Estimate SimEst[F=5.1];
quit;

SimRock · Posted 11-30-2018 04:16 PM

Thank you Ksharp

Rick_SAS · Posted 11-29-2018 10:08 PM

In your example, the variables are uncorrelated (p=0.7861), so you can use independent normal and Bernoulli variables.

I assume, however, you want to see how to simulate the data if the target correlation is larger, such as 0.3.

See Chapter 9 in Simulating Data with SAS (2013), the section on copulas. Although in the chapter I show how to use copulas for continuous distributions, the idea can apply to continuous-discrete distributions. The tricky part (as I discuss in the book) is that if your TARGET correlation is 0.3, you need to generate normal data from an "intermediate" distribution that has a larger correlation, such as 0.37. Here's some code you can play around with:

proc iml;
	call randseed(1);
	mu_age=11.23208;
  	sigma_age=0.26251;
 	p_male=0.48113;
	N = 1000;

/* let's try to simulate data that has correlation {1 0.3, 0.3 1} */
R = {1    0.37,  /* the intermediate correlation matrix must be inflated if you use Pearson corr */
     0.37 1};
Z = randnormal(N, {0 0}, R);

X1 = mu_age + sigma_age*Z[,1];           /* scale to N(mu, sigma) */
U2 = cdf("normal", Z[,2]);               /* ~ U(0,1) */
B2 = quantile("Bernoulli", U2, 0.48113); /* ~ Bern(p=0.48) */
X = X1 || B2;

mean = mean(X);
corr = corr(X);
cov = cov(X);
print mean, corr, cov;

SimRock · Posted 11-30-2018 04:15 PM

Thank you so much, Rick. It worked like a charm.

Two follow-up questions:

1) Is there a particular reason why we should use the correlation matrix instead of the covariance matrix for these types of copula simulations especially since the argument for RANDNORMAL (N, Mean, Cov ) asks for a covariance? Any clarification would be great

2) In the subsequent simulations after generating the randnormal, could we interchange the Z[,1] and Z[,2] or does SAS know that the first Z[,1] is for Age and Z[,2] for the other variable Male?

Thank you so much

Rick_SAS · Posted 11-30-2018 10:15 PM

1. No reason, other than my preference.

2. You can use the first column to simulate age, if you prefer.

SimRock · Posted 12-02-2018 12:52 AM

Thank you very much, Rick

Simulating normal and Bernoulli multivariate distribution from correlated data in Proc IML

Re: Simulating normal and Bernoulli multivariate distribution from correlated data in Proc IML

Re: Simulating normal and Bernoulli multivariate distribution from correlated data in Proc IML

Re: Simulating normal and Bernoulli multivariate distribution from correlated data in Proc IML

Re: Simulating normal and Bernoulli multivariate distribution from correlated data in Proc IML

Re: Simulating normal and Bernoulli multivariate distribution from correlated data in Proc IML

Re: Simulating normal and Bernoulli multivariate distribution from correlated data in Proc IML

Re: Simulating normal and Bernoulli multivariate distribution from correlated data in Proc IML

Re: Simulating normal and Bernoulli multivariate distribution from correlated data in Proc IML

Re: Simulating normal and Bernoulli multivariate distribution from correlated data in Proc IML

Registration is open