BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SimRock
Obsidian | Level 7

Hi, I have been trying to simulate some correlated data in SAS Proc IML but I am having some troubles. Any help is appreciated. I saw the post by Rick Wicklin (

 

Covariance Matrix, DF = 529  AGE MALEAGEMALE
0.06891250850.0015515212
0.00155152120.2501159183

Simple StatisticsVariable N Mean Std Dev Sum Minimum MaximumAGEMALE
53011.232080.26251595310.5000012.00000
5300.481130.50012255.0000001.00000

 

SAS Output

Pearson Correlation Coefficients, N = 530Prob > |r| under H0: Rho=0  AGE MALEAGEMALE
1.00000
 
0.01182
0.7861
0.01182
0.7861
1.00000

Here's the code I was trying to write but there some errors

 

proc iml;
	call randseed(1);
	mu_age=11.23208;
  	sigma_age=0.26251;
 	p_male=0.48113;
	N = 1000;
	Mean = {11.23208,	0.48113};
	Cov = 	{0.0689125085	0.0015515212, 
			0.0015515212	0.2501159183};


  Z = RandNormal(N, Mean, Cov);
  

  U = cdf("Normal", Z);  

  Age= quantile("normal", U[,1], mu_age, sigma_age);

  Male= quantile("binomial", U[,2], p_male,1);

  mean_Age=mean(Age);
  std_Age=std(Age);
  mean_Male=mean(Male);
  covZ = cov(Z);
  covZ1 =cov(Age, Male);
  
 print mean_Age std_Age, mean_Male covZ covz1;
quit;

I am planning on getting the book Simulating data with by Rick Wicklin, but I am hoping someone could help me in the meantime

Thank you

 

1 ACCEPTED SOLUTION

Accepted Solutions
Rick_SAS
SAS Super FREQ

In your example, the variables are uncorrelated (p=0.7861), so you can use independent normal and Bernoulli variables.

 

I assume, however, you want to see how to simulate the data if the target correlation is larger, such as 0.3.

See Chapter 9 in Simulating Data with SAS (2013), the section on copulas. Although in the chapter I show how to use copulas for continuous distributions, the idea can apply to continuous-discrete distributions. The tricky part (as I discuss in the book) is that if your TARGET correlation is 0.3, you need to generate normal data from an "intermediate" distribution that has a larger correlation, such as 0.37. Here's some code you can play around with:

 

proc iml;
	call randseed(1);
	mu_age=11.23208;
  	sigma_age=0.26251;
 	p_male=0.48113;
	N = 1000;

/* let's try to simulate data that has correlation {1 0.3, 0.3 1} */
R = {1    0.37,  /* the intermediate correlation matrix must be inflated if you use Pearson corr */
     0.37 1};
Z = randnormal(N, {0 0}, R);

X1 = mu_age + sigma_age*Z[,1];           /* scale to N(mu, sigma) */
U2 = cdf("normal", Z[,2]);               /* ~ U(0,1) */
B2 = quantile("Bernoulli", U2, 0.48113); /* ~ Bern(p=0.48) */
X = X1 || B2;

mean = mean(X);
corr = corr(X);
cov = cov(X);
print mean, corr, cov;

 

 

View solution in original post

8 REPLIES 8
SimRock
Obsidian | Level 7

Thank you, Reeza. But I couldn't find how to simulate the variables from the quantile distribution

Ksharp
Super User

@Rick_SAS wrote a blog about simulating data from its quantiles.

 

Here is my code .Also check Rick's blog.


%let n=10000;
proc iml;
/* quantiles of total cholesterol from NHANES study
http://blogs.sas.com/content/iml/2014/06/18/distribution-from-quantiles.html
 */
Quantile = {0 , .01, .05, .10, .25, .50, .75, .90, .95, .99, 1};
Estimate = {80, 128, 151, 162, 184, 209, 236, 265, 284, 333, 727};
n=nrow(Quantile);
bin_value=j(n-1,1);
bin_width=j(n-1,1);
p=j(n-1,1);
do i=1 to n-1;
 bin_value[i]=Estimate[i];
 bin_width[i]=Estimate[i+1]-Estimate[i];
 p[i]=Quantile[i+1]-Quantile[i];
end;
print Quantile Estimate bin_value bin_width p;
call randseed(123456789);
_bin_value=sample(bin_value,&n,'replace',p);
_bin_width=j(&n,1);
do j=1 to &n;
 _bin_width[j]=bin_width[loc(bin_value=_bin_value[j])];
end;
eps=randfun(&n,'uniform');
value=_bin_value`+ceil(_bin_width#eps);
call histogram(value) label="Simulation" ;
call qntl(SimEst, value, Quantile);
print Quantile[F=percent6.] Estimate SimEst[F=5.1];
quit;

 

Rick_SAS
SAS Super FREQ

In your example, the variables are uncorrelated (p=0.7861), so you can use independent normal and Bernoulli variables.

 

I assume, however, you want to see how to simulate the data if the target correlation is larger, such as 0.3.

See Chapter 9 in Simulating Data with SAS (2013), the section on copulas. Although in the chapter I show how to use copulas for continuous distributions, the idea can apply to continuous-discrete distributions. The tricky part (as I discuss in the book) is that if your TARGET correlation is 0.3, you need to generate normal data from an "intermediate" distribution that has a larger correlation, such as 0.37. Here's some code you can play around with:

 

proc iml;
	call randseed(1);
	mu_age=11.23208;
  	sigma_age=0.26251;
 	p_male=0.48113;
	N = 1000;

/* let's try to simulate data that has correlation {1 0.3, 0.3 1} */
R = {1    0.37,  /* the intermediate correlation matrix must be inflated if you use Pearson corr */
     0.37 1};
Z = randnormal(N, {0 0}, R);

X1 = mu_age + sigma_age*Z[,1];           /* scale to N(mu, sigma) */
U2 = cdf("normal", Z[,2]);               /* ~ U(0,1) */
B2 = quantile("Bernoulli", U2, 0.48113); /* ~ Bern(p=0.48) */
X = X1 || B2;

mean = mean(X);
corr = corr(X);
cov = cov(X);
print mean, corr, cov;

 

 

SimRock
Obsidian | Level 7

Thank you so much, Rick. It worked like a charm.

Two follow-up questions:

 

1) Is there a particular reason why we should use the correlation matrix instead of the covariance matrix for these types of copula simulations especially since the argument for RANDNORMAL (N, Mean, Cov )  asks for a covariance? Any clarification would be great

 

2) In the subsequent simulations after generating the randnormal, could we interchange the Z[,1] and Z[,2] or does SAS know that the first Z[,1] is for Age and Z[,2] for the other variable Male?

 

Thank you so much 

Rick_SAS
SAS Super FREQ

1. No reason, other than my preference. 

2. You can use the first column to simulate age, if you prefer.

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

From The DO Loop
Want more? Visit our blog for more articles like these.
Discussion stats
  • 8 replies
  • 4737 views
  • 5 likes
  • 4 in conversation