I need to generate random values for two beta-distributed variables that are correlated. The two variables of interest are characterized as follows:
----
X1 has mean = 0.896 and variance = 0.001.
X2 has mean = 0.206 and variance = 0.004.
For X1 and X2, p = 0.5, where p is the correlation coefficient.
----
I understand how to generate a random number specifying a beta distribution using the function X = RAND('BETA', a, b), where a and b are the two shape parameters for a variable X that can be calculated from the mean and variance. However, I want to generate values for both X1 and X2 simultaneously while specifying that they are correlated at p = 0.5.
Looks like a question for Rick Wicklin
This is a duplicate of the same question asked on StackOverflow.
Run the SAS/IML program on p. 166 of Simulating Data with SAS, but substitute the Beta distribution for the Gamma and Exponential variables that appear in the book.
Thanks Rick--this is the solution I came to yesterday. Thanks for producing such a fantastic resource.
What values did you get for alpha_1, beta_1 and alpha_2, beta_2?
data corr_vars;
input x1 var1 x2 var2; *var1 and var2 are the variances for x1 and x2;
a_x1 = ((1 - x1) / var1 - 1/ x1) * x1**2;
a_x2 = ((1 - x2) / var2 - 1/ x2) * x2**2;
b_x1 = a_x1 * (1 / x1 - 1);
b_x2 = a_x2 * (1 / x2 - 1);
datalines;
0.896 0.001 0.207 0.004
;
proc print data = corr_vars;
run;
Therefore:
alpha1 = 82.597
beta1 = 9.587
alpha2 = 8.289
beta2 = 31.750
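For reference, the DATA step above appears to apply the standard method-of-moments relations for a beta distribution. The symbols \mu (mean) and \sigma^2 (variance) are notation introduced here for illustration; they do not appear in the original post.

```latex
\begin{align*}
\alpha &= \mu\left(\frac{\mu(1-\mu)}{\sigma^2} - 1\right)
        = \left(\frac{1-\mu}{\sigma^2} - \frac{1}{\mu}\right)\mu^2, \\
\beta  &= (1-\mu)\left(\frac{\mu(1-\mu)}{\sigma^2} - 1\right)
        = \alpha\left(\frac{1}{\mu} - 1\right),
\qquad \text{valid when } \sigma^2 < \mu(1-\mu).
\end{align*}
```

As a check for X1: \mu(1-\mu)/\sigma^2 - 1 = 0.093184/0.001 - 1 = 92.184, so alpha1 = 0.896 * 92.184 = 82.597 and beta1 = 0.104 * 92.184 = 9.587 (rounded), which matches the values listed.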
Then, here is the code I used to generate the correlated rates based on the book chapter:
proc iml;
call randseed(12345);
N = 10000; *number of random variable sets to generate;
Z = RandNormal(N, {0 0}, {1 0.5, 0.5 1}); *RandNormal(N, Mean, Cov), where Mean is a 1 x 2 row vector;
U = cdf("Normal", Z);
x1_beta = quantile('BETA', U[,1], 82.597, 9.587);
x2_beta = quantile('BETA', U[,2], 8.289, 31.750);
X = x1_beta || x2_beta; *here are my correlated variables, beta-distributed;
rhoZ = corr(Z)[1,2]; *check correlations;
rhoX = corr(X)[1,2];
print X;
print rhoZ rhoX;
If you are interested in Pearson correlations, the correlation for the MVN data is not exactly the correlation that you want for the beta variables. It needs to be modified, as explained on p. 167. For this example, I think you want to use
rho = 0.5105
for the MVN data in order to have the correlation of the beta variables be 0.5.
For Spearman (rank) correlations, no adjustment is necessary.
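One way to check both statements is to rerun the simulation with the adjusted correlation and compare the Pearson and Spearman estimates. The following is only a sketch: it reuses the shape parameters posted earlier in the thread, and the larger sample size and the names rho, Sigma, pearsonX, spearmanZ, and spearmanX are choices made here for illustration.

```sas
proc iml;
call randseed(12345);
N = 100000;                        /* larger sample to reduce noise in the estimates */
rho = 0.5105;                      /* adjusted MVN correlation suggested above */
Sigma = (1 || rho) // (rho || 1);  /* 2 x 2 correlation matrix */
Z = RandNormal(N, {0 0}, Sigma);   /* correlated standard normal variates */
U = cdf("Normal", Z);              /* map each column to uniform(0,1) */
X = quantile("Beta", U[,1], 82.597, 9.587) ||
    quantile("Beta", U[,2], 8.289, 31.750);   /* beta marginals */
pearsonX  = corr(X)[1,2];              /* Pearson correlation of the beta variables */
spearmanZ = corr(Z, "Spearman")[1,2];  /* rank correlation of the normal variates */
spearmanX = corr(X, "Spearman")[1,2];  /* rank correlation of the beta variables */
print pearsonX spearmanZ spearmanX;
quit;
```

If the adjustment is right, pearsonX should come out close to 0.5, up to sampling error. The two Spearman values should agree with each other because the CDF and quantile transformations are monotone increasing and therefore preserve ranks.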
Rick,
I am trying to apply the solution here to a slightly different problem: generating an autocorrelated time series of, say, 24 intervals (24 hours in a day), with each interval following a Weibull distribution (with different parameters across hours). My understanding is that each value depends on the previous one, so they have to be generated in sequence. Any thoughts?
Thanks,
Bo
Yes, I think if you have a different question then you should start a new thread instead of appending to a thread from 2015.