I know how to create correlated random draws using PROC IML. However, that approach generates the random draws and correlates them in a single step.
I already have my random draws (from another program). I'd like to take these existing random draws and transform them. To be more specific, I want to "un-correlate" them: they need to have zero correlation and be completely independent.
BACKGROUND: I have 20 sets of correlated random variables (n=20,000). I use these to simulate 20,000 scenarios and compute various statistics such as VaR and CTE. I want to compare those statistics with the ones I get from the same (seed) set of random variables made completely independent of each other. The idea is to determine the benefit of using correlated random variables instead of independent ones. Again, it has to use the same seed, which is why I need to use my existing random draws rather than generate brand-new ones. I don't want any noise from using a different seed.
Principal Components Analysis transforms the variables so that the transformed variables (the principal components) are uncorrelated. If that's not what you want, I don't know what you mean.
Would you mind showing me with this simple example? How would I transform these 3 variables using PCA?
DATA Have;
CALL STREAMINIT(123);
DO Scenario = 1 to 100;
Var1 = RAND("Normal", 0);
Var2 = RAND("Normal", 0);
Var3 = RAND("Normal", 0);
OUTPUT;
END;
RUN;
In your example, the data are independent. I thought you were starting with correlated data?
I am starting with correlated data. I was just trying to put together something quick and wasn't sure it mattered which data we used in the example.
I appreciate the info on the inverse Cholesky decomposition. I'll give that a try and report back. Thanks again!
PROC PRINCOMP takes your original variables and produces uncorrelated "scores" (that's what they are called in most texts). In the OUT= output data set, the scores are usually named Prin1, Prin2, Prin3, and so on.
Example here: https://documentation.sas.com/?docsetId=statug&docsetTarget=statug_princomp_examples03.htm&docsetVer...
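A minimal sketch of the PROC PRINCOMP approach, assuming three correlated normal variables. Because the Have data in the question are independent, this DATA step mixes independent normals with made-up weights (0.6, 0.8, and so on, chosen only for illustration) to create correlation; the data set name Scores is also just an illustrative choice. The STD option scales each score to unit variance, so Prin1-Prin3 should come out uncorrelated with variance 1.

```sas
/* Hypothetical correlated test data: the mixing weights are made up */
data Have;
   call streaminit(123);
   do Scenario = 1 to 100;
      e1 = rand("Normal");
      e2 = rand("Normal");
      e3 = rand("Normal");
      Var1 = e1;
      Var2 = 0.6*e1 + 0.8*e2;
      Var3 = 0.3*e1 + 0.4*e2 + 0.866*e3;
      output;
   end;
   drop e1-e3;
run;

/* Principal component scores; STD scales each score to unit variance */
proc princomp data=Have out=Scores std noprint;
   var Var1-Var3;
run;

/* Sample correlations among Prin1-Prin3 should be zero up to roundoff */
proc corr data=Scores noprob;
   var Prin1-Prin3;
run;
```

Each score is a linear combination of all three input variables, so Prin1 is not the same set of numbers as Var1 even though both come from the same seed.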
Are your data correlated multivariate normal? If so, yes, you can use the inverse Cholesky decomposition to uncorrelate the variables.
I solved the problem in the blog post, but here is an example with p=20 and N=20,000.
proc iml;
call randseed(1);
/* create correlated test data */
p = 20; N = 20000;
Sigma = toeplitz(p:1);
mean = 1:p;
X = randnormal(N, mean, Sigma);
/* Use inverse Cholesky to uncorrelate:
https://blogs.sas.com/content/iml/2012/02/08/use-the-cholesky-transformation-to-correlate-and-uncorrelate-variables.html
*/
S = cov(X);
U = root(S);
L = U`; /* convenience: use U` and X` */
Z = trisolv(4, L, X`); /* faster than inv(L)*X` */
corr = corr(Z`);
print corr;
quit;
Theoretically, you should use the population mean and covariance to uncorrelate, but with large N, the sample mean and covariance will be very close to the population parameters.
In the blog post you linked, the inverse Cholesky doesn't completely make the variables independent; the covariance between the two variables is 0.0062815. While this is close to zero, and might be good enough for most purposes (although there's no guarantee it will be this close to zero in other data sets), principal components guarantee that the correlation (or covariance) between the transformed variables is exactly zero in theory (and within roundoff error of zero in practice).
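Whether inverse-Cholesky variables have exactly zero sample correlation depends on which covariance matrix is factored. The SAS/IML sketch below, using a made-up 3x3 Toeplitz covariance and N=1000, factors the population covariance in one case and the sample covariance in the other (as the code above does with S = cov(X)). It uses inv() for readability rather than TRISOLV; that is a choice for clarity, not a recommendation for large problems.

```sas
proc iml;
call randseed(1);
p = 3;  N = 1000;
Sigma = toeplitz(p:1);               /* hypothetical population covariance */
X = randnormal(N, j(1, p, 0), Sigma);

/* (a) factor the POPULATION covariance: Sigma = U`*U */
Upop = root(Sigma);
Zpop = X * inv(Upop);

/* (b) factor the SAMPLE covariance: cov(X) = U`*U */
Usam = root(cov(X));
Zsam = X * inv(Usam);

/* largest absolute off-diagonal sample correlation in each case */
offPop = max(abs(corr(Zpop) - I(p)));
offSam = max(abs(corr(Zsam) - I(p)));
print offPop offSam;                 /* offSam is zero up to roundoff */
quit;
```

With the sample covariance, cov(Zsam) equals the identity by construction, so the sample correlations vanish just as they do for principal component scores; with the population covariance, only sampling noise remains.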
Rick,
I've tested this, and it definitely works to uncorrelate my variables. However, it does not get me back to the starting point. In the example from your blog, the first xy that is created using an independent N(0,1) is very different from the xy that results when you uncorrelate zw using the inverse Cholesky decomposition. Both are uncorrelated, but the values are very different. Essentially, isn't that no different from just picking a different seed for the random-number generator?
And what happens if you use Principal Components?
No, I don't think it is a random-number issue. I think the problem is that uncorrelated data are not unique. For example, if X and Y are uncorrelated with unit variance, then so are the variables (X-Y)/sqrt(2) and (X+Y)/sqrt(2). In fact, you can apply any rotation matrix, and the rotated variables remain uncorrelated with unit variance.
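A small SAS/IML sketch of that non-uniqueness, where the seed, sample size, and angle are arbitrary choices: it whitens two independent normal samples so their sample covariance is the identity, then rotates them by pi/4, which gives (Z1+Z2)/sqrt(2) and (Z2-Z1)/sqrt(2). Both versions are uncorrelated with unit variance, yet the numbers differ.

```sas
proc iml;
call randseed(123);
N = 1000;
Y = j(N, 2);
call randgen(Y, "Normal");           /* two independent N(0,1) columns */

/* whiten with the sample covariance so that cov(Z) = I up to roundoff */
Z = Y * inv(root(cov(Y)));

/* rotation by an arbitrary angle; R is orthogonal, so cov(Z*R) = R`*I*R = I */
theta = constant("pi") / 4;
c = cos(theta);  s = sin(theta);
R = (c || (-s)) // (s || c);
W = Z * R;

covZ = cov(Z);
covW = cov(W);                       /* still the identity, up to roundoff */
Z3 = Z[1:3, ];
W3 = W[1:3, ];
print covZ covW, Z3 W3;              /* same seed, different values */
quit;
```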
I don't know what "back to the starting point" means, but I think you are saying that if you use one method to correlate variables and another to uncorrelate them, you might not obtain the original variables. That is true. You would need knowledge of how the variables were correlated and then use that knowledge to apply the inverse transformation.
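A minimal SAS/IML sketch of that point, assuming a made-up 3x3 target correlation matrix: it correlates independent draws Z with the Cholesky root U (X = Z*U). Undoing that known transformation returns Z exactly, whereas whitening with the sample covariance gives values that are uncorrelated but are not Z.

```sas
proc iml;
call randseed(123);
N = 1000;
Z = j(N, 3);
call randgen(Z, "Normal");           /* independent N(0,1) draws */
Sigma = {1.0 0.6 0.3,
         0.6 1.0 0.4,
         0.3 0.4 1.0};               /* hypothetical target correlation */
U = root(Sigma);                     /* Sigma = U`*U, U upper triangular */
X = Z * U;                           /* correlated draws */

/* (a) inverse of the KNOWN transformation: recovers Z */
Zback = X * inv(U);
/* (b) whitening with the sample covariance: uncorrelated, but not Z */
Zsam = X * inv(root(cov(X)));

diffKnown  = max(abs(Zback - Z));
diffSample = max(abs(Zsam - Z));
print diffKnown diffSample;          /* diffKnown is zero up to roundoff */
quit;
```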
That all makes sense. Thanks.
The correlated variables came from different software, so I have no knowledge of how they were correlated. The idea was to compare downstream results of uncorrelated vs correlated. But the caveat is that we want to use the same seed, to avoid any added noise from using a different seed. It sounds like we'll need to take a different approach then.
Thanks again
Before you give up, you might try the TRANSPOSE of the Cholesky matrix.
If that doesn't work, then:
1. Use whatever software you want to generate uncorrelated random normal variates.
2. In any software you want, find the Cholesky root of the covariance matrix.
3. Use the Cholesky root to correlate the variables.
You now have correlated MVN data, and you know how it was constructed. You can use it in any software.
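The three steps above might look like this in SAS/IML. This is a sketch under assumptions: the covariance toeplitz(20:1)/20 is a placeholder (in practice it could be estimated from the existing correlated draws), and the data set names Indep and Correl are hypothetical. Both data sets come from one seed, and Correl is a deterministic transformation of Indep.

```sas
proc iml;
call randseed(2024);                  /* one seed for both versions */
p = 20;  N = 20000;
Sigma = toeplitz(p:1) / p;            /* placeholder covariance (unit diagonal) */

/* Step 1: uncorrelated standard normal variates */
Z = j(N, p);
call randgen(Z, "Normal");

/* Step 2: Cholesky root, Sigma = U`*U */
U = root(Sigma);

/* Step 3: correlate the same draws */
X = Z * U;

/* save both versions for the downstream scenario runs */
varNames = "Var1":"Var20";
create Indep from Z[colname=varNames];   append from Z;  close Indep;
create Correl from X[colname=varNames];  append from X;  close Correl;
quit;
```

Because each row of Correl is that same row of Indep multiplied by U, differences in the downstream VaR or CTE between the two runs reflect the correlation rather than a change of seed.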