Correlate Existing Variables

SASaholic629 · Posted 04-09-2019 03:39 PM

I know how to create correlated random draws using PROC IML. However, that approach draws the random numbers and correlates them in a single step.

I already have my random draws (from another program). I'd like to take these existing random draws and correlate them. To be more specific, I want to "un-correlate" them. They need to have zero correlation and be completely independent.

BACKGROUND: I have 20 sets of correlated random variables (n=20,000). I use these to simulate 20,000 scenarios and compute statistics such as VaR and CTE. I want to compare those statistics to the ones obtained from the same (seed) set of random variables when they are completely independent of each other. The idea is to determine the benefit of using correlated random variables versus independent ones. Again, it has to use the same seed, which is why I need to use my existing random draws rather than generate brand new ones. I don't want any noise from using a different seed.

PaigeMiller · Posted 04-09-2019 03:45 PM

@SASaholic629 wrote:

I already have my random draws (from another program). I'd like to take these existing random draws and correlate them. To be more specific, I want to "un-correlate" them. They need to have zero correlation and be completely independent.

Principal Components Analysis transforms the variables so that there is no correlation. If that's not it, I don't know what you mean.

--
Paige Miller

SASaholic629 · Posted 04-09-2019 04:15 PM

Do you mind showing me using this simple example? How would I transform these 3 variables using PCA?

DATA Have;
CALL STREAMINIT(123);
DO Scenario = 1 to 100;
	Var1 = RAND("Normal", 0);
	Var2 = RAND("Normal", 0);
	Var3 = RAND("Normal", 0);
	OUTPUT;
END;
RUN;

Rick_SAS · Posted 04-09-2019 04:21 PM

In your example, the data are independent. I thought you were starting with correlated data?

SASaholic629 · Posted 04-09-2019 04:24 PM

I am starting with correlated data. I was just trying to do something quick. Wasn't sure it would matter what we used in the example to start with.

Appreciate your info on inverse Cholesky decomposition. I'll give that a try and report back. Thanks again!

PaigeMiller · Posted 04-09-2019 04:43 PM

@SASaholic629 wrote:

Do you mind showing me using this simple example? How would I transform these 3 variables using PCA?
DATA Have;
CALL STREAMINIT(123);
DO Scenario = 1 to 100;
	Var1 = RAND("Normal", 0);
	Var2 = RAND("Normal", 0);
	Var3 = RAND("Normal", 0);
	OUTPUT;
END;
RUN;

PROC PRINCOMP takes your original variables and produces uncorrelated "scores" (that's what they are called in most texts). In the OUT= output data set, the scores are usually named PRIN1, PRIN2, PRIN3, etc.

Example here: https://documentation.sas.com/?docsetId=statug&docsetTarget=statug_princomp_examples03.htm&docsetVer...
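
A minimal sketch of this approach follows. The data set Have below is a hypothetical stand-in, built so that Var2 and Var3 depend on the earlier variables, and the output data set name Scores is also an assumption. The STD option is optional; it rescales the scores to unit variance.

```sas
/* Hypothetical correlated input: Var2 and Var3 are built from earlier variables */
data Have;
   call streaminit(123);
   do Scenario = 1 to 100;
      Var1 = rand("Normal");
      Var2 = 0.8*Var1 + 0.6*rand("Normal");
      Var3 = 0.5*Var2 + rand("Normal");
      output;
   end;
run;

/* OUT= writes the original variables plus the scores Prin1-Prin3 */
proc princomp data=Have out=Scores std noprint;
   var Var1-Var3;
run;

/* Check: the off-diagonal correlations should be zero up to round-off */
proc corr data=Scores nosimple;
   var Prin1-Prin3;
run;
```

Note that each score is a linear combination of all the input variables, so Prin1 does not correspond to Var1 in any direct way.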

--
Paige Miller

Rick_SAS · Posted 04-09-2019 04:20 PM

Are your data correlated multivariate normal? Yes, you can use the inverse Cholesky decomposition to uncorrelate the variables.

I solved this problem in a blog post (linked in the code comment below), but here is an example with p=20 and N=20,000.

proc iml;
call randseed(1);
/* create correlated test data */
p = 20; N = 20000;
Sigma = toeplitz(p:1);
mean = 1:p;
X = randnormal(N, mean, Sigma);

/* Use inverse Cholesky to uncorrelate:
https://blogs.sas.com/content/iml/2012/02/08/use-the-cholesky-transformation-to-correlate-and-uncorrelate-variables.html
*/
S = cov(X);
U = root(S);
L = U`;                /* convenience: use U` and X` */
Z = trisolv(4, L, X`); /* faster than inv(L)*X` */
 
corr = corr(Z`);
print corr;

Theoretically, you should use the population mean and covariance to uncorrelate, but with large N, the sample mean and cov will be very close to the population parameters.
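
A short derivation may help show why the transformation works. It is a sketch that assumes S is the sample covariance of the same data being transformed, factored as S = L L` as in the code above, and that Cov(Z) denotes the covariance taken across the N observations (the columns of Z).

```latex
\begin{aligned}
S &= L L^{\top} \\
Z &= L^{-1} X^{\top} \\
\operatorname{Cov}(Z) &= L^{-1} S \, L^{-\top}
                       = L^{-1} L L^{\top} L^{-\top}
                       = I
\end{aligned}
```

Under these assumptions the sample covariance of the transformed data is the identity up to round-off. If L were instead built from a population covariance matrix, the sample covariance of Z would only be approximately the identity.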

PaigeMiller · Posted 04-09-2019 04:38 PM

In the blog post you link to, the inverse Cholesky transformation doesn't make the variables completely uncorrelated: the covariance between the two variables is 0.0062815. While this is close to zero, and might be good enough for most purposes (although there's no guarantee it will be this close to zero in other data sets), Principal Components guarantees that the correlation (or covariance) between the two transformed variables is exactly zero in theory (and in practice is within roundoff error of zero).

--
Paige Miller

SASaholic629 · Posted 04-15-2019 01:37 PM

Rick,

I've tested this, and it definitely works to uncorrelate my variables. However, it does not get me back to the starting point. In the example from your blog, the first xy, created from independent N(0,1) draws, is very different from the xy that results when you uncorrelate zw using the inverse Cholesky decomposition. Both are uncorrelated, but the values are very different. Essentially, that feels no different than if I were to just pick a different seed for the random generator. Right?

PaigeMiller · Posted 04-15-2019 01:48 PM

And what happens if you use Principal Components?

--
Paige Miller

Rick_SAS · Posted 04-15-2019 01:52 PM

No, I don't think it is a random thing. I think the problem is that uncorrelated data are not unique. For example, if X and Y are uncorrelated with unit variance, then so are the variables (X-Y)/sqrt(2) and (X+Y)/sqrt(2). In fact, you can define any rotation matrix and the rotated variables remain uncorrelated.
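
The non-uniqueness can be illustrated with a small PROC IML sketch. The sample size, seed, and rotation angle below are arbitrary assumptions; an angle of pi/4 produces the combinations (X-Y)/sqrt(2) and (X+Y)/sqrt(2).

```sas
proc iml;
call randseed(1);
N = 20000;
X = randnormal(N, {0 0}, I(2));   /* two uncorrelated, unit-variance columns */

theta = constant('pi') / 4;       /* hypothetical rotation angle */
c = cos(theta);
s = sin(theta);
negs = -s;
R = (c || negs) // (s || c);      /* 2 x 2 rotation matrix */

Y = X * R`;                       /* rotate every observation (row) */

covX = cov(X);                    /* close to the identity matrix */
covY = cov(Y);                    /* also close to the identity matrix */
print covX, covY;
quit;
```

Both covariance matrices should be close to the identity, yet the values in Y differ from those in X. This is why uncorrelating by one method need not reproduce the draws that another method started from.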

I don't know what "back to the starting point" means, but I think you are saying that if you use one method to correlate variables and another to uncorrelate them, you might not obtain the original variables. That is true. You would need knowledge of how the variables were correlated and then use that knowledge to apply the inverse transformation.

SASaholic629 · Posted 04-15-2019 01:58 PM

That all makes sense. Thanks.

The correlated variables came from different software, so I have no knowledge of how they were correlated. The idea was to compare downstream results of uncorrelated vs correlated. But the caveat is that we want to use the same seed, to avoid any added noise from using a different seed. It sounds like we'll need to take a different approach then.

Thanks again

Rick_SAS · Posted 04-15-2019 02:07 PM

Before you give up, you might try the TRANSPOSE of the Cholesky matrix.

If that doesn't work, then:

1. use whatever software you want to generate uncorrelated random normal variates.

2. In any software you want, find the Cholesky root of the covariance matrix.

3. Use the Cholesky root to correlate the variables.

You now have correlated MVN data and you know how it was constructed. You can use it in any software.
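
The three steps can be sketched in PROC IML as follows. The dimensions, seed, and target covariance matrix Sigma are hypothetical choices for illustration. Because the Cholesky root U is known, the last part of the sketch undoes the transformation and recovers the original uncorrelated draws.

```sas
proc iml;
call randseed(1);
p = 3;  N = 1000;
Sigma = toeplitz(p:1);               /* hypothetical target covariance matrix */

Z = randnormal(N, j(1,p,0), I(p));   /* step 1: uncorrelated normal variates */
U = root(Sigma);                     /* step 2: Cholesky root, Sigma = U`*U  */
X = Z * U;                           /* step 3: rows of X have covariance Sigma */

/* U is known, so the transformation can be inverted.
   TRISOLV with code 2 solves U`*z = x for each column x of X`. */
Zback = T( trisolv(2, U, T(X)) );

maxDiff = max(abs(Zback - Z));       /* largest recovery error */
print maxDiff;
quit;
```

The printed difference should be at round-off level, which is the sense in which knowing the construction gets you "back to the starting point".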
