Posted 04-09-2019 03:39 PM
(832 views)

I know how to create correlated random draws using PROC IML. However, that approach performs the random drawing and the correlating in a single step.

I already have my random draws (from another program). I'd like to take these existing random draws and correlate them. To be more specific, I want to "un-correlate" them. They need to have zero correlation and be completely independent.

BACKGROUND: I have 20 sets of correlated random variables (n=20,000). I use these to simulate 20,000 scenarios and compute statistics such as VaR and CTE. I want to compare those statistics with the ones obtained from the same (seed) set of random variables when they are completely independent of each other. The idea is to determine the benefit of using correlated random variables rather than independent ones. Again, it has to use the same seed, which is why I need to use my existing random draws rather than generating brand new ones. I don't want any noise from using a different seed.

12 REPLIES 12

Principal Components Analysis transforms the variables so that there is no correlation. If that's not it, I don't know what you mean.

--

Paige Miller

Do you mind showing me using this simple example? How would I transform these 3 variables using PCA?

```
DATA Have;
CALL STREAMINIT(123);
DO Scenario = 1 to 100;
Var1 = RAND("Normal", 0);
Var2 = RAND("Normal", 0);
Var3 = RAND("Normal", 0);
OUTPUT;
END;
RUN;
```

In your example, the data are independent. I thought you were starting with correlated data?

I am starting with correlated data. I was just trying to do something quick. Wasn't sure it would matter what we used in the example to start with.

Appreciate your info on inverse Cholesky decomposition. I'll give that a try and report back. Thanks again!

PROC PRINCOMP takes your original variables and produces "scores" (that's what most texts call them). In the OUT= output data set they are usually labelled PRIN1, PRIN2, PRIN3, *etc.*, and they are uncorrelated.

Example here: https://documentation.sas.com/?docsetId=statug&docsetTarget=statug_princomp_examples03.htm&docsetVer...
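A minimal sketch of that workflow, assuming a small correlated test data set. The data set names `Have` and `Scores`, the shared `Common` factor, and the seed are made up for illustration. PROC PRINCOMP analyzes the correlation matrix by default; the `COV` option switches to the covariance matrix, and `STD` rescales the scores to unit variance.

```sas
/* Hypothetical correlated test data: a shared factor induces correlation */
data Have;
   call streaminit(123);
   do Scenario = 1 to 100;
      Common = rand("Normal");
      Var1 = Common + rand("Normal");
      Var2 = Common + rand("Normal");
      Var3 = Common + rand("Normal");
      output;
   end;
   drop Common;
run;

/* OUT= holds the original variables plus the scores Prin1-Prin3 */
proc princomp data=Have out=Scores noprint;
   var Var1-Var3;
run;

/* Off-diagonal sample correlations of the scores should be zero up to rounding */
proc corr data=Scores nosimple;
   var Prin1-Prin3;
run;
```

Zero correlation is what this guarantees; for data that are not multivariate normal, uncorrelated scores are not necessarily independent.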

--

Paige Miller

Are your data correlated multivariate normal? Yes, you can use the inverse Cholesky decomposition to uncorrelate the variables.

I solved the problem in the blog post linked in the code comment below, but here is an example with p=20 and N=20,000.

```
proc iml;
call randseed(1);
/* create correlated test data */
p = 20; N = 20000;
Sigma = toeplitz(p:1);
mean = 1:p;
X = randnormal(N, mean, Sigma);
/* Use inverse Cholesky to uncorrelate:
https://blogs.sas.com/content/iml/2012/02/08/use-the-cholesky-transformation-to-correlate-and-uncorrelate-variables.html
*/
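S = cov(X);             /* sample covariance (p x p) */
U = root(S);            /* upper triangular Cholesky root: S = U`*U */
L = U`;                 /* lower triangular factor; convenience: use U` and X` */
Z = trisolv(4, L, X`);  /* solve L*Z = X`; faster than inv(L)*X` */
corr = corr(Z`);        /* sample correlation of the transformed variables */
print corr;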
```

Theoretically, you should use the population mean and covariance to uncorrelate, but with large N, the sample mean and cov will be very close to the population parameters.

In the blog post you link to, the inverse Cholesky transformation doesn't completely uncorrelate the variables: the covariance between the two variables is 0.0062815. That is close to zero and might be good enough for most purposes (although there's no guarantee it will be this close to zero in other data sets). Principal components, by contrast, guarantee that the correlation (or covariance) between the transformed variables is exactly zero in theory, and in practice it is within roundoff error of zero.
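As a side note on where an exactly zero sample covariance can come from, here is a short derivation. It assumes the data matrix $X$ is centered with one row per observation; the symbols $S$, $L$, $Z$ and $\Sigma$ are introduced here for illustration. If $L$ is the Cholesky factor of the sample covariance $S$ itself, the transformed data have identity sample covariance by construction:

$$
S = \frac{1}{N-1} X^{\top} X = L L^{\top}, \qquad Z = X L^{-\top}
\;\Longrightarrow\;
\frac{1}{N-1} Z^{\top} Z = L^{-1} S L^{-\top} = L^{-1} L L^{\top} L^{-\top} = I .
$$

If the factor instead comes from a different matrix, such as a population covariance $\Sigma$, the result is only approximately the identity:

$$
\Sigma = L_{\Sigma} L_{\Sigma}^{\top} \neq S
\;\Longrightarrow\;
L_{\Sigma}^{-1} S L_{\Sigma}^{-\top} \approx I \quad \text{(exact only if } S = \Sigma\text{)} .
$$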

--

Paige Miller

Rick,

I've tested this, and it definitely works to uncorrelate my variables. However, it does not get me back to the starting point. In the example from your blog, the first xy, which is created from independent N(0,1) draws, is very different from the xy that results when you uncorrelate zw using the inverse Cholesky decomposition. Both are uncorrelated, but the values are very different. Essentially, that feels no different from just picking a different seed for the random generator, right?

And what happens if you use Principal Components?

--

Paige Miller

No, I don't think it is a random thing. I think the problem is that uncorrelated data are not unique. For example, if X and Y are uncorrelated with unit variance, then so are the variables (X-Y)/sqrt(2) and (X+Y)/sqrt(2). In fact, you can define any rotation matrix and the rotated variables remain uncorrelated.
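The rotation argument can be written out as a short derivation. The symbols $Z$ and $Q$ are introduced here for illustration, with $Z$ a vector of uncorrelated, unit-variance variables and $Q$ any orthogonal matrix:

$$
\operatorname{Cov}(Z) = I, \quad Q^{\top} Q = I
\;\Longrightarrow\;
\operatorname{Cov}(QZ) = Q \operatorname{Cov}(Z) Q^{\top} = Q Q^{\top} = I .
$$

For a 45-degree rotation of $(X, Y)$, the normalisation by $\sqrt{2}$ is what keeps the variances at one:

$$
Q = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}, \qquad
Q \begin{pmatrix} X \\ Y \end{pmatrix} = \begin{pmatrix} (X - Y)/\sqrt{2} \\ (X + Y)/\sqrt{2} \end{pmatrix} .
$$

So a decorrelating transform followed by a rotation is another decorrelating transform, which is why two methods can both return uncorrelated variables that differ from each other and from the original draws.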

I don't know what "back to the starting point" means, but I think you are saying that if you use one method to correlate variables and another to uncorrelate them, you might not obtain the original variables. That is true. You would need knowledge of how the variables were correlated and then use that knowledge to apply the inverse transformation.

That all makes sense. Thanks.

The correlated variables came from different software, so I have no knowledge of how they were correlated. The idea was to compare downstream results of uncorrelated vs correlated. But the caveat is that we want to use the same seed, to avoid any added noise from using a different seed. It sounds like we'll need to take a different approach then.

Thanks again

Before you give up, you might try the TRANSPOSE of the Cholesky matrix.

If that doesn't work, then:

1. use whatever software you want to generate uncorrelated random normal variates.

2. In any software you want, find the Cholesky root of the covariance matrix.

3. Use the Cholesky root to correlate the variables.

You now have correlated MVN data and you know how it was constructed. You can use it in any software.
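The three steps can be sketched in SAS/IML as follows. This is an illustration under stated assumptions, not a reconstruction of what the other software did: the 3x3 matrix `Sigma`, the seed, and the names `Z`, `X`, `Zback` and `maxAbsDiff` are all made up for the example. Because the same Cholesky root is used in both directions, the inverse step should return the original uncorrelated draws up to floating-point error.

```sas
proc iml;
call randseed(123);
N = 20000;

/* Hypothetical target covariance, for illustration only */
Sigma = {1.0 0.5 0.2,
         0.5 1.0 0.3,
         0.2 0.3 1.0};
p = ncol(Sigma);

/* Step 1: uncorrelated N(0,1) draws, one row per scenario */
Z = j(N, p, .);
call randgen(Z, "Normal");

/* Step 2: Cholesky root; ROOT returns upper triangular U with Sigma = U`*U */
U = root(Sigma);

/* Step 3: correlate; rows of X have population covariance Sigma */
X = Z * U;

/* Undo with the SAME root: X` = U`*Z`, so solve the lower triangular system */
Zback = T( trisolv(4, U`, X`) );

/* Largest elementwise difference between recovered and original draws */
maxAbsDiff = max(abs(Zback - Z));
print maxAbsDiff;
quit;
```

With this construction, the independent set `Z` and the correlated set `X` share the same underlying draws, which is the like-for-like comparison the original question asks for.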
