Re: Generating multivariate non-normal data in IML

mra · Posted 02-07-2013 11:46 AM

Hi everyone

I would like to know whether there is a convenient way to generate multivariate non-normal (in particular, exponential or uniform) data in IML for a specified covariance structure. I know that the probability integral transform method is meant to give such a result (simple code for exponential and uniform data follows), but I am not sure whether it actually achieves that goal.

v1 = i(p) + ep ; /*a simple compound symmetric sigma used as target*/

sdmat1 = sqrt(vecdiag(v1))` ;

sdinv1 = diag(sdmat1`##(-1)) ;

rmat1 = sdinv1*v1*sdinv1 ; /* sigma converted into corr mat*/

rrmat1 = root(rmat1) ;

***** Generating Data ;

z1 = rannor(j(n1, p, 0)) ; /* generating MVN data */

u1 = probnorm(z1*rrmat1) ; /*converting into uniform distribution(CDF): Prob Integ Tranf method*/

x1 = u1 - 0.5*j(n1, p, 1) ; /*mean-deviated uniform[0, 1] data*/

x1 = -log(1 - u1) - j(n1, p, 1) ; /*mean-deviated exponential(1) data*/
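One way to check whether the transform reaches its target is to run the fragment end to end and compare the sample Pearson correlations with the target. The sketch below is only an illustration under stated assumptions: the values of n1 and p, the seed, and the guess that ep is a p x p matrix of ones are not given in the post. For a normal copula with uniform margins, the population Pearson correlation is (6/pi)*arcsin(rho/2) rather than rho, so the printed matrices should be close to the target without matching it exactly.

```sas
proc iml;
call randseed(12345);                  /* arbitrary seed (assumption) */
n1 = 10000;                            /* assumed sample size */
p  = 3;                                /* assumed number of variables */
ep = j(p, p, 1);                       /* assumption: ep is a p x p matrix of ones */
v1 = i(p) + ep;                        /* compound-symmetric target covariance */

sd    = sqrt(vecdiag(v1));             /* column vector of standard deviations */
rmat1 = v1 / (sd * sd`);               /* covariance converted to correlation */

z = randnormal(n1, j(1, p, 0), rmat1); /* correlated standard normal data */
u = cdf("Normal", z);                  /* probability integral transform: uniform(0,1) margins */

xUnif = u - 0.5;                       /* mean-deviated uniform data */
xExpo = -log(1 - u) - 1;               /* mean-deviated exponential(1) data */

print rmat1[label="Target correlation"],
      (corr(xUnif))[label="Pearson correlation, uniform margins"],
      (corr(xExpo))[label="Pearson correlation, exponential margins"];
quit;
```

If the small discrepancy matters, the correlation of the underlying normal data would have to be adjusted so that the transformed data hit the target.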

Any help will be highly appreciated.

Thanks.

MRA

Rick_SAS · Posted 02-07-2013 01:49 PM

Because the topic (simulating correlated data with given covariance and marginal distributions) is so broad, there isn't a built-in general purpose approach. However, SAS/IML provides all the tools, as you've shown.

In my forthcoming book, Simulating Data with SAS, I dedicated two chapters to generating data from various multivariate distributions with given correlation structure.

Chapter 8: Simulating Data from Basic Multivariate Distributions, simulates data from basic multivariate distributions such as multinomial, MV normal, MV t, and mixture distributions. In SAS/IML, the multivariate distributions that you can call are documented and begin with the prefix RAND*.

Chapter 9: Advanced Simulation of Multivariate Data, includes simulating multivariate binary variables, multivariate ordinal variables, and the mechanism that you are using, which is called a copula approach.

When the book is published (expected publication, April 2013), all of the programs will be made available in a zip file for free download.

In addition to SAS/IML tools, you can use the COPULA procedure in SAS/ETS software to simulate data from copula models. The COPULA procedure supports simulating data with given rank correlation. The Spearman rank correlation matrix is often close to the Pearson correlations, as long as no correlation between variables is too extreme (close to -1 or 1).

bhfield · Posted 01-29-2015 11:43 AM

I examined the RAND link and there does not appear to be a RAND function associated with a continuous uniform distribution.

How would one go about simulating data from a correlated multivariate uniform distribution? For example, how could I replicate the following (taken from SAS website) with a uniform distribution rather than a normal distribution?

I am new to simulation but I feel like there is an inverse CDF technique that could handle it....but I am not getting it yet.

proc iml;

call randseed(1);

N=1000;

Mean = {1 2};

Corr = {0.6 0.5,0.5 0.9};

Var = {4 9};

/*create the covariance matrix*/

Cov = Corr # sqrt(Var` * Var);

x = RANDNORMAL( N, Mean, Cov );

SampleMean = x[:,];

n = nrow(x);

y = x - repeat( SampleMean, n );

SampleCov = y`*y / (n-1);

print SampleMean Mean, SampleCov Cov;

run;

Rick_SAS · Posted 01-29-2015 12:04 PM

RAND is for univariate distributions, as is the inverse CDF technique.

A correlated uniform distribution is called a copula. See p. 164-173 of my book Simulating Data with SAS.

bhfield · Posted 01-29-2015 12:08 PM

Ahhhh! I see! So if I had only realized that a correlated uniform and a copula were the same thing, I wouldn't have looked like such a dolt!

One more minor question - is a correlated uniform a copula regardless of the copula chosen? i.e., regardless of whether Gaussian, t, etc. is selected?

Thanks again Dr. Wicklin!

Brian

Rick_SAS · Posted 01-29-2015 01:14 PM

Learning the vocabulary in a new field is always challenging, and "copula" is not a common statistical term. I had never heard of copulas until a few years ago. If my responses are short, it is because I am busy, not because I think the person asking the question is a dolt.

It would have been more correct for me to say "a correlated uniform distribution is a copula for which all of the marginal distributions are trivial." In Sklar's theorem (p. 169), all of the F_i are the identity transformation. In the program on p. 166, X = U, and no inverse CDF (quantile) transformations are necessary. All copula models use correlated uniform variates to model the relationship between variables, so yes, you can use any copula model (but I recommend the normal copula since it is the simplest and fastest.)

Anticipating your next question: How to do this in SAS? If you insist on Pearson correlations, I don't know of any pre-built functions or procedures. If Spearman (rank) correlations are acceptable, then use PROC COPULA as shown on p. 170. You can skip the DATA step (Step 4) on p. 171 because no marginal inverse transformations are needed.

bhfield · Posted 01-29-2015 01:22 PM

I hope my self-deprecating humor didn't seem accusatory! That wasn't my intention. You have been tremendously responsive and very helpful.

I think I will need to go with the normal copula....now this opens up the question of transforming a covariance matrix into an equivalent rank correlation matrix. I hope this is addressed in the text....back to your book for now!

Thanks so much.

Brian

bhfield · Posted 01-29-2015 01:50 PM

Just to clarify once more - here is a simplified version of my exercise. Assume I have 3 loans in a portfolio and each has a static probability of default (PD) assumption. Loan 1 is in industry A and has an expected PD of 5%, Loan 2 is in industry B and has an expected PD of 10%, and Loan 3 is in industry C and has an expected PD of 2.5%. Further, there is some correlation among the loans. Unfortunately, I do not have default histories for the loans, so I can't manually calculate correlations. I can, however, use an industry-related covariance matrix produced by a major rating agency. The agency covariance matrix provides covariance information by industry, so I have the covariance information for industries A, B, and C. So, I have a mean vector, which I assume is the vector of the static PD levels, and I have an industry-based covariance matrix. (Although this introduces a question of variance - there is no connection between the entries on the main diagonal of the covariance matrix and the variances of the individual loans in my portfolio - this could be an issue, presumably.) For now, let's assume it isn't an issue.

Now I want to simulate the PDs with the given covariance structure.

I tried a multivariate normal approach with the mean vector and the covariance matrix. The code produced valid simulation results, but the simulated values included PDs less than 0 and greater than 1, so this approach can't work.

I then tried a truncated multivariate normal distribution via an accept-reject perspective, meaning I generated results as in step 1 but deleted variates that were outside the desired interval. This approach results in random variates that do not retain the desired correlation structure.

The next potential approach would be to assume multivariate non-normal. With this approach, I would need to make assumptions about skewness and kurtosis (remember I have no data) and the results will presumably also suffer from the same negative PD and PD > 1 problems as before. Again, if I truncate this approach, the structure is not retained.

So, I am now considering a multivariate uniform distribution with the covariance structure previously noted. I will compare the PD to the uniform variate; if the variate is less than the PD, then I will designate it a default. If the uniform random variate is not less than the PD, then the loan will be treated as non-defaulting.
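The comparison described here can be sketched in SAS/IML with a normal copula supplying the correlated uniform variates. This is a rough illustration only: the correlation matrix R, the number of scenarios, and the seed are hypothetical placeholders, not values from any rating agency, and R describes the underlying normal variables rather than the default indicators themselves.

```sas
proc iml;
call randseed(4321);                   /* arbitrary seed (assumption) */
N  = 100000;                           /* assumed number of simulated scenarios */
PD = {0.05 0.10 0.025};                /* static PDs for loans 1-3, from the post */

/* hypothetical correlation matrix for the underlying normal variables */
R  = {1.0 0.3 0.2,
      0.3 1.0 0.4,
      0.2 0.4 1.0};

z = randnormal(N, j(1, 3, 0), R);      /* correlated standard normal variates */
u = cdf("Normal", z);                  /* correlated uniform(0,1) variates */

isDefault = (u < repeat(PD, N, 1));    /* 1 = default, 0 = no default */
rate = isDefault[:,];                  /* simulated default frequency per loan */

print PD, rate,
      (corr(isDefault))[label="Correlation of default indicators"];
quit;
```

The simulated default frequencies should be near the PDs, while the correlation between default indicators will generally differ from the entries of R, so R cannot be read as a default correlation.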

If I use the copula approach, then I would need a Spearman rank correlation structure ..... and I have no idea how to generate this correlation matrix without historical data!

Things are always easy until you get into the weeds.....

Rick_SAS · Posted 01-29-2015 02:18 PM

I see. Considering that your "data" consists of 3 averages and pairwise correlations, I don't think it makes any difference what distributional model you assume...

Unless the Pearson correlations are near +1 or -1, the Spearman and Pearson correlations are often close. Given the other assumptions that you are already making, some people might assume further that the Pearson correlation matrix is a good approximation to the Spearman.

I am not an expert in financial modeling and nothing that I say here should be construed as advice for how to model this problem. I assume that you are already familiar with the dangers of using simulation to model risk of defaults with Gaussian copulas. Good luck.

Rick_SAS · Posted 01-29-2015 01:50 PM

I don't think you can derive the Spearman correlation from the Pearson covariance matrix, but maybe I'm wrong. I'd go back and compute the Spearman correlation of the data, which is the Pearson correlation of the univariate ranks.
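To illustrate that last statement, the following sketch computes the Spearman correlation two ways: directly, and as the Pearson correlation of the column-wise ranks. The data are simulated only for the demonstration; the sample size, seed, and correlation value are assumptions.

```sas
proc iml;
call randseed(1);                              /* arbitrary seed (assumption) */
x = randnormal(200, {0 0}, {1 0.6, 0.6 1});    /* illustrative data only */

r = j(nrow(x), ncol(x), .);
do col = 1 to ncol(x);
   r[, col] = ranktie(x[, col]);               /* ranks within each column, ties averaged */
end;

S1 = corr(x, "Spearman");                      /* Spearman correlation of the data */
S2 = corr(r);                                  /* Pearson correlation of the ranks */
print S1, S2;
quit;
```

The two printed matrices should agree.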

Rick_SAS · Posted 01-30-2015 02:44 PM

Actually, I just remembered that for NORMAL data, there is a relationship between the Pearson and Spearman correlations.

rho = 2*sin(pi/6 *S)

where rho is the Pearson product-moment correlation and S is the Spearman rank correlation. Recall that sin(x) ~ x when x is small, so rho ~ (pi/3)*S ~ 1.05*S, which shows that rho ~ S when S is not too big.
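A quick simulation can show how the formula might be used: pick a target Spearman correlation, convert it to the Pearson correlation of the underlying normal data, and check the result. The target S = 0.5 (which gives rho of about 0.518), the sample size, and the seed below are assumptions chosen for illustration.

```sas
proc iml;
call randseed(2015);                   /* arbitrary seed (assumption) */
pi  = constant("pi");
S   = 0.5;                             /* hypothetical target Spearman correlation */
rho = 2*sin(pi/6 * S);                 /* Pearson correlation for the normal data */

R = (1 || rho) // (rho || 1);          /* 2 x 2 correlation matrix */
z = randnormal(100000, {0 0}, R);      /* bivariate normal sample */
u = cdf("Normal", z);                  /* correlated uniform(0,1) variates */

print rho,
      (corr(z, "Spearman"))[label="Spearman correlation of z"],
      (corr(u))[label="Pearson correlation of u"];
quit;
```

Both printed matrices should have off-diagonal entries near 0.5, because the uniform variates are monotone transformations of the normal data and their Pearson correlation estimates the same rank correlation.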

mra · Posted 06-25-2015 05:04 AM

Dear Colleagues

Many thanks for the insightful discussion on my question. I am back very late, but I have been following the discussion with interest while also working further on my SAS question. I think it goes fine now. In particular, one can compare the rannor() function with randnormal(), and I feel the latter is easier to use and manipulate, even for several non-normal cases.

Rick had already mentioned his book, which I now have and find very helpful.

But I also have another short question here, and I hope you can help me figure it out much quicker. I usually write my programs in SAS/IML, but this time I started with R and the program works fine, although R's speed for that program is painfully slow. I am writing the same thing in SAS/IML, for which I need to compute all possible distinct combinations of size, say, k out of, say, n elements. Simply, I want a SAS function that does exactly what the permutations() function does in R's gtools package. I have gone through LEXPERK(), ALLPERM(), etc. It seems LEXPERK() is the closest option, but it needs the variable list as well, which I have but want to use later. First, I only need to make all permutations, say permutations(10, 4) in R's function. Any idea? The other functions do not fit my query very well, I suppose.

Many thanks and all the best

MRA

Rick_SAS · Posted 06-25-2015 05:55 AM

Look in the SAS/IML documentation under "Combinatorial Functions."

I have written a blog post about how to compute all combinations of n elements taken k at a time.

If your set is not very large (18 items or less), you can also generate all permutations.
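As a rough sketch of how the two ideas can be combined, the following SAS/IML code builds all ordered arrangements of k items chosen from n by permuting each combination. It assumes the goal is the same set of rows that permutations(10, 4) would return in R; the row order will generally differ, and the variable names are illustrative.

```sas
proc iml;
n = 10;
k = 4;
c   = allcomb(n, k);                   /* all combinations: comb(n,k) rows, k columns */
idx = allperm(k);                      /* all permutations of 1:k, k! rows */
nc  = nrow(c);
np  = nrow(idx);

result = j(nc*np, k, .);
do i = 1 to nc;
   row   = c[i, ];                     /* one combination */
   block = shape(row[idx], np, k);     /* every ordering of that combination */
   rows  = ((i-1)*np + 1):(i*np);      /* rows of result that receive this block */
   result[rows, ] = block;
end;

print (nrow(result))[label="Number of arrangements"];
quit;
```

The row count should equal n!/(n-k)!, which is 5040 for n = 10 and k = 4. The entries are indices 1 through n, so they can be used later to subscript a list of variables.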

Rick_SAS · Posted 06-25-2015 05:56 AM

PS. When you have a new question, please open a new thread instead of reviving an old thread.
