Re: reordering correlation matrix

hdg · Posted 04-20-2015 02:29 PM

Hi,

I have a data set that is an n x n correlation matrix. In the example below it's just 4x4:

A =

[  1    0.3  0.2  0.5
   0.3  1    0.1 -0.7
   0.2  0.1  1   -0.6
   0.5 -0.7 -0.6  1 ]

Excluding the diagonal, there are therefore (4*4-4)/2 = 6 distinct pairs, or n*(n-1)/2 pairs in general.
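To make "element 1", "element 2", and so on concrete, the sketch below lists the cells above the diagonal in row-major order. That numbering is an assumption, but it is the one the requested output appears to use (element 5 is the -0.7 at row 2, column 4). It is written in Python only so it can be run outside SAS; `upper_pairs` is a hypothetical helper name.

```python
def upper_pairs(n):
    """(row, col) cells above the diagonal of an n x n matrix,
    1-based, numbered row by row (assumed numbering)."""
    return [(i, j) for i in range(1, n + 1) for j in range(i + 1, n + 1)]

# Number the six pairs of the 4 x 4 example: element 1 is (1, 2), ..., element 6 is (3, 4)
for k, (i, j) in enumerate(upper_pairs(4), start=1):
    print(f"element {k}: row {i}, column {j}")

# The count agrees with n*(n-1)/2
n = 4
print(len(upper_pairs(n)) == n * (n - 1) // 2)
```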

So I use a random generator to produce a random ordering of the numbers 1 to 6, for example:

5 4 2 1 3 6

Now I want the 1st element to be replaced by the 5th element, the 2nd element by the 4th, and so on (with the matching cell below the diagonal changed as well, so the matrix stays symmetric).

So the output will look like this:

A =

[  1   -0.7  0.1  0.2
  -0.7  1    0.3  0.5
   0.1  0.3  1   -0.6
   0.2  0.5 -0.6  1 ]

Thanks!

PaigeMiller · Posted 04-20-2015 02:41 PM

This would take some programming in PROC IML. I think you could start with the VECH function, remove the diagonal terms, swap the elements, and re-form the entire matrix.

But, no, I don't have any actual code to do this.

--
Paige Miller
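A minimal sketch of that outline, written in Python only so it can be run and checked outside SAS (`permute_offdiag` is a hypothetical name, not a SAS/IML function). It assumes the elements are numbered row by row above the diagonal and that "element k is replaced by element order[k]". The list `v` plays the role of VECH with the diagonal removed, and each new value is written to both the upper cell and its mirror so the result stays symmetric.

```python
def permute_offdiag(A, order):
    """Rebuild a symmetric matrix whose k-th above-diagonal element
    (row-major, 1-based; an assumed numbering) is element order[k] of A."""
    n = len(A)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]  # 0-based cells
    v = [A[i][j] for i, j in pairs]       # off-diagonal values, like VECH minus the diagonal
    if sorted(order) != list(range(1, len(v) + 1)):
        raise ValueError("order must be a permutation of 1..n(n-1)/2")
    new_v = [v[k - 1] for k in order]     # element k <- element order[k]
    # start from a matrix that keeps A's diagonal and zeros elsewhere
    B = [[A[i][i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    for (i, j), x in zip(pairs, new_v):
        B[i][j] = B[j][i] = x             # upper cell and its mirror below the diagonal
    return B

A = [[1.0, 0.3, 0.2, 0.5],
     [0.3, 1.0, 0.1, -0.7],
     [0.2, 0.1, 1.0, -0.6],
     [0.5, -0.7, -0.6, 1.0]]

for row in permute_offdiag(A, [5, 4, 2, 1, 3, 6]):
    print(row)
```

With the order 5 4 2 1 3 6 this reproduces the matrix requested in the original post.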

art297 · Posted 04-20-2015 02:43 PM

Is your data in IML? If so, you really should post this in the IML forum.

Otherwise, please show us (in a data step) how your matrix is being stored.

art297 · Posted 04-20-2015 05:41 PM

You never responded regarding whether you were using IML.

Regardless, I found the question interesting and answered it assuming the matrix is the kind of data set that PROC CORR returns. If that is what you have, here is one way to rearrange the file:

/*Create example data and run proc corr*/
data class;
  set sashelp.class;
  if sex='M' then gender=1;
  else gender=0;
run;

proc corr data=class nomiss outp=CorrOutp (where=(_type_ eq 'CORR'));
  var gender age height weight;
run;

/*restructure according to randomly assigned numbers*/
data corroutp;
  set corroutp;
  if _n_ eq 1 then n=4;
  else if _n_ eq 2 then n=2;
  else if _n_ eq 3 then n=1;
  else if _n_ eq 4 then n=3;
run;

proc sort data=corroutp;
  by n;
run;

data want;
  retain _type_ _name_ height age weight gender;
  set corroutp (drop=n);
run;

PaigeMiller · Posted 04-21-2015 08:14 AM

Hi

If I understand the original request properly, he wants to maintain the "structure" of the correlation matrix, with ones on the diagonal, while swapping the value in an individual cell with the value in a different individual cell. I don't see your solution doing that; it seems simply to re-order the rows, and there is no longer a value of 1 on the diagonal.

This original request doesn't seem to be easily programmed in a data step, if it can be programmed at all. Perhaps multiple data steps could make it work (as has been done in this thread), but it is something for which IML is the perfect tool.

I am curious why this is needed. I can't think of a mathematical or statistical reason to do this, and in fact, swapping the value in one cell with the value in another cell might turn the result into something that is not a valid correlation matrix.

--
Paige Miller
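One way to check the validity concern: a correlation matrix of non-degenerate data must be positive definite, and a Cholesky factorization succeeds only for such a matrix. The Python sketch below (`is_positive_definite` is a hypothetical helper, and positive semidefinite edge cases are ignored) applies that test. Note that it rejects the example matrix from the question even before any swapping: the 3x3 submatrix for variables 1, 2 and 4 has determinant 1(0.51) - 0.3(0.65) + 0.5(-0.71) = -0.04, so the posted numbers are illustrative rather than a real correlation matrix.

```python
import math

def is_positive_definite(M, tol=1e-12):
    """Try a Cholesky factorization M = L L^T; it only succeeds when M is
    (numerically) positive definite."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = M[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if d <= tol:                      # non-positive pivot: not positive definite
            return False
        L[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            L[i][j] = (M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return True

A = [[1.0, 0.3, 0.2, 0.5],      # matrix from the question
     [0.3, 1.0, 0.1, -0.7],
     [0.2, 0.1, 1.0, -0.6],
     [0.5, -0.7, -0.6, 1.0]]
B = [[1.0, -0.7, 0.1, 0.2],     # requested rearrangement
     [-0.7, 1.0, 0.3, 0.5],
     [0.1, 0.3, 1.0, -0.6],
     [0.2, 0.5, -0.6, 1.0]]

print(is_positive_definite(A), is_positive_definite(B))  # False False
```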

art297 · Posted 04-21-2015 04:25 PM

I agree with you that the original request had been correctly addressed by another reply and that mine hadn't addressed it.

However, I took your comment ("This original request doesn't seem to be easily programmed in a data step, if it can be programmed at all.") as a challenge.

With the help of one format, the task wasn't really that difficult to accomplish using a data step:

data corroutp;
  input var1 var2 var3 var4;
  cards;
1 .3 .2 .5
.3 1 .1 -.7
.2 .1 1 -.6
.5 -.7 -.6 1
;

/* pair number -> "row.col" of that cell above the diagonal */
proc format;
  value o_order
    1=1.2
    2=1.3
    3=1.4
    4=2.3
    5=2.4
    6=3.4
  ;
run;

data want (keep=var:);
  set corroutp end=last;
  array vars(*) var1-var4;
  array have(4,4);   /* full copy of the input matrix */
  array want(4,4);   /* rearranged matrix */
  retain have: want:;
  /* store the current row; the diagonal value is copied unchanged */
  do i=1 to 4;
    have(_n_,i)=vars(i);
    if i eq _n_ then want(i,i)=vars(i);
  end;
  if last then do;
    j=0;
    do i=5,4,2,1,3,6; *<- random order;
      j+1;
      /* cell j above the diagonal gets the value of cell i ... */
      want(int(put(j,o_order.)),fuzz(10*(put(j,o_order.)-int(put(j,o_order.)))))=
        have(int(put(i,o_order.)),fuzz(10*(put(i,o_order.)-int(put(i,o_order.)))));
      /* ... and so does its mirror cell below the diagonal */
      want(fuzz(10*(put(j,o_order.)-int(put(j,o_order.)))),int(put(j,o_order.)))=
        have(fuzz(10*(put(i,o_order.)-int(put(i,o_order.)))),int(put(i,o_order.)));
    end;
    /* write the rearranged matrix out one row at a time */
    do i=1 to 4;
      do j=1 to 4;
        vars(j)=want(i,j);
      end;
      output;
    end;
  end;   /* closes IF LAST THEN DO */
run;
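The O_ORDER format above packs a row and a column into one number ("row.col"), and INT plus FUZZ(10*fraction) unpacks it. A small Python sketch of that decoding (the names `O_ORDER` and `decode` are hypothetical, and `round` stands in for FUZZ) shows why the rounding step matters and also a limit of the trick: a column number of 10 or more cannot be encoded, because 1.10 is the same number as 1.1.

```python
# Codes mirroring the o_order format: pair number -> "row.col"
O_ORDER = {1: "1.2", 2: "1.3", 3: "1.4", 4: "2.3", 5: "2.4", 6: "3.4"}

def decode(code):
    """Split a "row.col" code into integer (row, col)."""
    x = float(code)
    row = int(x)
    # 10 * (x - row) lands near, but not exactly on, an integer
    # (for "1.2" it falls slightly below 2), so truncating would give the
    # wrong column; rounding here plays the role FUZZ plays in the SAS code
    col = round(10 * (x - row))
    return row, col

for k in range(1, 7):
    print(k, decode(O_ORDER[k]))

print(decode("1.10"))  # (1, 1): a 10th column would collide with column 1
```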

PaigeMiller · Posted 04-21-2015 04:29 PM

Hmm, that's impressive! I stand corrected. However, I still think this part of my statement remains correct: "This original request doesn't seem to be easily programmed in a data step" emphasis on the word easily, despite your opinion that it wasn't that difficult, I don't think most people can program at the Tabachneck level of competency. And yes, that is a run-on sentence!

--
Paige Miller

PGStats · Posted 04-20-2015 06:29 PM

A solution that's a bit longish, but it works for any matrix size:

data A;
  input a1-a4;
  datalines;
1 0.3 0.2 0.5
0.3 1 0.1 -0.7
0.2 0.1 1 -0.6
0.5 -0.7 -0.6 1
;

/* one record per element above the diagonal, tagged with a random key */
data al;
  call streaminit(7685);
  set a end=done;
  array a{*} _numeric_;
  do i = _n_+1 to dim(a);
    corr = a{i};
    rnk = rand("UNIFORM");
    output;
  end;
  if done then call symputx("n", _n_);   /* matrix size */
  keep rnk corr;
run;

/* sorting on the random key shuffles the elements */
proc sort data=al; by rnk; run;

/* refill row by row: 1 on the diagonal, each shuffled value
   written to cell (l,c) and to its mirror cell (c,l) */
data b;
  l + 1;
  _line_ = l;
  _col_ = l;
  corr = 1;
  output;
  do c = l+1 to &n;
    set al;
    _line_ = l; _col_ = c;
    output;
    _line_ = c; _col_ = l;
    output;
  end;
  if l=&n then stop;
  keep _line_ _col_ corr;
run;

proc sort data=b; by _line_ _col_; run;

/* back to one row per matrix row, columns a1-a&n */
proc transpose data=b out=want(drop=_:) prefix=a;
  by _line_;
  var corr;
  id _col_;
run;

proc print data=want noobs; run;

PG
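PGStats's shuffle works by giving every element a uniform random key and sorting on it. A minimal Python sketch of the same idea (`random_order` is a hypothetical name; Python's generator will not reproduce the SAS stream for any seed) returns a random permutation of 1..m, which could be used as the order list in the question.

```python
import random

def random_order(m, seed=None):
    """Random permutation of 1..m built by sorting on uniform random keys,
    the same idea as RAND("UNIFORM") followed by PROC SORT BY rnk."""
    rng = random.Random(seed)
    keyed = [(rng.random(), k) for k in range(1, m + 1)]
    keyed.sort()                      # order by the random key
    return [k for _, k in keyed]

print(random_order(6, seed=2015))
```

In plain Python, `random.shuffle` on a list does the same job directly; the sort-on-key form is shown because it matches the DATA step plus PROC SORT approach.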

Ksharp · Posted 04-22-2015 08:19 AM

If you want to change the cell values randomly, why not just change the order of the variables directly? That would be easier than this.

And I really think this is an IML problem, not a DATA step problem. Posting it in the IML forum would be better; Rick might have some good ideas.

Xia Keshan

Rick_SAS · Posted 04-22-2015 09:22 AM

To do what you asked: extract the upper triangular elements of the matrix (not including the diagonal), permute them, and stick the permuted values back into the upper AND lower portion of the matrix. In PROC IML, it might look like this:

proc iml;
A = {1.0 0.3 0.2   0.5,
     0.3 1.0 0.1 -0.7,
     0.2 0.1 1.0 -0.6,
     0.5 -0.7 -0.6   1.0 };
upperIdx = loc(col(A) > row(A));
v = A[upperIdx];

/* To generate a random permutation
call randseed(123);
order = ranperm(nrow(v));
*/

order = {5 4 2 1 3 6}; /* permutation of upper triangular elements */
v = v[order]; /* permuted values */

n = nrow(A);
B = j(n,n,0);            /* create zero matrix */
B[upperIdx] = v;         /* insert permuted values */
NewA = B + B` + I(n);    /* create full matrix */
print NewA;

However, I advise you to consider whether you actually want to perform this computation. In many cases, the resulting matrix is not going to be a valid correlation matrix for the original data. It seems to me that what you SHOULD be interested in is the set of correlation matrices that result from permuting the order of your variables. If you decide to pursue this reformulated question, write back and I'll show you how to do it.
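For comparison, permuting the order of the variables rearranges whole rows and columns together, which is P A P^T for a permutation matrix P. That transform keeps the ones on the diagonal and leaves the eigenvalues unchanged, so a valid correlation matrix stays valid. A minimal Python sketch (`reorder_variables` is a hypothetical name; `perm` is 0-based):

```python
def reorder_variables(A, perm):
    """Correlation matrix of the same variables listed in a new order:
    new variable r is old variable perm[r], so entry (r, c) of the
    result is A[perm[r]][perm[c]], i.e. P A P^T."""
    return [[A[i][j] for j in perm] for i in perm]

A = [[1.0, 0.3, 0.2, 0.5],
     [0.3, 1.0, 0.1, -0.7],
     [0.2, 0.1, 1.0, -0.6],
     [0.5, -0.7, -0.6, 1.0]]

C = reorder_variables(A, [1, 0, 2, 3])   # swap the first two variables
for row in C:
    print(row)
```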

PaigeMiller · Posted 04-22-2015 10:26 AM

upperIdx = loc(col(A) > row(A));
v =  A[upperIdx];

In all the years I have been using PROC IML, I have never thought of this simple way to obtain the upper triangle of a matrix. Thanks!

--
Paige Miller
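For readers outside SAS/IML, a small Python emulation (`upper_loc` is a hypothetical name) of what `loc(col(A) > row(A))` returns for a 4x4 matrix: SAS/IML stores matrices in row-major order, so the result is the 1-based positions of the above-diagonal cells in that order, and `A[upperIdx]` then lists their values row by row.

```python
def upper_loc(n):
    """1-based, row-major positions of cells with col > row in an n x n
    matrix: an emulation of LOC(COL(A) > ROW(A))."""
    row = [[r for c in range(1, n + 1)] for r in range(1, n + 1)]   # like ROW(A)
    col = [[c for c in range(1, n + 1)] for r in range(1, n + 1)]   # like COL(A)
    flat = [(row[r][c], col[r][c]) for r in range(n) for c in range(n)]  # row-major
    return [k for k, (i, j) in enumerate(flat, start=1) if j > i]

A = [[1.0, 0.3, 0.2, 0.5],
     [0.3, 1.0, 0.1, -0.7],
     [0.2, 0.1, 1.0, -0.6],
     [0.5, -0.7, -0.6, 1.0]]

upper_idx = upper_loc(4)
flat_a = [x for r in A for x in r]
v = [flat_a[k - 1] for k in upper_idx]
print(upper_idx)  # [2, 3, 4, 7, 8, 12]
print(v)          # [0.3, 0.2, 0.5, 0.1, -0.7, -0.6]
```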

Rick_SAS · Posted 04-23-2015 05:56 AM

They haven't always existed. I introduced them in SAS/IML 12.3 because without them it is difficult to define banded and other structured matrices. See

http://blogs.sas.com/content/iml/2012/02/29/defining-banded-and-triangular-matrices.html

PaigeMiller · Posted 04-23-2015 08:16 AM

Rick Wicklin wrote:

They haven't always existed. I introduced them in SAS/IML 12.3 because without them it is difficult to define banded and other structured matrices. See

http://blogs.sas.com/content/iml/2012/02/29/defining-banded-and-triangular-matrices.html

Ah, okay. That's great to know, and I'm sure I will be using these tools. Thanks again.

--
Paige Miller
