hdg
Obsidian | Level 7

Hi,

I have a dataset that is an n x n correlation matrix. In the example below it's just 4 x 4:

A =

[ 1     0.3   0.2   0.5
  0.3   1     0.1  -0.7
  0.2   0.1   1    -0.6
  0.5  -0.7  -0.6   1   ]

Therefore there are (4*4-4)/2 = 6 different pairs, or (n*(n-1))/2 in general, excluding the diagonal.

In this example I use a random generator to produce a random ordering of 1 to 6,

for example 5 4 2 1 3 6

So now I want the 1st element to be replaced by the 5th element, the 2nd element by the 4th, and so on,

so the output will look like

A =

[ 1    -0.7   0.1   0.2
 -0.7   1     0.3   0.5
  0.1   0.3   1    -0.6
  0.2   0.5  -0.6   1   ]

  Thanks!

12 REPLIES
PaigeMiller
Diamond | Level 26

This would take some programming in PROC IML. I think you could start with the VECH function, remove the diagonal terms, do the switching of elements, and re-form the entire matrix.

But, no, I don't have any actual code to do this.

--
Paige Miller
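The VECH route outlined above can be sketched roughly as follows. This is only an illustration, not code from the poster: it assumes SAS/IML is available, and the names offIdx, u, order and NewA are made up for the example. For a symmetric matrix, VECH stacks the lower triangle column by column, which gives the off-diagonal values in the same order as numbering the upper triangle row by row.

```sas
proc iml;
A = {1.0  0.3  0.2   0.5,
     0.3  1.0  0.1  -0.7,
     0.2  0.1  1.0  -0.6,
     0.5 -0.7 -0.6   1.0};
n = nrow(A);
v = vech(A);                    /* lower triangle incl. diagonal, stacked by columns */
offIdx = loc(vech(I(n)) = 0);   /* positions in v that are NOT diagonal elements */
u = v[offIdx];                  /* the n(n-1)/2 off-diagonal correlations */
order = {5 4 2 1 3 6};          /* permutation taken from the question */
v[offIdx] = u[order];           /* switch the elements; the diagonal stays in place */
NewA = sqrvech(v);              /* re-form the full symmetric matrix */
print NewA;
quit;
```

Using vech(I(n)) to locate the diagonal avoids working out the diagonal positions by hand, so the same lines should carry over to other matrix sizes once order is changed to match.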
art297
Opal | Level 21

Is your data in IML? If so, you really should post this in the IML forum.

Otherwise, please show us (in a data step) how your matrix is being stored.

art297
Opal | Level 21

You never responded regarding whether you were using IML.

Regardless, I found the question interesting and answered based on the matrix actually being the type of matrix returned from proc corr. If that is what you have, here is one way to rearrange the file:

/* Create example data and run PROC CORR */
data class;
  set sashelp.class;
  if sex='M' then gender=1;
  else gender=0;
run;

proc corr data=class nomiss outp=CorrOutp (where=(_type_ eq 'CORR'));
  var gender age height weight;
run;

/* Restructure according to randomly assigned numbers */
data corroutp;
  set corroutp;
  if _n_ eq 1 then n=4;
  else if _n_ eq 2 then n=2;
  else if _n_ eq 3 then n=1;
  else if _n_ eq 4 then n=3;
run;

proc sort data=corroutp;
  by n;
run;

/* RETAIN before SET fixes the column order to match the new row order */
data want;
  retain _type_ _name_ height age weight gender;
  set corroutp (drop=n);
run;

PaigeMiller
Diamond | Level 26

Hi

If I understand the original request properly, he wants to maintain the "structure" of the correlation matrix, with ones on the diagonal, while swapping the value in an individual cell with the value in a different individual cell. I don't see your solution doing that; it seems to simply re-order the rows, and there is no longer a value of 1 on the diagonal.

This original request doesn't seem to be easily programmed in a data step, if it can be programmed at all. Perhaps multiple data steps could make it work (as another reply here has done), but it is something for which IML is the perfect tool.

I am curious why this is needed, I can't think of a mathematical or statistical reason to do this, and in fact, swapping the value in one cell with the value in another cell might make the result into something that is not a valid correlation matrix.

--
Paige Miller
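One way to test the concern about validity is to look at the eigenvalues of the rearranged matrix, since a correlation matrix must be positive semidefinite. The sketch below is an illustration under assumptions: it uses SAS/IML, the matrix is the rearranged example from the question, and the tolerance of 1e-12 and the name isPSD are arbitrary choices made for the example.

```sas
proc iml;
/* rearranged matrix from the question (symmetric, ones on the diagonal) */
NewA = { 1.0 -0.7  0.1  0.2,
        -0.7  1.0  0.3  0.5,
         0.1  0.3  1.0 -0.6,
         0.2  0.5 -0.6  1.0};
ev = eigval(NewA);               /* eigenvalues of the symmetric matrix */
isPSD = (min(ev) >= -1e-12);     /* 1 if no eigenvalue is meaningfully negative */
print ev, isPSD;
quit;
```

If isPSD comes back as 0, the swapped matrix cannot be the correlation matrix of any data set, even though it is symmetric with a unit diagonal.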
art297
Opal | Level 21

PaigeMiller: I agree with you that the other reply correctly addressed the original request and mine hadn't.

However, I took your comment ("This original request doesn't seem to be easily programmed in a data step, if it can be programmed at all.") as a challenge.

With the help of one format, the task wasn't really that difficult to accomplish using a data step:

data corroutp;
  input var1 var2 var3 var4;
  cards;
1 .3 .2 .5
.3 1 .1 -.7
.2 .1 1 -.6
.5 -.7 -.6 1
;

/* Map pair number k to "row.col" in the upper triangle */
proc format;
  value o_order
  1='1.2'
  2='1.3'
  3='1.4'
  4='2.3'
  5='2.4'
  6='3.4'
  ;
run;

data want (keep=var:);
  set corroutp end=last;
  array vars(*) var1-var4;
  array have(4,4);
  array want(4,4);
  retain have: want:;
  /* load the current row and copy its diagonal element */
  do i=1 to 4;
    have(_n_,i)=vars(i);
    if i eq _n_ then want(i,i)=vars(i);
  end;
  if last then do;
    j=0;
    do i=5,4,2,1,3,6; *<- random order;
      j+1;
      /* upper triangle: target pair j receives source pair i */
      want(int(put(j,o_order.)),fuzz(10*(put(j,o_order.)-int(put(j,o_order.)))))=
       have(int(put(i,o_order.)),fuzz(10*(put(i,o_order.)-int(put(i,o_order.)))));
      /* mirror into the lower triangle */
      want(fuzz(10*(put(j,o_order.)-int(put(j,o_order.)))),int(put(j,o_order.)))=
       have(fuzz(10*(put(i,o_order.)-int(put(i,o_order.)))),int(put(i,o_order.)));
    end;
    /* write the rearranged matrix back out, one row per observation */
    do i=1 to 4;
      do j=1 to 4;
        vars(j)=want(i,j);
      end;
      output;
    end;
  end;
run;
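The o_order format above hardcodes the pair numbering for a 4 x 4 matrix. For other sizes, the same lookup (pair number to row and column of the upper triangle) could be generated instead of typed. The following is a small sketch of that idea only; the data set name pairs and the variables k, row and col are hypothetical, and n is assumed to be known in advance.

```sas
data pairs;
  n = 4;                      /* matrix dimension (assumed known) */
  k = 0;
  do row = 1 to n - 1;
    do col = row + 1 to n;
      k + 1;                  /* pair number, counted row by row */
      output;
    end;
  end;
  keep k row col;
run;

proc print data=pairs noobs; run;
```

For n = 4 this lists the six pairs in the order (1,2), (1,3), (1,4), (2,3), (2,4), (3,4), which matches the numbering used in the format.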

PaigeMiller
Diamond | Level 26

Hmm, that's impressive! I stand corrected. However, I still think this part of my statement remains correct: "This original request doesn't seem to be easily programmed in a data step" emphasis on the word easily, despite your opinion that it wasn't that difficult, I don't think most people can program at the Tabachneck level of competency. And yes, that is a run-on sentence!

--
Paige Miller
PGStats
Opal | Level 21

A solution that's a bit long, but it works for any matrix size:

data A;
  input a1-a4;
  datalines;
1   0.3   0.2    0.5
0.3    1    0.1   -0.7
0.2   0.1     1   -0.6
0.5   -0.7  -0.6     1
;

/* List the upper-triangle values, each with a random sort key */
data al;
  call streaminit(7685);
  set a end=done;
  array a{*} _numeric_;
  do i = _n_+1 to dim(a);
    corr = a{i};
    rnk = rand("UNIFORM");
    output;
  end;
  if done then call symputx("n", _n_);   /* matrix size */
  keep rnk corr;
run;

/* Sorting by the random key shuffles the values */
proc sort data=al; by rnk; run;

/* Rebuild the matrix in long form: diagonal = 1, each shuffled value in both triangles */
data b;
  l + 1;
  _line_ = l;
  _col_ = l;
  corr = 1;
  output;
  do c = l+1 to &n;
    set al;
    _line_ = l; _col_ = c;
    output;
    _line_ = c; _col_ = l;
    output;
  end;
  if l=&n then stop;
  keep _line_ _col_ corr;
run;

proc sort data=b; by _line_ _col_; run;

/* Back to wide form, one row per matrix line */
proc transpose data=b out=want(drop=_:) prefix=a;
  by _line_;
  var corr;
  id _col_;
run;

proc print data=want noobs; run;

PG
Ksharp
Super User

If you want to change the cell values randomly, why not just change the order of the variables directly? That would be easier.

And I really think it is an IML problem, not a data step problem. Posting it in the IML forum would be better. Rick might have some good ideas.

Xia Keshan

Rick_SAS
SAS Super FREQ

To do what you asked: extract the upper triangular elements of the matrix (not including the diagonal), permute them, and stick the permuted values back into the upper AND lower portion of the matrix.  In PROC IML, it might look like this:

proc iml;
A = {1.0  0.3  0.2   0.5,
     0.3  1.0  0.1  -0.7,
     0.2  0.1  1.0  -0.6,
     0.5 -0.7 -0.6   1.0 };
upperIdx = loc(col(A) > row(A));
v =  A[upperIdx];

/* To generate a random permutation
call randseed(123);
order = ranperm(nrow(v));
*/

order = {5 4 2 1 3 6};   /* permutation of upper triangular elements */
v = v[order];            /* permuted values */

n = nrow(A);
B = j(n,n,0);            /* create zero matrix */
B[upperIdx] = v;         /* insert permuted values */
NewA = B + B` + I(n);    /* create full matrix */
print NewA;
quit;

However, I advise you to consider whether you actually want to perform this computation.  In many cases, the resulting matrix is not going to be a valid correlation matrix for the original data.  It seems to me that what you SHOULD be interested in is the set of correlation matrices that result from permuting the order of your variables. If you decide to pursue this reformulated question, write back and I'll show you how to do it.
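For the reformulated question, permuting the order of the variables amounts to applying the same permutation to the rows and the columns of the matrix. The sketch below is one possible reading of that idea, not the code offered above; it assumes SAS/IML, and the seed and the names perm and P are chosen only for illustration.

```sas
proc iml;
A = {1.0  0.3  0.2   0.5,
     0.3  1.0  0.1  -0.7,
     0.2  0.1  1.0  -0.6,
     0.5 -0.7 -0.6   1.0};
call randseed(123);            /* arbitrary seed, for reproducibility */
perm = ranperm(nrow(A));       /* random reordering of the variables */
P = A[perm, perm];             /* same permutation on rows and columns */
print perm, P;
quit;
```

Because rows and columns move together, P keeps the ones on its diagonal and has the same eigenvalues as A, so it remains a valid correlation matrix for the reordered variables.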

PaigeMiller
Diamond | Level 26

upperIdx = loc(col(A) > row(A));

v =  A[upperIdx];

In all the years I have been using PROC IML, I have never thought of this simple way to obtain the upper triangle of a matrix. Thanks!

--
Paige Miller
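The same ROW and COL comparison extends to other index sets. The lines below are a small sketch, assuming a SAS/IML release that includes the ROW and COL functions; the names lowerIdx, diagIdx, bandIdx and T are made up for the example.

```sas
proc iml;
A = {1.0  0.3  0.2   0.5,
     0.3  1.0  0.1  -0.7,
     0.2  0.1  1.0  -0.6,
     0.5 -0.7 -0.6   1.0};
lowerIdx = loc(row(A) > col(A));            /* strictly lower triangle */
diagIdx  = loc(row(A) = col(A));            /* main diagonal */
bandIdx  = loc(abs(row(A) - col(A)) <= 1);  /* tridiagonal band */
T = j(nrow(A), ncol(A), 0);
T[bandIdx] = A[bandIdx];                    /* keep only the band of A */
print T;
quit;
```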
Rick_SAS
SAS Super FREQ

They haven't always existed. I introduced them in SAS/IML 12.3 because without them it is difficult to define banded and other structured matrices.  See

http://blogs.sas.com/content/iml/2012/02/29/defining-banded-and-triangular-matrices.html

PaigeMiller
Diamond | Level 26

Rick Wicklin wrote:

They haven't always existed. I introduced them in SAS/IML 12.3 because without them it is difficult to define banded and other structured matrices.  See

http://blogs.sas.com/content/iml/2012/02/29/defining-banded-and-triangular-matrices.html

Ah, okay. That's great to know, and I'm sure I will be using these tools. Thanks again.

--
Paige Miller
