BookmarkSubscribeRSS Feed
ginak
Quartz | Level 8

Hello!

I have a lower triangular correlation matrix matrix in a data set. I need to turn this lower triangular correlation matrix (it is in a data set right) into a square symmetric correlation matrix. I was wondering the best way to do this? I tried using the following code, but unfortunately it just prints the symmetric data set to my output whereas, I need to print it to a new data set. I could also write a do loop; I have 19 variables, xGT1 through xGT19. Basically

I want xGT1 through xGT19 , so a 19x19 matrix. How could I do this? Because it's a correlation matrix, I just have 1's on the diagonal. Below is the code I used, using the sqrvech command. Any help is appreciated.

Thank you soooo much!

proc iml;

use pcorr1;

read all var _NUM_ into Dist;

close pcorr1;

d = vech(Dist) ;

Dist = sqrvech(d) ;

print Dist[format=7.5

i was thinking of doing something like

data new; set old;

     array gt {19} xgt1 - xgt19;

     for i = 1 to 19;

     for j = 1 to 19;

     if xgt{i,j} = . then xgt{i,j} = xgt{j,i};

but I am not sure.... here I don't have two indices for i and j, I just have one index.

thanks,Gina

11 REPLIES 11
PGStats
Opal | Level 21

Use transpose and update. I assume your data is a typical TYPE=CORR dataset :

data have;
length _TYPE_ _NAME_ $8;
array c{4};
_TYPE_ = "CORR";
do i = 1 to dim(c);
     call missing(of c{*});
     _NAME_ = vname(c{i});
     do j = 1 to i-1;
          c{j} = 0.1*(i-j);
          end;
     c{i} = 1;
     output;
     end;
drop i j;
run;

proc transpose data=have out=havet; copy _TYPE_; run;

data want;
update have havet;
by _TYPE_ _NAME_;
run;

PG

Message was edited by: PG - Added the code in red .

PG
JohnKeighley
Obsidian | Level 7

I'm not an IML expert but wouldn't this work within IML?

 

p=ncol(Dist);  * Number of columns in the lower triangular correlation matrix;
R=J(p,p,0);    * Initialize the correlation matrix;
Distt=(Dist`); * Transpose the lower triangular matrix to upper;
R=Dist<>Distt; * Select the max of the lower and upper matrices;
print R;       * Complete correlation matrix;

Tom
Super User Tom
Super User

If you want to make a correlation matrix for your data use PROC CORR.  It will make NOT make just the lower triangle.

proc corr data=sashelp.class out=corr ;
run;
Obs    _TYPE_    _NAME_        Age     Height     Weight

 1      MEAN                13.316     62.337    100.026
 2      STD                  1.493      5.127     22.774
 3      N                   19.000     19.000     19.000
 4      CORR     Age         1.000      0.811      0.741
 5      CORR     Height      0.811      1.000      0.878
 6      CORR     Weight      0.741      0.878      1.000

If your triangle "matrix" is small enough just load it into a 2D temporary array.  Just make the temporary array at least as large as your largest triangle.

data have ;
  input  Age Height Weight;
cards;
1.00000  .       .
0.81143 1.00000  .
0.74089 0.87779  1
;

data want ;
  if 0 then set have ;
  array vars _numeric_;
  array m [100,100] _temporary_;
  do row=1 to nobs;
    set have nobs=nobs;
    do col=1 to nobs ;
      m[row,col] = vars[col];
    end;
  end;
  do row=1 to nobs;
    do col=1 to row ;
      vars[col]=m[row,col];
    end;
    do col=row+1 to nobs ;
      vars[col] = m[col,row];
    end;
    output;
  end;
  drop row col;
run;

 

PaigeMiller
Diamond | Level 26

I need to turn this lower triangular correlation matrix (it is in a data set right) into a square symmetric correlation matrix. I was wondering the best way to do this?

Since you are in PROC IML, don't use a DATA step! IML was meant to handle matrices. DATA steps were not designed to handle matrices.

 

In IML

new_dist=dist + dist`; /* This is your square correlation matrix, but with 2 (not 1) on the diagonal */
where = loc(new_dist=2); /* Find the location of the 2 */
new_dist[where]=1; /* Replace 2 with 1 */

 

I tried using the following code, but unfortunately it just prints the symmetric data set to my output whereas, I need to print it to a new data set.

 

PROC IML lets you turn matrices into SAS data sets. Use the CREATE and APPEND statements in PROC IML. https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.4&docsetId=imlug&docsetTarget=imlug_...

 

 

--
Paige Miller
JohnKeighley
Obsidian | Level 7

I agree with Paige's answer with one warning. It won't work if the upper diagonal elements are missing instead of zero. My code would work with missing values but my code wouldn't work with negative correlations. Which are very common. I experimented with some data this morning to understand how they would work or not work. I have attached my complete code below. I ran two examples, one using a data file from proc corr. The second example uses creates the matrix in the proc iml step.

 


dm 'log;clear;output;clear;odsresults;clear; ';

proc corr data=sashelp.cars out=out1 noprint; * Create a correlation matrix;
run;

proc sql; * Create macro variable with number of numeric variables;
select count(name) as N
into : varcnt
from dictionary.columns
where libname='WORK' and memname='OUT1' and type="num";
quit;

proc sql; * Create list of numeric variables that can be used to create char vector in IML;
select '"'||strip(name)||'"'
into : varlist separated by ','
from dictionary.columns
where libname='WORK' and memname='OUT1' and type="num";
quit;

data out1(drop=_type_ j);
set out1;

where _type_='CORR';
array corrs(&varcnt.) _numeric_;

do j=1 to &varcnt.; * change to upper triangular including the diagonal in this example;
if j>_n_ then corrs(j)=.;
end;
run;

proc print data=out1; * Is it upper triangular;
run;

proc iml;
cnames={&varlist.};
use out1(drop=_name_);
read all into dist;

idx=loc(Dist=.); * If the upper triangular values are missing then set them to zero;
Dist[idx]=0;
print "Print matrix after changing missing values to zero",dist;

new_dist=dist + dist`; /* This is your square correlation matrix, but with 2 (not 1) on the diagonal */
where = loc(new_dist=2); /* Find the location of the 2 */
new_dist[where]=1; /* Replace 2 with 1 */
* I wanted to run the other reply as I indicated I'm not an IML expert;
* This code works well as long as the upper triangular elements aren't equal to missing;

p=ncol(Dist); * Number of columns in the lower triangular correlation matrix;
R=J(p,p,0); * Initialize the correlation matrix;
Distt=(Dist`); * Transpose the lower triangular matrix to upper;
R=Dist<>Distt; * Select the max of the lower and upper matrices;

* I like my code but it's wrong. Correlations can be negative be the other solution will have an error;
* if the missing elements are equal to missing.;

print "Correlation Matrices", new_dist[colname=cnames], R[colname=cnames];
* You can see my suggestion doesn't work for negative correlations;

create CorrMat from new_dist[colname=cnames];
append from new_dist;
show contents;
close CorrMat;
quit;

proc print data=CorrMat;
run;

proc iml;

dist={1.0000000 . . . . . . . . . ,
0.9991316 1.0000000 . . . . . . . . ,
0.571753 0.564498 1.0000000 . . . . . . . ,
0.6497419 0.6452261 0.9080023 1.0000000 . . . . . . ,
0.826945 0.8237465 0.7874349 0.8103406 1.0000000 . . . . . ,
-0.47502 -0.470442 -0.709471 -0.684402 -0.676699 1.0000000 . . . . ,
-0.439622 -0.434585 -0.717302 -0.6761 -0.647195 0.9410205 1.0000000 . . . ,
0.4484264 0.4423322 0.8078667 0.7422087 0.6307958 -0.737966 -0.790989 1.0000000 . . ,
0.1520001 0.1483275 0.636517 0.5467305 0.3873978 -0.507284 -0.524661 0.7607028 1.0000000 . ,
0.1720368 0.1665864 0.6374482 0.5477827 0.3815539 -0.501526 -0.466092 0.6900207 0.8891947 1.0000000}; * I don't know the dimensions or values of the original matrix so ;
* Use the matrix from the cars file;
print "Print original matrix as it was entered",dist;

idx=loc(Dist=.); * If the upper triangular values are missing then set them to zero;
Dist[idx]=0;
print "Print matrix after changing missing values to zero",dist;

new_dist=dist + dist`; /* This is your square correlation matrix, but with 2 (not 1) on the diagonal */
where = loc(new_dist=2); /* Find the location of the 2 */
new_dist[where]=1; /* Replace 2 with 1 */
* I wanted to run the other reply as I indicated I'm not an IML expert;

p=ncol(Dist); * Number of columns in the lower triangular correlation matrix;
R=J(p,p,0); * Initialize the correlation matrix;
Distt=(Dist`); * Transpose the lower triangular matrix to upper;
R=Dist<>Distt; * Select the max of the lower and upper matrices;

LastName = "X" + strip(char(p,4)); * What is the last variabale in the Matrix? 1, 2, ...,p;
* First time I saw this was from Rick Wicklin blog;
VarNames = "X1":LastName; * Create a character vector of variable names;

print "Correlation Matrices after updates", new_dist[colname=VarNames], R[colname=VarNames];
* You can see that both methods work for changing from a lower triangular to full correlation matrix;

create CorrMat2 from new_dist[colname=VarNames];
append from new_dist;
show contents;
close CorrMat2;
quit;

proc print data=CorrMat2;
run;

PaigeMiller
Diamond | Level 26

All good points by @JohnKeighley 

 

There would be no need to discuss what would work and what would not work, if the OP @ginak would just show us a portion of this lower triangular matrix.

--
Paige Miller
JohnKeighley
Obsidian | Level 7
I agree, but I am learning more about IML and I was reading in the data from Johnson & Wichern's 6th edition text and making example programs for my students. That data had missing values for the correlations in the cells above the diagonal. That was why I spent extra time on it.
PaigeMiller
Diamond | Level 26
But where would the missings come from? I certainly don't think I have seen SAS produce a correlation matrix with missing in the upper trangle. Of course, I haven't used every SAS proc, maybe some of them produce just a lower trangle with missing upper triangle, but I would be surprised.
--
Paige Miller
JohnKeighley
Obsidian | Level 7

The *.dat files from Johnson & Wichern don't have values above the diagonal so I used the truncover option to read the data and passed that into iml. So the values were missing. Not zero. But the obs didn't come from a SAS Proc.

 

Tom
Super User Tom
Super User

If you are sure the upper (or lower) triangle is all missing then just use the MIN or MAX operator with a transposed matrix to replicate the values to both triangles.

data lower_triangle ;
  input x1-x4;
cards;
 1  .  .  . 
.1  1  .  .
.2 .3  1  .
.4 .5 .6  1
;

proc iml;
  use lower_triangle;
  read all var _num_ into a;
  b= a <> a` ;
  print b;
quit;

Result

                  b

        1       0.1       0.2       0.4
      0.1         1       0.3       0.5
      0.2       0.3         1       0.6
      0.4       0.5       0.6         1

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1782 views
  • 1 like
  • 6 in conversation