Hello!
I have a lower triangular correlation matrix matrix in a data set. I need to turn this lower triangular correlation matrix (it is in a data set right) into a square symmetric correlation matrix. I was wondering the best way to do this? I tried using the following code, but unfortunately it just prints the symmetric data set to my output whereas, I need to print it to a new data set. I could also write a do loop; I have 19 variables, xGT1 through xGT19. Basically
I want xGT1 through xGT19 , so a 19x19 matrix. How could I do this? Because it's a correlation matrix, I just have 1's on the diagonal. Below is the code I used, using the sqrvech command. Any help is appreciated.
Thank you soooo much!
proc iml;
use pcorr1;
read all var _NUM_ into Dist;
close pcorr1;
d = vech(Dist) ;
Dist = sqrvech(d) ;
print Dist[format=7.5
i was thinking of doing something like
data new; set old;
array gt {19} xgt1 - xgt19;
for i = 1 to 19;
for j = 1 to 19;
if xgt{i,j} = . then xgt{i,j} = xgt{j,i};
but I am not sure.... here I don't have two indices for i and j, I just have one index.
thanks,Gina
Use transpose and update. I assume your data is a typical TYPE=CORR dataset :
data have;
length _TYPE_ _NAME_ $8;
array c{4};
_TYPE_ = "CORR";
do i = 1 to dim(c);
call missing(of c{*});
_NAME_ = vname(c{i});
do j = 1 to i-1;
c{j} = 0.1*(i-j);
end;
c{i} = 1;
output;
end;
drop i j;
run;
proc transpose data=have out=havet; copy _TYPE_; run;
data want;
update have havet;
by _TYPE_ _NAME_;
run;
PG
Message was edited by: PG - Added the code in red .
I'm not an IML expert but wouldn't this work within IML?
p=ncol(Dist); * Number of columns in the lower triangular correlation matrix;
R=J(p,p,0); * Initialize the correlation matrix;
Distt=(Dist`); * Transpose the lower triangular matrix to upper;
R=Dist<>Distt; * Select the max of the lower and upper matrices;
print R; * Complete correlation matrix;
If you want to make a correlation matrix for your data use PROC CORR. It will make NOT make just the lower triangle.
proc corr data=sashelp.class out=corr ;
run;
Obs _TYPE_ _NAME_ Age Height Weight 1 MEAN 13.316 62.337 100.026 2 STD 1.493 5.127 22.774 3 N 19.000 19.000 19.000 4 CORR Age 1.000 0.811 0.741 5 CORR Height 0.811 1.000 0.878 6 CORR Weight 0.741 0.878 1.000
If your triangle "matrix" is small enough just load it into a 2D temporary array. Just make the temporary array at least as large as your largest triangle.
data have ;
input Age Height Weight;
cards;
1.00000 . .
0.81143 1.00000 .
0.74089 0.87779 1
;
data want ;
if 0 then set have ;
array vars _numeric_;
array m [100,100] _temporary_;
do row=1 to nobs;
set have nobs=nobs;
do col=1 to nobs ;
m[row,col] = vars[col];
end;
end;
do row=1 to nobs;
do col=1 to row ;
vars[col]=m[row,col];
end;
do col=row+1 to nobs ;
vars[col] = m[col,row];
end;
output;
end;
drop row col;
run;
I need to turn this lower triangular correlation matrix (it is in a data set right) into a square symmetric correlation matrix. I was wondering the best way to do this?
Since you are in PROC IML, don't use a DATA step! IML was meant to handle matrices. DATA steps were not designed to handle matrices.
In IML
new_dist=dist + dist`; /* This is your square correlation matrix, but with 2 (not 1) on the diagonal */
where = loc(new_dist=2); /* Find the location of the 2 */
new_dist[where]=1; /* Replace 2 with 1 */
I tried using the following code, but unfortunately it just prints the symmetric data set to my output whereas, I need to print it to a new data set.
PROC IML lets you turn matrices into SAS data sets. Use the CREATE and APPEND statements in PROC IML. https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.4&docsetId=imlug&docsetTarget=imlug_...
I agree with Paige's answer with one warning. It won't work if the upper diagonal elements are missing instead of zero. My code would work with missing values but my code wouldn't work with negative correlations. Which are very common. I experimented with some data this morning to understand how they would work or not work. I have attached my complete code below. I ran two examples, one using a data file from proc corr. The second example uses creates the matrix in the proc iml step.
dm 'log;clear;output;clear;odsresults;clear; ';
proc corr data=sashelp.cars out=out1 noprint; * Create a correlation matrix;
run;
proc sql; * Create macro variable with number of numeric variables;
select count(name) as N
into : varcnt
from dictionary.columns
where libname='WORK' and memname='OUT1' and type="num";
quit;
proc sql; * Create list of numeric variables that can be used to create char vector in IML;
select '"'||strip(name)||'"'
into : varlist separated by ','
from dictionary.columns
where libname='WORK' and memname='OUT1' and type="num";
quit;
data out1(drop=_type_ j);
set out1;
where _type_='CORR';
array corrs(&varcnt.) _numeric_;
do j=1 to &varcnt.; * change to upper triangular including the diagonal in this example;
if j>_n_ then corrs(j)=.;
end;
run;
proc print data=out1; * Is it upper triangular;
run;
proc iml;
cnames={&varlist.};
use out1(drop=_name_);
read all into dist;
idx=loc(Dist=.); * If the upper triangular values are missing then set them to zero;
Dist[idx]=0;
print "Print matrix after changing missing values to zero",dist;
new_dist=dist + dist`; /* This is your square correlation matrix, but with 2 (not 1) on the diagonal */
where = loc(new_dist=2); /* Find the location of the 2 */
new_dist[where]=1; /* Replace 2 with 1 */
* I wanted to run the other reply as I indicated I'm not an IML expert;
* This code works well as long as the upper triangular elements aren't equal to missing;
p=ncol(Dist); * Number of columns in the lower triangular correlation matrix;
R=J(p,p,0); * Initialize the correlation matrix;
Distt=(Dist`); * Transpose the lower triangular matrix to upper;
R=Dist<>Distt; * Select the max of the lower and upper matrices;
* I like my code but it's wrong. Correlations can be negative be the other solution will have an error;
* if the missing elements are equal to missing.;
print "Correlation Matrices", new_dist[colname=cnames], R[colname=cnames];
* You can see my suggestion doesn't work for negative correlations;
create CorrMat from new_dist[colname=cnames];
append from new_dist;
show contents;
close CorrMat;
quit;
proc print data=CorrMat;
run;
proc iml;
dist={1.0000000 . . . . . . . . . ,
0.9991316 1.0000000 . . . . . . . . ,
0.571753 0.564498 1.0000000 . . . . . . . ,
0.6497419 0.6452261 0.9080023 1.0000000 . . . . . . ,
0.826945 0.8237465 0.7874349 0.8103406 1.0000000 . . . . . ,
-0.47502 -0.470442 -0.709471 -0.684402 -0.676699 1.0000000 . . . . ,
-0.439622 -0.434585 -0.717302 -0.6761 -0.647195 0.9410205 1.0000000 . . . ,
0.4484264 0.4423322 0.8078667 0.7422087 0.6307958 -0.737966 -0.790989 1.0000000 . . ,
0.1520001 0.1483275 0.636517 0.5467305 0.3873978 -0.507284 -0.524661 0.7607028 1.0000000 . ,
0.1720368 0.1665864 0.6374482 0.5477827 0.3815539 -0.501526 -0.466092 0.6900207 0.8891947 1.0000000}; * I don't know the dimensions or values of the original matrix so ;
* Use the matrix from the cars file;
print "Print original matrix as it was entered",dist;
idx=loc(Dist=.); * If the upper triangular values are missing then set them to zero;
Dist[idx]=0;
print "Print matrix after changing missing values to zero",dist;
new_dist=dist + dist`; /* This is your square correlation matrix, but with 2 (not 1) on the diagonal */
where = loc(new_dist=2); /* Find the location of the 2 */
new_dist[where]=1; /* Replace 2 with 1 */
* I wanted to run the other reply as I indicated I'm not an IML expert;
p=ncol(Dist); * Number of columns in the lower triangular correlation matrix;
R=J(p,p,0); * Initialize the correlation matrix;
Distt=(Dist`); * Transpose the lower triangular matrix to upper;
R=Dist<>Distt; * Select the max of the lower and upper matrices;
LastName = "X" + strip(char(p,4)); * What is the last variabale in the Matrix? 1, 2, ...,p;
* First time I saw this was from Rick Wicklin blog;
VarNames = "X1":LastName; * Create a character vector of variable names;
print "Correlation Matrices after updates", new_dist[colname=VarNames], R[colname=VarNames];
* You can see that both methods work for changing from a lower triangular to full correlation matrix;
create CorrMat2 from new_dist[colname=VarNames];
append from new_dist;
show contents;
close CorrMat2;
quit;
proc print data=CorrMat2;
run;
All good points by @JohnKeighley
There would be no need to discuss what would work and what would not work, if the OP @ginak would just show us a portion of this lower triangular matrix.
The *.dat files from Johnson & Wichern don't have values above the diagonal so I used the truncover option to read the data and passed that into iml. So the values were missing. Not zero. But the obs didn't come from a SAS Proc.
If you are sure the upper (or lower) triangle is all missing then just use the MIN or MAX operator with a transposed matrix to replicate the values to both triangles.
data lower_triangle ;
input x1-x4;
cards;
1 . . .
.1 1 . .
.2 .3 1 .
.4 .5 .6 1
;
proc iml;
use lower_triangle;
read all var _num_ into a;
b= a <> a` ;
print b;
quit;
Result
b 1 0.1 0.2 0.4 0.1 1 0.3 0.5 0.2 0.3 1 0.6 0.4 0.5 0.6 1
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.