I have read Dr. Rick's post about how to extract the nontrivial correlations from a correlation matrix: http://blogs.sas.com/content/iml/2012/08/16/extract-the-lower-triangular-elements-of-a-matrix.html
My question is the reverse: I want to construct a correlation matrix from a vector that contains only the nontrivial correlations, that is, the correlations that are not 1. The constructed matrix should have 1s on the diagonal.
I'd use the ROW and COL functions to specify the upper triangular and lower triangular portions of the matrix:
proc iml;
/* vector contains n*(n-1)/2 upper triangular corr */
v = {0.6 0.5 0.4
0.3 0.2
0.1};
k = ncol(v);
n = (sqrt(1 + 8*k) + 1)/2; /* dimension of full matrix */
print n;
A = J(n,n,0); /* zero matrix */
r = row(A);
c = col(A);
upperTri = loc(r < c); /* upper tri indices in row major order */
A[upperTri] = v; /* copy elements */
A = A + A`; /* make symmetric */
diag = loc(r = c); /* diagonal elements */
A[diag] = 1; /* put 1 on diagonal */
print A;
If you are using an old version of SAS/IML that does not support the ROW and COL functions, see the blog post "Filling the lower and upper triangular portions of a matrix."
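For readers who want to check the expected result outside SAS, here is a minimal sketch of the same fill-and-mirror idea in plain Python rather than SAS/IML. The function name `corr_from_upper` is hypothetical, and the sketch assumes the vector lists the upper triangular correlations in row-major order, as in the IML program above.

```python
from math import isqrt


def corr_from_upper(v):
    """Build a symmetric matrix with a unit diagonal from the
    upper triangular correlations, listed in row-major order."""
    k = len(v)
    root = isqrt(1 + 8 * k)
    if root * root != 1 + 8 * k:
        raise ValueError("len(v) must equal n*(n-1)/2 for some n")
    n = (1 + root) // 2  # dimension of the full matrix

    # Start from the identity so the diagonal is already 1.
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

    # Walk the upper triangle row by row and mirror each value.
    values = iter(v)
    for i in range(n):
        for j in range(i + 1, n):
            a[i][j] = a[j][i] = next(values)
    return a


A = corr_from_upper([0.6, 0.5, 0.4, 0.3, 0.2, 0.1])
for row in A:
    print(row)
```

With the six values from the example, the first row of the result is 1, 0.6, 0.5, 0.4 and the matrix is symmetric.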
Hey Rick,
My memory of matrix algebra is non-functional this morning.
I see the code:
n = (sqrt(1 + 8*k) + 1)/2; /* dimension of full matrix */
and wanted to generalize it, but that '8' seems a bit arbitrary, especially when I am going after a 4x4, such that k=4. Then sqrt(33) is a non-integer value. Is there something in the blog post to help me out here?
Steve Denham
Hi Steve,
It's not matrix algebra that you need to recall. The formula is derived from the quadratic formula. The formula is valid for any size matrix. The '8' is not arbitrary.
There are k = n(n-1)/2 elements in the upper triangular portion of an nxn matrix. For this problem, a vector with k elements is specified, so we need to discover n. Solving k = n(n-1)/2 for n is equivalent to solving
n**2 - n - 2k = 0
which by the quadratic formula has the solutions
n = ( 1 +/- sqrt(1 + 8k) ) / 2.
We discard the negative solution and are left with
n = (1 + sqrt(1 + 8*k)) / 2
as the formula that gives n when k is properly specified.
So as you see, the '8' comes from the "-4ac" portion of the quadratic formula. Here a=1 and c=-2k. The "trick" that you missed is that k cannot be arbitrary. The formula is only valid when k is the number of elements in the upper/lower triangular portion of an nxn matrix for some n.
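The restriction on k can be checked before the formula is applied. The following is a minimal sketch in plain Python rather than SAS/IML; the function name `dim_from_count` is hypothetical. It assumes k is a nonnegative integer and uses the fact that n is an integer exactly when 1 + 8k is a perfect square.

```python
from math import isqrt


def dim_from_count(k):
    """Return n such that k = n*(n-1)/2, or raise ValueError
    if k is not the size of a strict upper triangle."""
    if k < 0:
        raise ValueError("k must be nonnegative")
    disc = 1 + 8 * k          # discriminant of n**2 - n - 2k = 0
    root = isqrt(disc)        # integer square root
    if root * root != disc:
        raise ValueError(f"k={k} is not of the form n*(n-1)/2")
    return (1 + root) // 2    # positive root of the quadratic


print(dim_from_count(6))      # the example vector has 6 elements
print(dim_from_count(3))
```

For k=6 the discriminant is 49, so n=4. For k=4 the discriminant is 33, which is not a perfect square, so the function raises an error instead of returning a non-integer dimension.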
Fooled by a picture.
That's what happened. k=ncol(v), but the upper triangular "picture" made me think 3 columns in v, so everything solved in my head. (Plug in k=3, and out comes a 3x3 matrix)
But in fact there are 6 columns in that upper triangular "picture", and so long as k is a triangular number everything works.
Steve Denham
proc iml;
start corr_mat(corr,p);/* p is the number of variables, which is the dimension of the target p x p correlation matrix */
/*corr is the vector containing all nontrivial correlations in order*/
w=p:2;
v=(1||cusum(w)+1);
do i=1 to ncol(v);
s=v[i];
corr=insert(corr,1,0,s);
end;
corr_mat=sqrvech(corr);
return(corr_mat);
finish;
quit;
Thank you Rick! Would the above code also work? Suppose we know how many variables we want, say p=7 variables, and we have the vector of correlations. Do you think this solution is correct?
Yes, I think it is correct.
However, calling the INSERT function multiple times in a loop is not very efficient because it allocates a new matrix (one element greater than before) and copies over the existing elements. You end up copying the array p times.
As a general rule, it is inefficient to dynamically grow an array inside of a loop. See the article "Friends don't let friends concatenate results inside a loop," which describes a situation that is similar to your approach.
I just published a blog post that explains how to create a correlation matrix from the upper triangular elements.
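One way to avoid growing the vector is to allocate the full-length result once and write each value into its final position. Here is a minimal sketch of that pattern in plain Python rather than SAS/IML; the function name `with_unit_diagonal` is hypothetical. It assumes the same layout as the module above: the output has p*(p+1)/2 elements, and the diagonal entries sit at the start of blocks of length p, p-1, ..., 1.

```python
def with_unit_diagonal(corr, p):
    """Return the packed triangle (diagonal included) with 1s on the
    diagonal, built by a single allocation instead of repeated inserts."""
    if len(corr) != p * (p - 1) // 2:
        raise ValueError("corr must have p*(p-1)/2 elements")

    # Allocate once; every slot starts as 1, so the diagonal is done.
    out = [1.0] * (p * (p + 1) // 2)

    # Diagonal positions (0-based): start of each block of length p, p-1, ..., 1.
    diag = set()
    pos = 0
    for block_len in range(p, 0, -1):
        diag.add(pos)
        pos += block_len

    # Copy the correlations into the remaining slots, in order.
    src = iter(corr)
    for i in range(len(out)):
        if i not in diag:
            out[i] = next(src)
    return out


print(with_unit_diagonal([0.6, 0.5, 0.3], 3))
```

For p=3 the diagonal positions are 0, 3, and 5, which correspond to the 1-based positions 1, 4, and 6 that the loop in the module computes with CUSUM. Each input element is copied exactly once.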