Statistical programming, matrix languages, and more

Vectorize Do Loop to increase performance

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 15
Accepted Solution

Vectorize Do Loop to increase performance

I am trying to vectorize the code below.

Can somebody help me to avoid the two do loops?

You can ignore the code before the do loops, that is simply to generate the data.

proc iml;
/*Generate Data*/
data=J(5,9,0);
call randseed(1);
call randgen(data, "Normal");

dim=nrow(data);
obs=ncol(data);
weights=J(obs,obs,0);
call randgen(weights, "Normal");

mat=j(dim,dim,0);
/*I would like to vectorize this loop*/
do j=1 to obs;
  do i=1 to obs;
  x=data[,i]-data[,j];
  mat=mat+weights[i,j]*x*t(x);
  end; 
end;
quit;

Message was edited by: bernhard  I (rather Rick Wicklin actually) noticed an error in the code. In fact I need to go through the full inner loop up to obs.


Accepted Solutions
Solution
‎02-12-2014 10:26 AM
SAS Super FREQ
Posts: 3,232

Re: Vectorize Do Loop to increase performance

Sorry that I didn't see that you revised your original code.

Yes, Ian's method will work, although it is slightly more efficient to use the # operator to avoid forming the diagonal matrix:

do j=1 to p;

   x=data-data[,j];

   mat=mat+x*(weights[,j]#t(x));

end;

(For an explanation, see Converting between correlation and covariance matrices - The DO Loop )

If you determine that the weights are positive, the you can use

do j=1 to p;

   x=(data-data[,j])#sqrt(t(weights[,j]));

   mat=mat + x*x`;

end;

View solution in original post


All Replies
SAS Super FREQ
Posts: 3,232

Re: Vectorize Do Loop to increase performance

1) What are you trying to accomplish statistically? For example, is this a weighted OLS regression or some other standard statistical analysis?

2) You have obs=ncol(data) and dim=nrow(data).  Usually the rows of the data matrix contain the observations and the columns contain the variables, but you seem to have reversed that convention.

3) The program contains the comment "of course the loop could go to obs." I think vectorizing will be easier if the inner loop is complete, but when I do that (and remove the line mat=mat+t(mat)), I don't get the same answer.  Can you write the complete inner loop?

4) For vectorization, there are basic principles that often help. See "How to vectorize computations in a matrix language"  If you answer 1-3, I think it will be easier to apply these prinicples.

Occasional Contributor
Posts: 15

Re: Vectorize Do Loop to increase performance

Hi Rick

Thank you. I have read several very interesting posts on your Blog.

1. I am trying to calculate the gradient of a more complex function. I do not want to post it here though. Maybe I simply need to do some more thinking to see the simplification.

2. You are right about that. This has been so form the beginning in my code. I unfortunately started with a transposed matrix. Although it is only a notation. I do not want to change all my code.

3. In fact I had an error there. I need to do the full loop.

4. I have seen that post, thank you.

I realize that the above answers will probably not help you to answer my original question. I need to do some more thinking or maybe post the full function which I want to differentiate. Although I believe it is possible that my program cannot be simplified.

SAS Super FREQ
Posts: 3,232

Re: Vectorize Do Loop to increase performance

You can get rid of the inner loop. I haven't figured out whether the outer loop can be simplified. The WEIGHTS matrix and the data[,i]-data[,j] computation make each computation dependent on a loop variable.

So is this is some sort of finite difference computation, and that's why we have data[,i]-data[,j]?

Can we assume that the WEIGHTS matrix is positive? Your example fills it with random normal values, but that isn't usually the case for a matrix of that name.

Occasional Contributor
Posts: 15

Re: Vectorize Do Loop to increase performance

Can you please tell me how to avoid the inner loop?

I am investigating a Mahalanobis distance and try to minimize an estimation error depending on the input matrix of the Mahalanobis matrix.

The weights matrix is not positive.

SAS Super FREQ
Posts: 3,232

Re: Vectorize Do Loop to increase performance

Sure, I was waiting for you to post the correct code with the full inner loop (#3 above).

If you are computing a Mahalanibos-type matrix, you might be interested in this article: How to compute Mahalanobis distance in SAS - The DO Loop It presents several tricks for efficiently computing weighted distance matrices.

Occasional Contributor
Posts: 15

Re: Vectorize Do Loop to increase performance

I have edited the original post (i.e. the code section) a few days ago. Here is the loop


do j=1 to obs;

  do i=1 to obs;

  x=data[,i]-data[,j];

  mat=mat+weights[i,j]*x*t(x);

  end; 

end;


Thanks for the link I will have a look at that post.

Frequent Contributor
Posts: 122

Re: Vectorize Do Loop to increase performance

I think you can get rid of the inner loop by doing this:

  do j = 1 to obs;

    x = data - data[, j];

    mat = mat + x * diag(weights[ ,j]) * t(x);

  end;

But I can't see any way to avoid the outer loop.

Solution
‎02-12-2014 10:26 AM
SAS Super FREQ
Posts: 3,232

Re: Vectorize Do Loop to increase performance

Sorry that I didn't see that you revised your original code.

Yes, Ian's method will work, although it is slightly more efficient to use the # operator to avoid forming the diagonal matrix:

do j=1 to p;

   x=data-data[,j];

   mat=mat+x*(weights[,j]#t(x));

end;

(For an explanation, see Converting between correlation and covariance matrices - The DO Loop )

If you determine that the weights are positive, the you can use

do j=1 to p;

   x=(data-data[,j])#sqrt(t(weights[,j]));

   mat=mat + x*x`;

end;

Post a Question
Discussion Stats
  • 8 replies
  • 799 views
  • 3 likes
  • 3 in conversation