Solved: Re: Vectorize Do Loop to increase performance

bernhard · Posted 02-06-2014 07:31 AM

I am trying to vectorize the code below.

Can somebody help me to avoid the two do loops?

You can ignore the code before the do loops, that is simply to generate the data.

proc iml; /*Generate Data*/ data=J(5,9,0); call randseed(1); call randgen(data, "Normal"); dim=nrow(data); obs=ncol(data); weights=J(obs,obs,0); call randgen(weights, "Normal"); mat=j(dim,dim,0); /*I would like to vectorize this loop*/ do j=1 to obs; do i=1 to obs; x=data[,i]-data[,j]; mat=mat+weights[i,j]*x*t(x); end; end; quit;

Message was edited by: bernhard I (rather Rick Wicklin actually) noticed an error in the code. In fact I need to go through the full inner loop up to obs.

Rick_SAS · Posted 02-12-2014 10:26 AM

Sorry that I didn't see that you revised your original code.

Yes, Ian's method will work, although it is slightly more efficient to use the # operator to avoid forming the diagonal matrix:

do j=1 to p;

x=data-data[,j];

mat=mat+x*(weights[,j]#t(x));

end;

(For an explanation, see Converting between correlation and covariance matrices - The DO Loop )

If you determine that the weights are positive, the you can use

do j=1 to p;

x=(data-data[,j])#sqrt(t(weights[,j]));

mat=mat + x*x`;

end;

View solution in original post

Rick_SAS · Posted 02-06-2014 08:15 AM

1) What are you trying to accomplish statistically? For example, is this a weighted OLS regression or some other standard statistical analysis?

2) You have obs=ncol(data) and dim=nrow(data). Usually the rows of the data matrix contain the observations and the columns contain the variables, but you seem to have reversed that convention.

3) The program contains the comment "of course the loop could go to obs." I think vectorizing will be easier if the inner loop is complete, but when I do that (and remove the line mat=mat+t(mat)), I don't get the same answer. Can you write the complete inner loop?

4) For vectorization, there are basic principles that often help. See "How to vectorize computations in a matrix language" If you answer 1-3, I think it will be easier to apply these prinicples.

bernhard · Posted 02-06-2014 12:09 PM

Hi Rick

Thank you. I have read several very interesting posts on your Blog.

1. I am trying to calculate the gradient of a more complex function. I do not want to post it here though. Maybe I simply need to do some more thinking to see the simplification.

2. You are right about that. This has been so form the beginning in my code. I unfortunately started with a transposed matrix. Although it is only a notation. I do not want to change all my code.

3. In fact I had an error there. I need to do the full loop.

4. I have seen that post, thank you.

I realize that the above answers will probably not help you to answer my original question. I need to do some more thinking or maybe post the full function which I want to differentiate. Although I believe it is possible that my program cannot be simplified.

Rick_SAS · Posted 02-06-2014 12:57 PM

You can get rid of the inner loop. I haven't figured out whether the outer loop can be simplified. The WEIGHTS matrix and the data[,i]-data[,j] computation make each computation dependent on a loop variable.

So is this is some sort of finite difference computation, and that's why we have data[,i]-data[,j]?

Can we assume that the WEIGHTS matrix is positive? Your example fills it with random normal values, but that isn't usually the case for a matrix of that name.

bernhard · Posted 02-11-2014 05:31 AM

Can you please tell me how to avoid the inner loop?

I am investigating a Mahalanobis distance and try to minimize an estimation error depending on the input matrix of the Mahalanobis matrix.

The weights matrix is not positive.

Rick_SAS · Posted 02-11-2014 09:31 AM

Sure, I was waiting for you to post the correct code with the full inner loop (#3 above).

If you are computing a Mahalanibos-type matrix, you might be interested in this article: How to compute Mahalanobis distance in SAS - The DO Loop It presents several tricks for efficiently computing weighted distance matrices.

bernhard · Posted 02-12-2014 02:49 AM

I have edited the original post (i.e. the code section) a few days ago. Here is the loop

do j=1 to obs;

do i=1 to obs;

x=data[,i]-data[,j];

mat=mat+weights[i,j]*x*t(x);

end;

Thanks for the link I will have a look at that post.

IanWakeling · Posted 02-12-2014 04:09 AM

I think you can get rid of the inner loop by doing this:

do j = 1 to obs;

x = data - data[, j];

mat = mat + x * diag(weights[ ,j]) * t(x);

end;

But I can't see any way to avoid the outer loop.

Rick_SAS · Posted 02-12-2014 10:26 AM

Sorry that I didn't see that you revised your original code.

Yes, Ian's method will work, although it is slightly more efficient to use the # operator to avoid forming the diagonal matrix:

do j=1 to p;

x=data-data[,j];

mat=mat+x*(weights[,j]#t(x));

end;

(For an explanation, see Converting between correlation and covariance matrices - The DO Loop )

If you determine that the weights are positive, the you can use

do j=1 to p;

x=(data-data[,j])#sqrt(t(weights[,j]));

mat=mat + x*x`;

end;

Registration is open