For 10,000 rows, both the loop and the vectorized methods have approximately the same run time. On my laptop, the looping method takes 0.007 s and the vectorized method takes 0.004 s. On the one hand, the vectorized method is almost twice as fast. On the other hand, you save only 0.003 s by using it.
For larger matrices, the run time of both methods grows roughly in proportion to the number of rows, so the vectorized method keeps about the same relative advantage:
100,000 rows: the looping method takes 0.07 s and the vectorized method takes 0.04 s.
1,000,000 rows: the looping method takes 0.7 s and the vectorized method takes 0.4 s.
Here is the program I use, so you can run your own experiments:
/* Rick's original version: Loop over rows and apply each shift */
%let NCOL = 15;
%let NROW = 1000000;
proc iml;
nr = &NROW;
p = &NCOL;
A = shape(1:(nr*p), nr, p);
call randseed(1234);
v = randfun(nr, "Uniform", 0, p-1); /* create shift vector */
nRep = 10; /* repeat the experiment nRep times and report average */
/* use MOD function to apply a cyclic shift of row A[i,]
by the number of columns in v[i] */
t0 = time();
do rep = 1 to nRep;
   idx = 1:p;
   ShiftA = j(nr, p, .); /* allocate */
   do i = 1 to nr;
      shiftIdx = p - v[i] + idx; /* shift to the right */
      newIdx = mod(shiftIdx-1, p) + 1; /* wrap around */
      ShiftA[i,] = A[i, newIdx];
   end;
end;
tLoop = (time() - t0) / nRep;
/* yabwon's vectorized version */
t0 = time();
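do rep = 1 to nRep;
   c = ncol(A);
   r = nrow(A);
   idx = repeat(1:c,r); /* r x c matrix; every row is 1:c */
   row = t(0:r-1)*c; /* row-major offset of each row: 0, c, 2c, ... */
   shiftIdx = c - v + idx; /* shift row i to the right by v[i] */
   newIdx = mod(shiftIdx-1, c) + 1; /* wrap around */
   want = shape(A[newIdx + row], r); /* extract by linear index, reshape to r rows */
end;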
tVectorized = (time() - t0) / nRep;
print tLoop tVectorized;
print (max(abs(ShiftA - want)))[L="Diff between answers"];
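If the index arithmetic in the vectorized version is hard to follow at full size, a tiny case may help. The following is only a sketch: the 3 x 4 matrix and the shift vector v = {1, 0, 2} are values I made up for illustration and are not part of the timing program above. It uses the same statements as the vectorized version and prints the intermediate index matrix.

```sas
proc iml;
A = shape(1:12, 3, 4);            /* rows: {1 2 3 4}, {5 6 7 8}, {9 10 11 12} */
v = {1, 0, 2};                    /* hypothetical shifts, chosen for illustration */
c = ncol(A);
r = nrow(A);
idx = repeat(1:c, r);             /* every row is 1:c */
row = t(0:r-1)*c;                 /* row-major offsets: 0, 4, 8 */
shiftIdx = c - v + idx;           /* shift row i to the right by v[i] */
newIdx = mod(shiftIdx-1, c) + 1;  /* wrap around */
want = shape(A[newIdx + row], r); /* extract by linear index, reshape to r rows */
print newIdx, want;
quit;
```

Tracing the arithmetic by hand, newIdx should be {4 1 2 3, 1 2 3 4, 3 4 1 2} and want should be {4 1 2 3, 5 6 7 8, 11 12 9 10}. That is, the first row is shifted right by one column, the second row is unchanged, and the third row is shifted right by two columns.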