Solved: Vectorizing Rather Than Looping in IML

Junyong · Posted 01-12-2020 06:21 PM

I have a time-series variable x as follows.

data series;
do t=1 to 5000;
x=rannor(1);
output;
end;
run;

For each t, I want to estimate x's second moment using observations from 1 to t and a normal kernel as follows.

proc iml;
use series;
read all var _all_;
close;
s2=1:5000;
do s=1 to 5000;
s2[s]=pdf("norm",1:s,s,100)*x[1:s]##2/sum(pdf("norm",1:s,s,100));
end;
create series var{t x s2};
append;
close;
quit;

So I load series and apply this recursive loop since the second moment at t requires observations from 1 to t (x[1:s]) as well as the kernel weighting information (pdf("norm",1:s,s,100)). Is there any way to wisely alter this do loop by its equivalent vectorized method? This way takes too much time for a large T>5000. Thanks.

Rick_SAS · Posted 01-15-2020 09:03 AM

It is possible to represent these operations in terms of matrix multiplication, but if you have N observations the matrix will be NxN, which will get big. I suspect the main performance bottleneck is not the loop but the fact that you are evaluating the normal PDF so many times. For N observations, you are evaluating the PDF N(N+1)/2 times. This is unnecessary. You only need to evaluate the PDF at N locations. You can then use the properties of the normal PDF to reuse the values. (You also can compute x^2 once instead of multiple times.)

Let f be the normal PDF with standard deviation sigma=100. let f(x; mu) be the PDF for N(mu, 100) evaluated at x.

You are currently evaluating

s=1 f(1; 1)

s=2 f(1; 2) f(2; 2)

s=3 f(1; 3) f(2; 3) f(3; 3)

s=4 f(1; 4) f(2; 4) f(3; 4) f(4; 4)

...

s=N f(1; N) f(2; N) f(3; N) ... f(N; N)

However, notice that f(1; 1) = f(2; 2) = f(3; 3) = .... = f(N; N).

Similarly, f(1; 2) = f(2; 3) = .... = f(N-1; N).

and f(1; 3) = f(2; 4) = ... = f(N-2; N).

In general, the triangular matrix above is banded because the values depend only on the distance between x and N.

If you reuse the PDF values, you can save a lot of time. On my computer, the original loop for N=8000 takes 3.4 seconds whereas the improved loop takes 0.2 seconds, (Only 5% of the original time.)

%let N = 8000;

data series;
do t=1 to &N;
x=rannor(1);
output;
end;
run;


proc iml;
use series;
read all var _all_;
close;

x2 = x##2;      /* form x^2 one time */

/* original loop */
t0 = time();
s2 = j(1, &N, .);
do s=1 to &N;
   w = pdf("norm",1:s,s,100);
   s2[s] = w * x2[1:s] / sum(w);
end;
tLoop = time() - t0;
s0 = s2;     /* save the oringal values */

/* new loop */
t0 = time();
w0 = pdf("norm", 1:&N, &N, 100);  /* eval the PDF once */
do s = 1 to &N;
   w = w0[ , (&N-s+1):&N];
   s2[s] = w * x2[1:s] / sum(w);
end;
tVec = time() - t0;
print tLoop tVec;

print (max(abs(s0-s2)))[L="Max Diff"];  /* show that the answers are the same */

View solution in original post

Rick_SAS · Posted 01-15-2020 09:03 AM

It is possible to represent these operations in terms of matrix multiplication, but if you have N observations the matrix will be NxN, which will get big. I suspect the main performance bottleneck is not the loop but the fact that you are evaluating the normal PDF so many times. For N observations, you are evaluating the PDF N(N+1)/2 times. This is unnecessary. You only need to evaluate the PDF at N locations. You can then use the properties of the normal PDF to reuse the values. (You also can compute x^2 once instead of multiple times.)

Let f be the normal PDF with standard deviation sigma=100. let f(x; mu) be the PDF for N(mu, 100) evaluated at x.

You are currently evaluating

s=1 f(1; 1)

s=2 f(1; 2) f(2; 2)

s=3 f(1; 3) f(2; 3) f(3; 3)

s=4 f(1; 4) f(2; 4) f(3; 4) f(4; 4)

...

s=N f(1; N) f(2; N) f(3; N) ... f(N; N)

However, notice that f(1; 1) = f(2; 2) = f(3; 3) = .... = f(N; N).

Similarly, f(1; 2) = f(2; 3) = .... = f(N-1; N).

and f(1; 3) = f(2; 4) = ... = f(N-2; N).

In general, the triangular matrix above is banded because the values depend only on the distance between x and N.

If you reuse the PDF values, you can save a lot of time. On my computer, the original loop for N=8000 takes 3.4 seconds whereas the improved loop takes 0.2 seconds, (Only 5% of the original time.)

%let N = 8000;

data series;
do t=1 to &N;
x=rannor(1);
output;
end;
run;


proc iml;
use series;
read all var _all_;
close;

x2 = x##2;      /* form x^2 one time */

/* original loop */
t0 = time();
s2 = j(1, &N, .);
do s=1 to &N;
   w = pdf("norm",1:s,s,100);
   s2[s] = w * x2[1:s] / sum(w);
end;
tLoop = time() - t0;
s0 = s2;     /* save the oringal values */

/* new loop */
t0 = time();
w0 = pdf("norm", 1:&N, &N, 100);  /* eval the PDF once */
do s = 1 to &N;
   w = w0[ , (&N-s+1):&N];
   s2[s] = w * x2[1:s] / sum(w);
end;
tVec = time() - t0;
print tLoop tVec;

print (max(abs(s0-s2)))[L="Max Diff"];  /* show that the answers are the same */

Vectorizing Rather Than Looping in IML

Re: Vectorizing Rather Than Looping in IML

Re: Vectorizing Rather Than Looping in IML

Vectorizing Rather Than Looping in IML

Re: Vectorizing Rather Than Looping in IML

Re: Vectorizing Rather Than Looping in IML

SAS Innovate 2025: Call for Content