Pyrite | Level 9

## How to Better Vectorize Block Bootstrap?

The raw time-series data are a 636×10 matrix as follows.

``````data have;
portf_me_monthly_2019a.csv" url firstobs=2 dsd;
input year month portfolio number return;
run;

proc transpose prefix=return out=have;
by year month;
id portfolio;
var return;
run;``````

And each block bootstrap samples a 636×10 matrix

(1) starting at a random point between 1 and 636 and

(2) shifting to another random point with 90% probability or if the ending observation (i.e., if the index reaches 636) as follows.

``````proc iml;
use have(keep=return:);
read all var _all_ into a;
t=nrow(a);
n=ncol(a);
call streaminit(1);
do b=1 to 2500;
/*----------------------------the bottleneck parts----------------------------*/
do u=1 to t;
if u=1 then v=rand("integer",1,t);
else if v[u-1]=t then v=v//rand("integer",1,t);
else if rand("uniform")>0.9 then v=v//rand("integer",1,t);
else v=v//v[u-1]+1;
end;
if b=1 then bootstrap=j(t,1)||a[v,];
else bootstrap=bootstrap//(j(t,1,b)||a[v,]);
/*----------------------------------------------------------------------------*/
end;
create bootstrap from bootstrap[c="a"];
append from bootstrap;
quit;``````

For example, the bootstrap picks the random starting point from 1 to 636, say 235, and continues with a 90% probability to 236, 237, 238, ..., 241, and 242 and shifts with a 10% probability to another random point, say 7, and then continues again to 8, 9, 10, etc. until it completes one bootstrap with 636 observations. So the average block length becomes 10.

The bottleneck problem is that these random time indices are therefore somewhat path-dependent and cannot be generated via RAND or RANDGEN. I wonder whether there is a more efficient way to cleverly vectorize this part.

1 ACCEPTED SOLUTION

Accepted Solutions
SAS Super FREQ

## Re: How to Better Vectorize Block Bootstrap?

Yes. Since you know that the final result will contain 636 rows, allocate pre-allocate arrays of that length. This avoids concatenating inside the loop, which is very inefficient.

See

"Pre-allocate arrays to improve efficiency"

and

"Friends don't let friends concatenate results inside a loop"

You also know that the final bootstrap array will have 2500*636 rows. However, if the only purpose of the outer loop (do b = 1 to 2500) is to get values that you will write to a file, you can save a LOT of time and space by writing each bootstrap sample DIRECTLY to a SAS data set instead of building up the 'bootstrap' array. See "Write to a SAS data set from inside a SAS/IML loop."

9 REPLIES 9
SAS Super FREQ

## Re: How to Better Vectorize Block Bootstrap?

Yes. Since you know that the final result will contain 636 rows, allocate pre-allocate arrays of that length. This avoids concatenating inside the loop, which is very inefficient.

See

"Pre-allocate arrays to improve efficiency"

and

"Friends don't let friends concatenate results inside a loop"

You also know that the final bootstrap array will have 2500*636 rows. However, if the only purpose of the outer loop (do b = 1 to 2500) is to get values that you will write to a file, you can save a LOT of time and space by writing each bootstrap sample DIRECTLY to a SAS data set instead of building up the 'bootstrap' array. See "Write to a SAS data set from inside a SAS/IML loop."

SAS Super FREQ

## Re: How to Better Vectorize Block Bootstrap?

For example, I suspect this code will run much faster:

``````proc iml;
/* Use Sashelp.Air for example */
use Sashelp.Air;
read all var 'Air' into a;
close;
t=nrow(a);
call streaminit(1);

/* allocate and tell output data set how many cols to expect */
bootstrap = j(t,1,.) || a;
create bootstrap from bootstrap[c={"SampleID" "a"}];

v = j(t,1,.);
do b=1 to 2500;
/*----------------------------the bottleneck parts----------------------------*/
do u=1 to t;
if u=1 then v[u]=rand("integer",1,t);
else if v[u-1]=t then v[u]=rand("integer",1,t);
else if rand("uniform")>0.9 then v[u]=rand("integer",1,t);
else v[u]=v[u-1]+1;
end;
bootstrap = j(t,1,b) || a[v,];
/*----------------------------------------------------------------------------*/
append from bootstrap;
end;
close;
quit;``````
Pyrite | Level 9

## Re: How to Better Vectorize Block Bootstrap?

``````proc iml;
/* Use Sashelp.Air for example */
use Sashelp.Air;
read all var 'Air' into a;
close;
t=nrow(a);
call streaminit(1);

/* allocate and tell output data set how many cols to expect */
bootstrap = j(t,1,.) || a;
create bootstrap from bootstrap[c={"SampleID" "a"}];

v = j(t,1,.);
do b=1 to 2500;
/*----------------------------the bottleneck parts----------------------------*/
do u=1 to t;
if u=1 then v[u]=rand("integer",1,t);
else if v[u-1]=t then v[u]=rand("integer",1,t);
else if rand("uniform")>0.9 then v[u]=rand("integer",1,t);
else v[u]=v[u-1]+1;
end;
bootstrap = j(t,1,b) || a[v,];
/*----------------------------------------------------------------------------*/
append from bootstrap;
end;
close;
quit;``````

Here is the log.

``````1    proc iml;
2       /* Use Sashelp.Air for example */
3        use Sashelp.Air;
4        read all var 'Air' into a;
5       close;
NOTE: Closing SASHELP.AIR
6        t=nrow(a);
7        call streaminit(1);
8
9       /* allocate and tell output data set how many cols to expect */
10      bootstrap = j(t,1,.) || a;
11      create bootstrap from bootstrap[c={"SampleID" "a"}];
12
13      v = j(t,1,.);
14       do b=1 to 2500;
15   /*----------------------------the bottleneck parts----------------------------*/
16           do u=1 to t;
17               if u=1 then v[u]=rand("integer",1,t);
18               else if v[u-1]=t then v[u]=rand("integer",1,t);
19               else if rand("uniform")>0.9 then v[u]=rand("integer",1,t);
20               else v[u]=v[u-1]+1;
21           end;
22           bootstrap = j(t,1,b) || a[v,];
23   /*----------------------------------------------------------------------------*/
24          append from bootstrap;
25       end;
26      close;
NOTE: Closing WORK.BOOTSTRAP
NOTE: The data set WORK.BOOTSTRAP has 360000 observations and 2 variables.
27   quit;
NOTE: Exiting IML.
NOTE: PROCEDURE IML used (Total process time):
real time           1.18 seconds
cpu time            1.17 seconds``````

And the following does not pre-allocate.

``````proc iml;
/* Use Sashelp.Air for example */
use Sashelp.Air;
read all var 'Air' into a;
close;
t=nrow(a);
call streaminit(1);

/* allocate and tell output data set how many cols to expect */
bootstrap = j(t,1,.) || a;
create bootstrap from bootstrap[c={"SampleID" "a"}];

/*   v = j(t,1,.);*/
do b=1 to 2500;
/*----------------------------the bottleneck parts----------------------------*/
do u=1 to t;
if u=1 then v=rand("integer",1,t);
else if v[u-1]=t then v=v//rand("integer",1,t);
else if rand("uniform")>0.9 then v=v//rand("integer",1,t);
else v=v//v[u-1]+1;
end;
bootstrap = j(t,1,b) || a[v,];
/*----------------------------------------------------------------------------*/
append from bootstrap;
end;
close;
quit;``````

And the log.

``````28   proc iml;
29      /* Use Sashelp.Air for example */
30       use Sashelp.Air;
31       read all var 'Air' into a;
32      close;
NOTE: Closing SASHELP.AIR
33       t=nrow(a);
34       call streaminit(1);
35
36      /* allocate and tell output data set how many cols to expect */
37      bootstrap = j(t,1,.) || a;
38      create bootstrap from bootstrap[c={"SampleID" "a"}];
39
40   /*   v = j(t,1,.);*/
41       do b=1 to 2500;
42   /*----------------------------the bottleneck parts----------------------------*/
43           do u=1 to t;
44               if u=1 then v=rand("integer",1,t);
45               else if v[u-1]=t then v=v//rand("integer",1,t);
46               else if rand("uniform")>0.9 then v=v//rand("integer",1,t);
47               else v=v//v[u-1]+1;
48           end;
49           bootstrap = j(t,1,b) || a[v,];
50   /*----------------------------------------------------------------------------*/
51          append from bootstrap;
52       end;
53      close;
NOTE: Closing WORK.BOOTSTRAP
NOTE: The data set WORK.BOOTSTRAP has 360000 observations and 2 variables.
54   quit;
NOTE: Exiting IML.
NOTE: PROCEDURE IML used (Total process time):
real time           1.17 seconds
cpu time            1.17 seconds``````

It seems there is no performance difference—I also tried 25,000 rather than 2,500 but found no difference.

SAS Super FREQ

## Re: How to Better Vectorize Block Bootstrap?

Right. The actual bottleneck in your program was the concatenation of the 'bootstrap' matrix within the outer loop. In your new test, both programs avoid that concatenation, which is why both programs run much faster than your first version.

Pyrite | Level 9

## Re: How to Better Vectorize Block Bootstrap?

So the takeaway is to use CREATE once and then APPEND repeatedly rather than // repeatedly then CREATE APPEND once—more than the pre-allocation issue (I will check this out in detail later). Thank you!

SAS Super FREQ

## Re: How to Better Vectorize Block Bootstrap?

They are both important. For this example, appending within the loop made a bigger impact on performance.

Pyrite | Level 9

## Re: How to Better Vectorize Block Bootstrap?

It seems the case above isn't good enough to see the performance gain—the following substantially shows the gain instead. Thanks.

``````proc iml;
i=j(500,10);
call randseed(1);
do j=1 to 5000;
call randgen(i,"normal");
if j=1 then k=vech(cov(i));
else k=k||vech(cov(i));
end;
quit;
proc iml;
i=j(500,10);
call randseed(1);
k=j(55,5000);
do j=1 to 5000;
call randgen(i,"normal");
k[,j]=vech(cov(i));
end;
quit;``````

And the results.

``````1    proc iml;
2        i=j(500,10);
3        call randseed(1);
4        do j=1 to 5000;
5            call randgen(i,"normal");
6            if j=1 then k=vech(cov(i));
7            else k=k||vech(cov(i));
8        end;
9    quit;
NOTE: Exiting IML.
NOTE: PROCEDURE IML used (Total process time):
real time           4.17 seconds
cpu time            4.18 seconds

10   proc iml;
11       i=j(500,10);
12       call randseed(1);
13       k=j(55,5000);
14       do j=1 to 5000;
15           call randgen(i,"normal");
16           k[,j]=vech(cov(i));
17       end;
18   quit;
NOTE: Exiting IML.
NOTE: PROCEDURE IML used (Total process time):
real time           1.60 seconds
cpu time            1.60 seconds``````

The latter is about 60% faster than the former. Significant.

SAS Super FREQ

## Re: How to Better Vectorize Block Bootstrap?

Yes. The articles that I linked to in my initial reply have similar examples.

SAS Super FREQ

## Re: How to Better Vectorize Block Bootstrap?

This topic inspired me to write a few articles about various ways to perform block bootstraps in SAS:

From The DO Loop