BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Junyong
Pyrite | Level 9

The raw time-series data are a 636×10 matrix as follows.

data have;
	infile "http://global-q.org/uploads/1/2/2/6/122679606/
portf_me_monthly_2019a.csv" url firstobs=2 dsd;
	input year month portfolio number return;
run;

proc transpose prefix=return out=have;
	by year month;
	id portfolio;
	var return;
run;

And each block bootstrap samples a 636×10 matrix

(1) starting at a random point between 1 and 636 and

(2) shifting to another random point with 90% probability or if the ending observation (i.e., if the index reaches 636) as follows.

proc iml;
	use have(keep=return:);
	read all var _all_ into a;
	t=nrow(a);
	n=ncol(a);
	call streaminit(1);
	do b=1 to 2500;
/*----------------------------the bottleneck parts----------------------------*/
		do u=1 to t;
			if u=1 then v=rand("integer",1,t);
			else if v[u-1]=t then v=v//rand("integer",1,t);
			else if rand("uniform")>0.9 then v=v//rand("integer",1,t);
			else v=v//v[u-1]+1;
		end;
		if b=1 then bootstrap=j(t,1)||a[v,];
		else bootstrap=bootstrap//(j(t,1,b)||a[v,]);
/*----------------------------------------------------------------------------*/
	end;
	create bootstrap from bootstrap[c="a"];
	append from bootstrap;
quit;

For example, the bootstrap picks the random starting point from 1 to 636, say 235, and continues with a 90% probability to 236, 237, 238, ..., 241, and 242 and shifts with a 10% probability to another random point, say 7, and then continues again to 8, 9, 10, etc. until it completes one bootstrap with 636 observations. So the average block length becomes 10.

The bottleneck problem is that these random time indices are therefore somewhat path-dependent and cannot be generated via RAND or RANDGEN. I wonder whether there is a more efficient way to cleverly vectorize this part.

1 ACCEPTED SOLUTION

Accepted Solutions
Rick_SAS
SAS Super FREQ

Yes. Since you know that the final result will contain 636 rows, allocate pre-allocate arrays of that length. This avoids concatenating inside the loop, which is very inefficient.

See

"Pre-allocate arrays to improve efficiency"

and

"Friends don't let friends concatenate results inside a loop"

 

You also know that the final bootstrap array will have 2500*636 rows. However, if the only purpose of the outer loop (do b = 1 to 2500) is to get values that you will write to a file, you can save a LOT of time and space by writing each bootstrap sample DIRECTLY to a SAS data set instead of building up the 'bootstrap' array. See "Write to a SAS data set from inside a SAS/IML loop."

View solution in original post

9 REPLIES 9
Rick_SAS
SAS Super FREQ

Yes. Since you know that the final result will contain 636 rows, allocate pre-allocate arrays of that length. This avoids concatenating inside the loop, which is very inefficient.

See

"Pre-allocate arrays to improve efficiency"

and

"Friends don't let friends concatenate results inside a loop"

 

You also know that the final bootstrap array will have 2500*636 rows. However, if the only purpose of the outer loop (do b = 1 to 2500) is to get values that you will write to a file, you can save a LOT of time and space by writing each bootstrap sample DIRECTLY to a SAS data set instead of building up the 'bootstrap' array. See "Write to a SAS data set from inside a SAS/IML loop."

Rick_SAS
SAS Super FREQ

For example, I suspect this code will run much faster:

proc iml;
   /* Use Sashelp.Air for example */
	use Sashelp.Air;
	read all var 'Air' into a;
   close;
	t=nrow(a);
	call streaminit(1);

   /* allocate and tell output data set how many cols to expect */
   bootstrap = j(t,1,.) || a;
   create bootstrap from bootstrap[c={"SampleID" "a"}];

   v = j(t,1,.);
	do b=1 to 2500;
/*----------------------------the bottleneck parts----------------------------*/
		do u=1 to t;
			if u=1 then v[u]=rand("integer",1,t);
			else if v[u-1]=t then v[u]=rand("integer",1,t);
			else if rand("uniform")>0.9 then v[u]=rand("integer",1,t);
			else v[u]=v[u-1]+1;
		end;
		bootstrap = j(t,1,b) || a[v,];
/*----------------------------------------------------------------------------*/
	   append from bootstrap;
	end;
   close;
quit;
Junyong
Pyrite | Level 9

Thanks for your advice, but can you check this out? This is your version.

proc iml;
   /* Use Sashelp.Air for example */
	use Sashelp.Air;
	read all var 'Air' into a;
   close;
	t=nrow(a);
	call streaminit(1);

   /* allocate and tell output data set how many cols to expect */
   bootstrap = j(t,1,.) || a;
   create bootstrap from bootstrap[c={"SampleID" "a"}];

   v = j(t,1,.);
	do b=1 to 2500;
/*----------------------------the bottleneck parts----------------------------*/
		do u=1 to t;
			if u=1 then v[u]=rand("integer",1,t);
			else if v[u-1]=t then v[u]=rand("integer",1,t);
			else if rand("uniform")>0.9 then v[u]=rand("integer",1,t);
			else v[u]=v[u-1]+1;
		end;
		bootstrap = j(t,1,b) || a[v,];
/*----------------------------------------------------------------------------*/
	   append from bootstrap;
	end;
   close;
quit;

Here is the log.

1    proc iml;
NOTE: IML Ready
2       /* Use Sashelp.Air for example */
3        use Sashelp.Air;
4        read all var 'Air' into a;
5       close;
NOTE: Closing SASHELP.AIR
6        t=nrow(a);
7        call streaminit(1);
8
9       /* allocate and tell output data set how many cols to expect */
10      bootstrap = j(t,1,.) || a;
11      create bootstrap from bootstrap[c={"SampleID" "a"}];
12
13      v = j(t,1,.);
14       do b=1 to 2500;
15   /*----------------------------the bottleneck parts----------------------------*/
16           do u=1 to t;
17               if u=1 then v[u]=rand("integer",1,t);
18               else if v[u-1]=t then v[u]=rand("integer",1,t);
19               else if rand("uniform")>0.9 then v[u]=rand("integer",1,t);
20               else v[u]=v[u-1]+1;
21           end;
22           bootstrap = j(t,1,b) || a[v,];
23   /*----------------------------------------------------------------------------*/
24          append from bootstrap;
25       end;
26      close;
NOTE: Closing WORK.BOOTSTRAP
NOTE: The data set WORK.BOOTSTRAP has 360000 observations and 2 variables.
27   quit;
NOTE: Exiting IML.
NOTE: PROCEDURE IML used (Total process time):
      real time           1.18 seconds
      cpu time            1.17 seconds

And the following does not pre-allocate.

proc iml;
   /* Use Sashelp.Air for example */
	use Sashelp.Air;
	read all var 'Air' into a;
   close;
	t=nrow(a);
	call streaminit(1);

   /* allocate and tell output data set how many cols to expect */
   bootstrap = j(t,1,.) || a;
   create bootstrap from bootstrap[c={"SampleID" "a"}];

/*   v = j(t,1,.);*/
	do b=1 to 2500;
/*----------------------------the bottleneck parts----------------------------*/
		do u=1 to t;
			if u=1 then v=rand("integer",1,t);
			else if v[u-1]=t then v=v//rand("integer",1,t);
			else if rand("uniform")>0.9 then v=v//rand("integer",1,t);
			else v=v//v[u-1]+1;
		end;
		bootstrap = j(t,1,b) || a[v,];
/*----------------------------------------------------------------------------*/
	   append from bootstrap;
	end;
   close;
quit;

And the log.

28   proc iml;
NOTE: IML Ready
29      /* Use Sashelp.Air for example */
30       use Sashelp.Air;
31       read all var 'Air' into a;
32      close;
NOTE: Closing SASHELP.AIR
33       t=nrow(a);
34       call streaminit(1);
35
36      /* allocate and tell output data set how many cols to expect */
37      bootstrap = j(t,1,.) || a;
38      create bootstrap from bootstrap[c={"SampleID" "a"}];
39
40   /*   v = j(t,1,.);*/
41       do b=1 to 2500;
42   /*----------------------------the bottleneck parts----------------------------*/
43           do u=1 to t;
44               if u=1 then v=rand("integer",1,t);
45               else if v[u-1]=t then v=v//rand("integer",1,t);
46               else if rand("uniform")>0.9 then v=v//rand("integer",1,t);
47               else v=v//v[u-1]+1;
48           end;
49           bootstrap = j(t,1,b) || a[v,];
50   /*----------------------------------------------------------------------------*/
51          append from bootstrap;
52       end;
53      close;
NOTE: Closing WORK.BOOTSTRAP
NOTE: The data set WORK.BOOTSTRAP has 360000 observations and 2 variables.
54   quit;
NOTE: Exiting IML.
NOTE: PROCEDURE IML used (Total process time):
      real time           1.17 seconds
      cpu time            1.17 seconds

It seems there is no performance difference—I also tried 25,000 rather than 2,500 but found no difference.

Rick_SAS
SAS Super FREQ

Right. The actual bottleneck in your program was the concatenation of the 'bootstrap' matrix within the outer loop. In your new test, both programs avoid that concatenation, which is why both programs run much faster than your first version.

 

Junyong
Pyrite | Level 9

So the takeaway is to use CREATE once and then APPEND repeatedly rather than // repeatedly then CREATE APPEND once—more than the pre-allocation issue (I will check this out in detail later). Thank you!

Rick_SAS
SAS Super FREQ

They are both important. For this example, appending within the loop made a bigger impact on performance.

Junyong
Pyrite | Level 9

It seems the case above isn't good enough to see the performance gain—the following substantially shows the gain instead. Thanks.

proc iml;
	i=j(500,10);
	call randseed(1);
	do j=1 to 5000;
		call randgen(i,"normal");
		if j=1 then k=vech(cov(i));
		else k=k||vech(cov(i));
	end;
quit;
proc iml;
	i=j(500,10);
	call randseed(1);
	k=j(55,5000);
	do j=1 to 5000;
		call randgen(i,"normal");
		k[,j]=vech(cov(i));
	end;
quit;

And the results.

1    proc iml;
NOTE: IML Ready
2        i=j(500,10);
3        call randseed(1);
4        do j=1 to 5000;
5            call randgen(i,"normal");
6            if j=1 then k=vech(cov(i));
7            else k=k||vech(cov(i));
8        end;
9    quit;
NOTE: Exiting IML.
NOTE: PROCEDURE IML used (Total process time):
      real time           4.17 seconds
      cpu time            4.18 seconds


10   proc iml;
NOTE: IML Ready
11       i=j(500,10);
12       call randseed(1);
13       k=j(55,5000);
14       do j=1 to 5000;
15           call randgen(i,"normal");
16           k[,j]=vech(cov(i));
17       end;
18   quit;
NOTE: Exiting IML.
NOTE: PROCEDURE IML used (Total process time):
      real time           1.60 seconds
      cpu time            1.60 seconds

The latter is about 60% faster than the former. Significant.

Rick_SAS
SAS Super FREQ

Yes. The articles that I linked to in my initial reply have similar examples.

Rick_SAS
SAS Super FREQ

This topic inspired me to write a few articles about various ways to perform block bootstraps in SAS:

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.

From The DO Loop
Want more? Visit our blog for more articles like these.
Discussion stats
  • 9 replies
  • 853 views
  • 2 likes
  • 2 in conversation