Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

🔒 This topic is **solved** and **locked**.
Need further help from the community? Please
sign in and ask a **new** question.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 12-22-2020 03:21 AM
(516 views)

The raw time-series data are a 636×10 matrix as follows.

```
data have;
infile "http://global-q.org/uploads/1/2/2/6/122679606/
portf_me_monthly_2019a.csv" url firstobs=2 dsd;
input year month portfolio number return;
run;
proc transpose prefix=return out=have;
by year month;
id portfolio;
var return;
run;
```

And each block bootstrap samples a 636×10 matrix

(1) starting at a random point between 1 and 636 and

(2) shifting to another random point with 90% probability or if the ending observation (i.e., if the index reaches 636) as follows.

```
proc iml;
use have(keep=return:);
read all var _all_ into a;
t=nrow(a);
n=ncol(a);
call streaminit(1);
do b=1 to 2500;
/*----------------------------the bottleneck parts----------------------------*/
do u=1 to t;
if u=1 then v=rand("integer",1,t);
else if v[u-1]=t then v=v//rand("integer",1,t);
else if rand("uniform")>0.9 then v=v//rand("integer",1,t);
else v=v//v[u-1]+1;
end;
if b=1 then bootstrap=j(t,1)||a[v,];
else bootstrap=bootstrap//(j(t,1,b)||a[v,]);
/*----------------------------------------------------------------------------*/
end;
create bootstrap from bootstrap[c="a"];
append from bootstrap;
quit;
```

For example, the bootstrap picks the random starting point from 1 to 636, say 235, and continues with a 90% probability to 236, 237, 238, ..., 241, and 242 and shifts with a 10% probability to another random point, say 7, and then continues again to 8, 9, 10, etc. until it completes one bootstrap with 636 observations. So the average block length becomes 10.

The bottleneck problem is that these random time indices are therefore somewhat path-dependent and cannot be generated via RAND or RANDGEN. I wonder whether there is a more efficient way to cleverly vectorize this part.

1 ACCEPTED SOLUTION

Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Yes. Since you know that the final result will contain 636 rows, allocate pre-allocate arrays of that length. This avoids concatenating inside the loop, which is very inefficient.

See

"Pre-allocate arrays to improve efficiency"

and

"Friends don't let friends concatenate results inside a loop"

You also know that the final bootstrap array will have 2500*636 rows. However, if the only purpose of the outer loop (do b = 1 to 2500) is to get values that you will write to a file, you can save a LOT of time and space by writing each bootstrap sample DIRECTLY to a SAS data set instead of building up the 'bootstrap' array. See "Write to a SAS data set from inside a SAS/IML loop."

9 REPLIES 9

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Yes. Since you know that the final result will contain 636 rows, allocate pre-allocate arrays of that length. This avoids concatenating inside the loop, which is very inefficient.

See

"Pre-allocate arrays to improve efficiency"

and

"Friends don't let friends concatenate results inside a loop"

You also know that the final bootstrap array will have 2500*636 rows. However, if the only purpose of the outer loop (do b = 1 to 2500) is to get values that you will write to a file, you can save a LOT of time and space by writing each bootstrap sample DIRECTLY to a SAS data set instead of building up the 'bootstrap' array. See "Write to a SAS data set from inside a SAS/IML loop."

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

For example, I suspect this code will run much faster:

```
proc iml;
/* Use Sashelp.Air for example */
use Sashelp.Air;
read all var 'Air' into a;
close;
t=nrow(a);
call streaminit(1);
/* allocate and tell output data set how many cols to expect */
bootstrap = j(t,1,.) || a;
create bootstrap from bootstrap[c={"SampleID" "a"}];
v = j(t,1,.);
do b=1 to 2500;
/*----------------------------the bottleneck parts----------------------------*/
do u=1 to t;
if u=1 then v[u]=rand("integer",1,t);
else if v[u-1]=t then v[u]=rand("integer",1,t);
else if rand("uniform")>0.9 then v[u]=rand("integer",1,t);
else v[u]=v[u-1]+1;
end;
bootstrap = j(t,1,b) || a[v,];
/*----------------------------------------------------------------------------*/
append from bootstrap;
end;
close;
quit;
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Thanks for your advice, but can you check this out? This is your version.

```
proc iml;
/* Use Sashelp.Air for example */
use Sashelp.Air;
read all var 'Air' into a;
close;
t=nrow(a);
call streaminit(1);
/* allocate and tell output data set how many cols to expect */
bootstrap = j(t,1,.) || a;
create bootstrap from bootstrap[c={"SampleID" "a"}];
v = j(t,1,.);
do b=1 to 2500;
/*----------------------------the bottleneck parts----------------------------*/
do u=1 to t;
if u=1 then v[u]=rand("integer",1,t);
else if v[u-1]=t then v[u]=rand("integer",1,t);
else if rand("uniform")>0.9 then v[u]=rand("integer",1,t);
else v[u]=v[u-1]+1;
end;
bootstrap = j(t,1,b) || a[v,];
/*----------------------------------------------------------------------------*/
append from bootstrap;
end;
close;
quit;
```

Here is the log.

```
1 proc iml;
NOTE: IML Ready
2 /* Use Sashelp.Air for example */
3 use Sashelp.Air;
4 read all var 'Air' into a;
5 close;
NOTE: Closing SASHELP.AIR
6 t=nrow(a);
7 call streaminit(1);
8
9 /* allocate and tell output data set how many cols to expect */
10 bootstrap = j(t,1,.) || a;
11 create bootstrap from bootstrap[c={"SampleID" "a"}];
12
13 v = j(t,1,.);
14 do b=1 to 2500;
15 /*----------------------------the bottleneck parts----------------------------*/
16 do u=1 to t;
17 if u=1 then v[u]=rand("integer",1,t);
18 else if v[u-1]=t then v[u]=rand("integer",1,t);
19 else if rand("uniform")>0.9 then v[u]=rand("integer",1,t);
20 else v[u]=v[u-1]+1;
21 end;
22 bootstrap = j(t,1,b) || a[v,];
23 /*----------------------------------------------------------------------------*/
24 append from bootstrap;
25 end;
26 close;
NOTE: Closing WORK.BOOTSTRAP
NOTE: The data set WORK.BOOTSTRAP has 360000 observations and 2 variables.
27 quit;
NOTE: Exiting IML.
NOTE: PROCEDURE IML used (Total process time):
real time 1.18 seconds
cpu time 1.17 seconds
```

And the following does not pre-allocate.

```
proc iml;
/* Use Sashelp.Air for example */
use Sashelp.Air;
read all var 'Air' into a;
close;
t=nrow(a);
call streaminit(1);
/* allocate and tell output data set how many cols to expect */
bootstrap = j(t,1,.) || a;
create bootstrap from bootstrap[c={"SampleID" "a"}];
/* v = j(t,1,.);*/
do b=1 to 2500;
/*----------------------------the bottleneck parts----------------------------*/
do u=1 to t;
if u=1 then v=rand("integer",1,t);
else if v[u-1]=t then v=v//rand("integer",1,t);
else if rand("uniform")>0.9 then v=v//rand("integer",1,t);
else v=v//v[u-1]+1;
end;
bootstrap = j(t,1,b) || a[v,];
/*----------------------------------------------------------------------------*/
append from bootstrap;
end;
close;
quit;
```

And the log.

```
28 proc iml;
NOTE: IML Ready
29 /* Use Sashelp.Air for example */
30 use Sashelp.Air;
31 read all var 'Air' into a;
32 close;
NOTE: Closing SASHELP.AIR
33 t=nrow(a);
34 call streaminit(1);
35
36 /* allocate and tell output data set how many cols to expect */
37 bootstrap = j(t,1,.) || a;
38 create bootstrap from bootstrap[c={"SampleID" "a"}];
39
40 /* v = j(t,1,.);*/
41 do b=1 to 2500;
42 /*----------------------------the bottleneck parts----------------------------*/
43 do u=1 to t;
44 if u=1 then v=rand("integer",1,t);
45 else if v[u-1]=t then v=v//rand("integer",1,t);
46 else if rand("uniform")>0.9 then v=v//rand("integer",1,t);
47 else v=v//v[u-1]+1;
48 end;
49 bootstrap = j(t,1,b) || a[v,];
50 /*----------------------------------------------------------------------------*/
51 append from bootstrap;
52 end;
53 close;
NOTE: Closing WORK.BOOTSTRAP
NOTE: The data set WORK.BOOTSTRAP has 360000 observations and 2 variables.
54 quit;
NOTE: Exiting IML.
NOTE: PROCEDURE IML used (Total process time):
real time 1.17 seconds
cpu time 1.17 seconds
```

It seems there is no performance difference—I also tried 25,000 rather than 2,500 but found no difference.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

It seems the case above isn't good enough to see the performance gain—the following substantially shows the gain instead. Thanks.

```
proc iml;
i=j(500,10);
call randseed(1);
do j=1 to 5000;
call randgen(i,"normal");
if j=1 then k=vech(cov(i));
else k=k||vech(cov(i));
end;
quit;
proc iml;
i=j(500,10);
call randseed(1);
k=j(55,5000);
do j=1 to 5000;
call randgen(i,"normal");
k[,j]=vech(cov(i));
end;
quit;
```

And the results.

```
1 proc iml;
NOTE: IML Ready
2 i=j(500,10);
3 call randseed(1);
4 do j=1 to 5000;
5 call randgen(i,"normal");
6 if j=1 then k=vech(cov(i));
7 else k=k||vech(cov(i));
8 end;
9 quit;
NOTE: Exiting IML.
NOTE: PROCEDURE IML used (Total process time):
real time 4.17 seconds
cpu time 4.18 seconds
10 proc iml;
NOTE: IML Ready
11 i=j(500,10);
12 call randseed(1);
13 k=j(55,5000);
14 do j=1 to 5000;
15 call randgen(i,"normal");
16 k[,j]=vech(cov(i));
17 end;
18 quit;
NOTE: Exiting IML.
NOTE: PROCEDURE IML used (Total process time):
real time 1.60 seconds
cpu time 1.60 seconds
```

The latter is about 60% faster than the former. Significant.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Yes. The articles that I linked to in my initial reply have similar examples.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.

**If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website. **

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.