Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

🔒 This topic is **solved** and **locked**.
Need further help from the community? Please
sign in and ask a **new** question.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 08-03-2020 09:49 PM
(551 views)

Hi all!

I have a generational table represented by a matrix where each row represents the initial ages of two individuals (X and J) and each column represents a annuity factor that is calendar-year specific. Both ages as well the projection year can vary between 0 and 125 which results in a matrix M of dimension 15876 x 126.

Now lets say that the initial ages for X and J are zero (on row 1 of M), in the following year they would be one year older (x=1 and j=1 located at row x of M) and so on and so forth until both X and J reach the max age (125) - which would then result in a square matrix N of 126x126. My main goal is for retrieve each diagonal of N and performs some calculations and export the results for later use.

I was able to do this using the following steps:

1) imported my gen table into a matrix called 'inputDjxx' using proc iml;

2) Subset the above matrix by using a few loops in conjuction with the loc function;

3) Another loop to get each diagonal and do some additional calculations.

I am trying to optimize the following code but i am stuck, is there a better way to do this without so many loops? Please, see attached file for reference and the code below:

```
PROC IML;
use work.gen_table;
read all var _all_ into inputDjxx;
close work.gen_table;
do age_x=0 to &MaxAge; *Initial age for X;
do age_j=0 to &MaxAge; *Initial age for J;
do t=0 to &MaxAge;
new_x=min(&MaxAge, age_x+t);
new_j=min(&MaxAge, age_j+t);
*Identify lines for corresponding to ages x and j;
idx = loc(inputDjxx[,3]=new_x & inputDjxx[,4]=new_j);
if t=0 then
idxj=idx;
else
idxj=insert(idxj,idx,nrow(idxj)+1);
end;
*Submatrix of dimension 126x126;
Djxx=inputDjxx[idxj, 3:128];
lastrow=nrow(Djxx);
lastcol=ncol(Djxx);
*Matrix to hold final results;
Njxx=j(lastrow, lastcol,0);
*Get each diagonal of the square matrix and performs some calculations;
do rowx= 1 to lastrow;
Djxx_subset = Djxx[rowx:lastrow,1:lastcol];
Djxx_diag = vecdiag(Djxx_subset);
idx=0:(nrow(Djxx_diag)-1);
DiagOffset=lag(Djxx_diag, -idx);
DiagOffsetSum=DiagOffset[+,];
free Djxx_diag DiagOffset;
if rowx=1 then do;
Njxx = DiagOffsetSum;
end;
else do;
IsMiss = j(1,1,.);
DiagOffsetSum = DiagOffsetSum || repeat(IsMiss,1,rowx-1);
Njxx=insert(Njxx, DiagOffsetSum,nrow(Njxx)+1);
end;
end;
*print or create table from resulting matrix;
end;
end;
end;
QUIT;
```

Thanks in advance/

1 ACCEPTED SOLUTION

Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I have a few suggestions that should help you improve and optimize your code.

I am concerned that the matrix Njxx, which you say should hold the final results, is being set to a zero matrix within the third nested loop. By continually overwriting Njxx, are you not throwing away most of the results, and only collecting information from the very last iteration of the inner most loop, when age_x, age_j and t are all equal to &MaxAge?

The ELSE block near the end of the code is attempting to 'grow' the matrix Njxx by appending a row at the bottom for every iteration. This is very inefficient as it creates a new matrix in a different area of memory every time. Because of the unnecessary copying syntax like this should be avoided. Ideally you want to create Njxx at its final size at the start of the program (full of missing values) and then write each row of results using something like:

`Njxx[ i, 1:ncol(DiagOffsetSum) ] = DiagOffsetSum;`

where 'i' is the next available row to hold results.

I think the calculation in the innermost loop can be improved as there is no need to create the maxtrix DiagOffset with all the different lags. Something simpler based on the CUSUM function like

`DiagOffsetSum = sum( Djxx_diag ) - (0 || t( cusum( Djxx_diag )));`

will suffice and give the sums that you want. Note that there will be a redundant zero at the end of the vector DiagOffsetSum, but it is easy to leave this out when you copy the results to Njxx.

6 REPLIES 6

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I have a few suggestions that should help you improve and optimize your code.

I am concerned that the matrix Njxx, which you say should hold the final results, is being set to a zero matrix within the third nested loop. By continually overwriting Njxx, are you not throwing away most of the results, and only collecting information from the very last iteration of the inner most loop, when age_x, age_j and t are all equal to &MaxAge?

The ELSE block near the end of the code is attempting to 'grow' the matrix Njxx by appending a row at the bottom for every iteration. This is very inefficient as it creates a new matrix in a different area of memory every time. Because of the unnecessary copying syntax like this should be avoided. Ideally you want to create Njxx at its final size at the start of the program (full of missing values) and then write each row of results using something like:

`Njxx[ i, 1:ncol(DiagOffsetSum) ] = DiagOffsetSum;`

where 'i' is the next available row to hold results.

I think the calculation in the innermost loop can be improved as there is no need to create the maxtrix DiagOffset with all the different lags. Something simpler based on the CUSUM function like

`DiagOffsetSum = sum( Djxx_diag ) - (0 || t( cusum( Djxx_diag )));`

will suffice and give the sums that you want. Note that there will be a redundant zero at the end of the vector DiagOffsetSum, but it is easy to leave this out when you copy the results to Njxx.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Thank you Ian! Followed your suggestions and without the lag function it seems a bit faster now.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Thanks for your reply, it is good news that things are getting faster. I was just looking again at the code and noticed that the first END statement is not ending the ELSE block of the IF (as the indentation suggests) but ending the loop over 't'. In other words the results are always for t=MaxAge.

When age_x and age_j are both equal to MaxAge, I believe that Djxx will contain upto 126 copies of the same 126x126 matrix all stacked on top of each other.

As Rick suggests it may be worthwhile checking that things work as intended for a few specific values. Use a tiny data set and print out all the intermediate workings.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Whereas Ian provided suggestions that are specific to this program, here are some general suggestions:

1. Start optimizing/rewriting the inner-most body of the program. For your program, you might want to set values for age_x, age_j, and t and then focus on the remainder of the program. For example:

age_x=20; *Initial age for X;

age_j=30; *Initial age for J;

t=5;

Make sure the rest of the code works as expected for those (and other) values.

2. If you think the code can be vectorized for efficiency, again start at the innermost loop. See the tips at "How to vectorize computations in a matrix language."

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Hi Rick,

I used IanWakeling sugestions and the code seems to be a little faster now but i will also try to use vectorization computation instead. I must confess i am not used to SAS IML and have to think on how to apply this in my code.

Just a quick note: your blog helped me a lot in my current assignment. Thanks for all the tips!

I used IanWakeling sugestions and the code seems to be a little faster now but i will also try to use vectorization computation instead. I must confess i am not used to SAS IML and have to think on how to apply this in my code.

Just a quick note: your blog helped me a lot in my current assignment. Thanks for all the tips!

Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.

**If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website. **

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.