BookmarkSubscribeRSS Feed
Dima
Calcite | Level 5

Hello,

I have a matrix called DATA which contains informations on thousands of people of an insurance portfolio including their age.

I have also a mortality table (for each age from 18 to 110 it gives me the probability of death).

I split this table in 2 matrix (1 column, 92 rows): Age and Qx.

Now, I want to find for each people of DATA, the right mortality rate Qx.

Let's say that DATA_AGE is a column vector of all the ages of my portfolio.

POS=loc(Age=DATA_AGE);

Qx_Data=Qx[POS];

But this code doesn't work, it seems that LOC function cannot locate a vector of elements, but only one element.

So I have to write this code:

POS=J(nrow(DATA),1,0);

do i=1 to nrow(DATA_AGE);

     POS=loc(Age=DATA_AGE);

end;

Qx_Data=Qx[POS];

This code is very slow, so I wonder if there is a more powerful way to search for elements in IML?

Thank you.

Sample code:

proc iml;

/* Mortality rates */
Age={40,41,42,43,44,45,46,47,48,49,50};
Qx={0.01,0.011,0.012,0.013,0.014,0.015,0.016,0.017,0.018,0.019,0.020};

/* Data */
Data={50,43,41};

/* What I want to do */
POS=LOC(Age=Data);
Qx_Data=Qx[POS];

/* What I currently do */
POS=J(nrow(Data),1,0);
do i=1 to nrow(Data);
POS=loc(Age=Data);
end;
Qx_Data=Qx[POS];

print Qx_Data;

quit;

10 REPLIES 10
SteveDenham
Jade | Level 19

This seems like something that outside of IML would be pretty simple, using PROC SQL.

Steve Denham

Dima
Calcite | Level 5

Maybe PROC SQL would work better for this particular task, but my entire code runs in IML and I cannot use another language for the calculations I do.

Rick_SAS
SAS Super FREQ

Assuming that you have SAS/IML 9.3, use the ELEMENT function: SAS/IML(R) 12.1 User's Guide

pos = loc(element(age, data));

Qx_Data=Qx[POS];

Dima
Calcite | Level 5

Thank you Rick for your suggestion, at first I thought it would do the trick, but I have a problem.

I modified the code as you suggested:

proc iml;

/* Mortality rates */
Age={40,41,42,43,44,45,46,47,48,49,50};
Qx={0.01,0.011,0.012,0.013,0.014,0.015,0.016,0.017,0.018,0.019,0.020};

/* Data */
Data={50,43,41};

/* Search */

POS=loc(element(Age,Data));
Qx_Data=Qx[POS];

quit;

The result is:

POS={2 4 11};

Qx_Data={0.011, 0.013, 0.020};

The problem is that I want to know that:

- the person n°1, aged 50, has a Qx of 0.20

- the person n°2, aged 43, has a Qx of 0.013

- the person n°3, aged 41, has a Qx of 0.011

The element function which seems to work as an intersection of 2 matrix erase this information.

Am I correct?

Rick_SAS
SAS Super FREQ

The ELEMENT function provides you with an indicator variable (0/1) that indicates the elements of one vector that are contained in another. As such, it is a set operator, and sets are unordered.

If you need an ordered result, it is possible to reorder the result to match the order of your query:

Age_Data=Age[POS];

/* reorder rows to match the order of the query */
perm = j(nrow(Data),1);
do i = 1 to nrow(Data);
   perm = loc(Age_Data=Data);
end;
Qx_Data= Qx_Data[perm];
Age_Data= Age_Data[perm];

print Qx_Data Age_Data;

Notice that the DO loop is over the result, which is much smaller that the original data.  However, this code only works if you have one result for each query, which might not bet true in general. For example, the data might have more than one 50-year-old man).   If this is a problem, then you probably want to include a unique subject identifier in the data, and use that identifier in your queries.

Another option is to sort the elements of your query before you make the query. Then the result is already in the correct order.  For example, the data in your example is sorted by age. If you sort your "Data" vector by age, then the result comes out in the correct order.

Dima
Calcite | Level 5

This would lead back to my initial code which was:

POS=J(nrow(DATA),1,0);

do i=1 to nrow(DATA_AGE);

     POS=loc(Age=DATA_AGE);

end;

Qx_Data=Qx[POS];

And the DO loop runs from 1 to nrow(Data) which in reality contains around 50000 rows...

I can't sort my data because I will have to do this for many variables and I would not be able to reattribute the results to the right row of initial Data.

Rick_SAS
SAS Super FREQ

No. In my DO loop, Age_Data vector is short (3 elements in this example).  In yours, Age is long (11 elements in this example, but you say 50000 elements in real life).

Dima
Calcite | Level 5

Your DO loop runs from 1 to nrow(Data). Data has 3 rows in the sample, but 50000 in real life.

So your DO loop would be as long as mine.

Am I wrong? I may have missed something.

Rick_SAS
SAS Super FREQ

It sounds like I am the one that doesn't understand the structure of your data. Sorry for that.

At the risk of repeating myself, I would use CALL SORT to the sort the DATA vector and then use the LOC and ELEMENT functions to find the subset of your data that is of interest. If necessary, include a unique ID variable (for example, the observation number) to help identify the subjects.  If that doesn't help you to solve your problem, then maybe someone who understands your data better can chime in.  Good luck.

Dima
Calcite | Level 5

Thank you for your help Rick, I will try to include a unique ID and apply the SORT/LOC/ELEMENT procedure.

It will take longer to implement in my real code.

I will post later my feedback.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.

From The DO Loop
Want more? Visit our blog for more articles like these.
Discussion stats
  • 10 replies
  • 1588 views
  • 9 likes
  • 3 in conversation