Re: Search for elements as with Excel Vlookup

Dima · Posted 03-27-2013 06:31 AM

Hello,

I have a matrix called DATA which contains informations on thousands of people of an insurance portfolio including their age.

I have also a mortality table (for each age from 18 to 110 it gives me the probability of death).

I split this table in 2 matrix (1 column, 92 rows): Age and Qx.

Now, I want to find for each people of DATA, the right mortality rate Qx.

Let's say that DATA_AGE is a column vector of all the ages of my portfolio.

POS=loc(Age=DATA_AGE);

Qx_Data=Qx[POS];

But this code doesn't work, it seems that LOC function cannot locate a vector of elements, but only one element.

So I have to write this code:

POS=J(nrow(DATA),1,0);

do i=1 to nrow(DATA_AGE);

POS=loc(Age=DATA_AGE);

end;

Qx_Data=Qx[POS];

This code is very slow, so I wonder if there is a more powerful way to search for elements in IML?

Thank you.

Sample code:

proc iml;

/* Mortality rates */
Age={40,41,42,43,44,45,46,47,48,49,50};
Qx={0.01,0.011,0.012,0.013,0.014,0.015,0.016,0.017,0.018,0.019,0.020};

/* Data */
Data={50,43,41};

/* What I want to do */
POS=LOC(Age=Data);
Qx_Data=Qx[POS];

/* What I currently do */
POS=J(nrow(Data),1,0);
do i=1 to nrow(Data);
POS=loc(Age=Data);
end;
Qx_Data=Qx[POS];

print Qx_Data;

quit;

SteveDenham · Posted 03-27-2013 08:14 AM

This seems like something that outside of IML would be pretty simple, using PROC SQL.

Steve Denham

Dima · Posted 03-27-2013 09:41 AM

Maybe PROC SQL would work better for this particular task, but my entire code runs in IML and I cannot use another language for the calculations I do.

Rick_SAS · Posted 03-27-2013 08:47 AM

Assuming that you have SAS/IML 9.3, use the ELEMENT function: SAS/IML(R) 12.1 User's Guide

pos = loc(element(age, data));

Qx_Data=Qx[POS];

Dima · Posted 03-27-2013 09:55 AM

Thank you Rick for your suggestion, at first I thought it would do the trick, but I have a problem.

I modified the code as you suggested:

proc iml;

/* Mortality rates */
Age={40,41,42,43,44,45,46,47,48,49,50};
Qx={0.01,0.011,0.012,0.013,0.014,0.015,0.016,0.017,0.018,0.019,0.020};

/* Data */
Data={50,43,41};

/* Search */

POS=loc(element(Age,Data));
Qx_Data=Qx[POS];

quit;

The result is:

POS={2 4 11};

Qx_Data={0.011, 0.013, 0.020};

The problem is that I want to know that:

- the person n°1, aged 50, has a Qx of 0.20

- the person n°2, aged 43, has a Qx of 0.013

- the person n°3, aged 41, has a Qx of 0.011

The element function which seems to work as an intersection of 2 matrix erase this information.

Am I correct?

Rick_SAS · Posted 03-27-2013 10:25 AM

The ELEMENT function provides you with an indicator variable (0/1) that indicates the elements of one vector that are contained in another. As such, it is a set operator, and sets are unordered.

If you need an ordered result, it is possible to reorder the result to match the order of your query:

Age_Data=Age[POS];

/* reorder rows to match the order of the query */
perm = j(nrow(Data),1);
do i = 1 to nrow(Data);
perm = loc(Age_Data=Data);
end;
Qx_Data= Qx_Data[perm];
Age_Data= Age_Data[perm];

print Qx_Data Age_Data;

Notice that the DO loop is over the result, which is much smaller that the original data. However, this code only works if you have one result for each query, which might not bet true in general. For example, the data might have more than one 50-year-old man). If this is a problem, then you probably want to include a unique subject identifier in the data, and use that identifier in your queries.

Another option is to sort the elements of your query before you make the query. Then the result is already in the correct order. For example, the data in your example is sorted by age. If you sort your "Data" vector by age, then the result comes out in the correct order.

Dima · Posted 03-27-2013 10:48 AM

This would lead back to my initial code which was:

POS=J(nrow(DATA),1,0);

do i=1 to nrow(DATA_AGE);

POS=loc(Age=DATA_AGE);

end;

Qx_Data=Qx[POS];

And the DO loop runs from 1 to nrow(Data) which in reality contains around 50000 rows...

I can't sort my data because I will have to do this for many variables and I would not be able to reattribute the results to the right row of initial Data.

Rick_SAS · Posted 03-27-2013 10:54 AM

No. In my DO loop, Age_Data vector is short (3 elements in this example). In yours, Age is long (11 elements in this example, but you say 50000 elements in real life).

Dima · Posted 03-27-2013 11:02 AM

Your DO loop runs from 1 to nrow(Data). Data has 3 rows in the sample, but 50000 in real life.

So your DO loop would be as long as mine.

Am I wrong? I may have missed something.

Rick_SAS · Posted 03-27-2013 11:12 AM

It sounds like I am the one that doesn't understand the structure of your data. Sorry for that.

At the risk of repeating myself, I would use CALL SORT to the sort the DATA vector and then use the LOC and ELEMENT functions to find the subset of your data that is of interest. If necessary, include a unique ID variable (for example, the observation number) to help identify the subjects. If that doesn't help you to solve your problem, then maybe someone who understands your data better can chime in. Good luck.

Dima · Posted 03-27-2013 11:54 AM

Thank you for your help Rick, I will try to include a unique ID and apply the SORT/LOC/ELEMENT procedure.

It will take longer to implement in my real code.

I will post later my feedback.

Registration is open