
I need to find the fastest, most efficient way to compute:

{sum over t} ({product over j} (X[j-1])*Y[t])

For large j and t, and with a couple more variables in the equation, this requires creating a lot of variables (columns), which seems inefficient. I was thinking of defining arrays with macro variables for X(j), Y(t), and so on, but is that the fastest approach, or is there a better way with PROC IML or some other trick? Any suggestions?

Thanks a lot in advance.

 

P.S. X and Y are numeric variables with values in (0,1). Also, 't' varies from 1 to n and 'j' varies from 1 to 't'.


13 REPLIES
PeterClemmensen
Tourmaline | Level 20

What do X and Y look like here?

thepushkarsingh
Quartz | Level 8
X and Y are numeric variables with values in (0,1). Also, 't' varies from 1 to n and 'j' varies from 1 to 't'.
Rick_SAS
SAS Super FREQ

It would be best to provide example data and the result you expect.

 

When t=1, do you skip the product?  If so, then t ranges from 2 to N.

Rick_SAS
SAS Super FREQ

I'm assuming that the outer summation will go from t=2..N. Otherwise, you can modify the code accordingly. The key to an efficient implementation is to recognize that the formula is separable and factors into a cumulative product and the powers of the elements of Y:

 

proc iml;
x = (1:4)`/5;
y = (2:5)`/5;

N = nrow(y);

/* naive loop: sum of product.
   For testing purposes only! */
s = 0;
do t = 2 to N;
   p = 1;
   do j = 2 to t;
      p = p* (x[j-1]#y[t]);
   end;
   s = s + p;
end;
print s;

/* efficient computation: term t is the cumulative product
   x[1]*...*x[t-1] times the power y[t]^(t-1) */
xT = cuprod(x);              /* cumulative products of X */
yT = y##(0:N-1)`;            /* yT[t] = y[t]^(t-1) */
v = xT[1:N-1] # yT[2:N];     /* elementwise product for t = 2..N */
s2 = sum(v);
print s2;
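
For readers who want the factorization written out, here is a sketch of the algebra behind the efficient version, using x_j and y_t for the elements of X and Y and assuming, as above, that the outer sum runs over t = 2..N. Each inner product has t-1 factors and every factor contains the same y_t, so y_t can be pulled out as a power:

```latex
\sum_{t=2}^{N} \prod_{j=2}^{t} \bigl( x_{j-1}\, y_t \bigr)
  \;=\; \sum_{t=2}^{N} y_t^{\,t-1} \prod_{j=1}^{t-1} x_j
```

For the test vectors in the program, x = (0.2, 0.4, 0.6, 0.8) and y = (0.4, 0.6, 0.8, 1.0), the three terms are 0.2*0.6 = 0.12, (0.2*0.4)*0.8^2 = 0.0512, and (0.2*0.4*0.6)*1^3 = 0.048, so both s and s2 should come out to 0.2192.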


thepushkarsingh
Quartz | Level 8
Thanks Rick. I have read so many of your papers (and still haven't learned). Thank you so much. I have one additional request, though. Once CUPROD reaches the end of a row, I want it to stop and start over on the next row instead of carrying the product over from the previous row. Any quick tip?
Rick_SAS
SAS Super FREQ

If you provide data and the expected results, I and other experts will think about it.

thepushkarsingh
Quartz | Level 8

Suppose a person can belong to any of the RACEs with probability PROB_RACE, and the marginal probability of dying in each of years 1 to 3 is given (marginal survival is 1-PROB_DYING):

RACE   PROB_RACE   PROB_DYING_YR1   PROB_DYING_YR2   PROB_DYING_YR3
A      0.26        0.1              0.2              0.3
B      0.35        0.16             0.25             0.45
C      0.23        0.23             0.34             0.17
D      0.16        0.18             0.17             0.14

 

So the person belonging to RACE A will have a probability of survival after 3 years, call it SURV3_A = (1-0.1)*(1-0.2)*(1-0.3).

Similarly, SURV3_B=(1-0.16)*(1-0.25)*(1-0.45) and so on...

And a random person will have a probability of survival after 3 years of: PROB_A*SURV3_A+PROB_B*SURV3_B+PROB_C*SURV3_C+PROB_D*SURV3_D.

I was thinking of a generalization to calculate probability of survival of any random person after 'T' years when there can be 'N' possible RACEs.

 

I can think of taking one row at a time, using CUPROD, and then summing across rows, but that seems inefficient, so I'm wondering if there is an easier way.
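
The generalization described above can be written compactly. This is only a sketch in notation that is not in the original post: p_r stands for PROB_RACE of race r, d_{r,k} for PROB_DYING_YRk of race r, and S(T) for the survival probability of a random person after T years.

```latex
S(T) \;=\; \sum_{r=1}^{N} p_r \prod_{k=1}^{T} \bigl( 1 - d_{r,k} \bigr)
```

With the table above, the inner product for race A and T = 3 is (1-0.1)*(1-0.2)*(1-0.3) = 0.504, which matches SURV3_A.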

Rick_SAS
SAS Super FREQ

Aha! Now we see what you are trying to do! Much clearer and much simpler. I think the "trick" you are looking for is to use a subscript reduction operator for each row and across the columns:

 

proc iml;
p_Race = {0.26, 0.35, 0.23, 0.16};
p_Die = {
0.1	0.2	0.3,
0.16	0.25	0.45,
0.23	0.34	0.17,
0.18	0.17	0.14 };

p_surv = p_Race # (1 - p_Die)[, #];
print p_surv;
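
If the goal is the single overall probability for a random person (the PROB_A*SURV3_A + ... + PROB_D*SURV3_D expression from the question), one possible follow-up is to sum the weighted row products. This is a minimal sketch, not part of the original reply; the names surv and p_total are made up for illustration.

```sas
proc iml;
p_Race = {0.26, 0.35, 0.23, 0.16};
p_Die = {0.1  0.2  0.3,
         0.16 0.25 0.45,
         0.23 0.34 0.17,
         0.18 0.17 0.14};

/* per-race survival after the last year: product across each row */
surv = (1 - p_Die)[, #];

/* weight each race by its probability and add up over races */
p_total = sum(p_Race # surv);
print p_total;
quit;
```

With the example data, the four weighted terms are 0.13104, 0.121275, 0.09701538, and 0.09365056, so p_total should be about 0.443.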

 

 

 

 

thepushkarsingh
Quartz | Level 8

Hi @Rick_SAS, sorry for being super excited yesterday and not verifying everything. The trick you suggested gives only the final column:

0.504
0.3465
0.421806
0.585316

Is there a way to get all the columns in the range, like:

 

0.9    0.72     0.504
0.84   0.63     0.3465
0.77   0.5082   0.421806
0.82   0.6806   0.585316

Thanks a lot in advance.

thepushkarsingh
Quartz | Level 8
Great thanks! Wish I could think like you! 🙂 Elegant!!
Rick_SAS
SAS Super FREQ

Where are you getting those numbers? Those are not the results of the program I sent.

 

Anyway, to answer your question, it sounds like you want to use the matrix whose columns contain the cumulative products of 1 - p_Die.

 

proc iml;
p_Race = {0.26, 0.35, 0.23, 0.16};
p_Die = {
0.1	0.2	0.3,
0.16	0.25	0.45,
0.23	0.34	0.17,
0.18	0.17	0.14 };

/* cumulative products across each row:
   column j becomes the product of columns 1..j */
p_cumul = 1 - p_Die;
do j = 2 to ncol(p_cumul);
   p_cumul[,j] = p_cumul[,j] # p_cumul[,j-1]; 
end;

M = p_Race # p_cumul;
print M;
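
Building on the matrix M, the probability that a random person survives through each year can be read off as the column sums. The following is a sketch under that interpretation rather than part of the original answer; the name p_total is hypothetical, and the [+, ] subscript reduction is used to sum down each column.

```sas
proc iml;
p_Race = {0.26, 0.35, 0.23, 0.16};
p_Die = {0.1  0.2  0.3,
         0.16 0.25 0.45,
         0.23 0.34 0.17,
         0.18 0.17 0.14};

/* cumulative survival: column j = product of columns 1..j of (1 - p_Die) */
p_cumul = 1 - p_Die;
do j = 2 to ncol(p_cumul);
   p_cumul[,j] = p_cumul[,j] # p_cumul[,j-1];
end;

/* weight each row by its race probability */
M = p_Race # p_cumul;

/* column sums: one overall survival probability per year */
p_total = M[+, ];
print p_total;
quit;
```

For the example data, the column sums work out to roughly 0.836, 0.633, and 0.443 for years 1, 2, and 3. The loop runs once per year rather than once per race, so it stays cheap even when there are many rows.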

 

 

thepushkarsingh
Quartz | Level 8

Sorry, I only gave the calculations for the (1-X) part. The other parts were working properly, so I figured that if I got the solution to this one, the rest would be solved. My apologies.