
I need to find the fastest, most efficient way to compute:

{sum over t} ({product over j} (X[j-1])*Y[t])

For large enough j and t, and with a couple more variables in the equation, this requires creating a lot of variables (columns), which seems inefficient. I was thinking of defining arrays with macro variables for X(j), Y(t), and similar variables, but is that the fastest approach, or is there a better way with PROC IML or some other trick? Any suggestions?

Thanks a lot in advance.

 

P.S. X and Y are numeric variables with values in (0,1). Also, 't' varies from 1 to n and 'j' varies from 1 to 't'.


13 REPLIES
PeterClemmensen
Tourmaline | Level 20

What do X and Y look like here?

thepushkarsingh
Quartz | Level 8
X and Y are numeric variables with values in (0,1). Also, 't' varies from 1 to n and 'j' varies from 1 to 't'.
Rick_SAS
SAS Super FREQ

It would be best to provide example data and the result you expect.

 

When t=1, do you skip the product?  If so, then t ranges from 2 to N.

Rick_SAS
SAS Super FREQ

I'm assuming that the outer summation goes from t=2..N. Otherwise, you can modify the code accordingly. The key to an efficient implementation is to recognize that each term of the sum is separable: it factors into a cumulative product of X and a power of an element of Y:

 

proc iml;
x = (1:4)`/5;
y = (2:5)`/5;

N = nrow(y);

/* naive loop: sum of product.
   For testing purposes only! */
s = 0;
do t = 2 to N;
   p = 1;
   do j = 2 to t;
      p = p* (x[j-1]#y[t]);
   end;
   s = s + p;
end;
print s;

/* efficient computation: each term factors into a cumulative
   product of X times a power of Y */
xT = cuprod(x);            /* xT[k] = x[1]*x[2]*...*x[k] */
yT = y##(0:N-1)`;          /* yT[t] = y[t]##(t-1) */
v = xT[1:N-1] # yT[2:N];   /* terms of the sum for t=2..N */
s2 = sum(v);
print s2;
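
Written out, the factorization that the program above relies on looks roughly like this. It assumes, as the naive loop does, that Y[t] is part of every factor of the inner product and that the sum starts at t=2; the symbol S is introduced here only for illustration.

```latex
S \;=\; \sum_{t=2}^{N} \prod_{j=2}^{t} \left( x_{j-1}\, y_t \right)
  \;=\; \sum_{t=2}^{N} \, y_t^{\,t-1} \prod_{j=1}^{t-1} x_j
```

In the program, the inner product of the x values is xT[t-1] and the power of y is yT[t]. With the test vectors x = {0.2, 0.4, 0.6, 0.8} and y = {0.4, 0.6, 0.8, 1.0}, the three terms are 0.12, 0.0512, and 0.048, so both s and s2 should print 0.2192.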


thepushkarsingh
Quartz | Level 8
Thanks Rick. I have read so many of your papers (and still haven't learned). Thank you so much. I have one additional request, though. Once CUPROD reaches the end of a row, I want it to stop and start over on the next row instead of carrying the product over from the previous row. Any quick tip?
Rick_SAS
SAS Super FREQ

If you provide data and the expected results, I and other experts will think about it.

thepushkarsingh
Quartz | Level 8

Suppose a person can belong to any of the RACEs with probability PROB_RACE, and the marginal probability of dying in each of years 1 to 3 is given (the marginal probability of survival is 1-PROB_DYING):

RACE  PROB_RACE  PROB_DYING_YR1  PROB_DYING_YR2  PROB_DYING_YR3
A     0.26       0.1             0.2             0.3
B     0.35       0.16            0.25            0.45
C     0.23       0.23            0.34            0.17
D     0.16       0.18            0.17            0.14

 

So a person belonging to RACE A has a probability of surviving 3 years of SURV3_A = (1-0.1)*(1-0.2)*(1-0.3).

Similarly, SURV3_B = (1-0.16)*(1-0.25)*(1-0.45), and so on.

A random person then has a probability of surviving 3 years of: PROB_A*SURV3_A + PROB_B*SURV3_B + PROB_C*SURV3_C + PROB_D*SURV3_D.

I am looking for a generalization that calculates the probability of survival of a random person after 'T' years when there are 'N' possible RACEs.

 

I can think of taking one row at a time, using CUPROD, and then summing across the rows, but that seems inefficient, so I am wondering whether there is an easier way.
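
The generalization described above can be written compactly. The symbols are introduced only for illustration and do not appear in the post: p_r stands for PROB_RACE of race r, and d_{r,k} stands for PROB_DYING_YRk of race r.

```latex
P(\text{survive } T \text{ years})
  \;=\; \sum_{r=1}^{N} p_r \prod_{k=1}^{T} \left( 1 - d_{r,k} \right)
```

For example, for race A and T = 3 the inner product is 0.9 × 0.8 × 0.7 = 0.504, which is SURV3_A.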

Rick_SAS
SAS Super FREQ

Aha! Now we see what you are trying to do! Much clearer and much simpler. I think the "trick" you are looking for is to use a subscript reduction operator for each row and across the columns:

 

proc iml;
p_Race = {0.26, 0.35, 0.23, 0.16};
p_Die = {
0.1	0.2	0.3,
0.16	0.25	0.45,
0.23	0.34	0.17,
0.18	0.17	0.14 };

p_surv = p_Race # (1 - p_Die)[, #];
print p_surv;
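
As a possible follow-up, the single overall probability asked for in the question (PROB_A*SURV3_A + ... + PROB_D*SURV3_D) could be obtained by summing that vector. This is a minimal sketch, not part of the original reply; the names rowSurv and p_total are made up for illustration.

```sas
proc iml;
p_Race = {0.26, 0.35, 0.23, 0.16};   /* P(race), one row per race */
p_Die  = {0.10 0.20 0.30,
          0.16 0.25 0.45,
          0.23 0.34 0.17,
          0.18 0.17 0.14};           /* P(dying in year k), one row per race */

/* rowSurv and p_total are hypothetical names, not from the thread */
rowSurv = (1 - p_Die)[, #];          /* product across each row: 3-year survival by race */
p_total = sum(p_Race # rowSurv);     /* weight by race probability, then add up */
print rowSurv p_total;
quit;
```

With these data, rowSurv should hold 0.504, 0.3465, 0.421806, and 0.585316, and p_total should come out at about 0.443. Because the reduction works on whatever dimensions p_Die has, the same statements should apply to any number of races and years.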

 

 

 

 

thepushkarsingh
Quartz | Level 8

Hi @Rick_SAS, sorry for being super excited yesterday and not verifying everything. The trick you suggested gives only the final column:

0.504
0.3465
0.421806
0.585316

Is there a way to get all the columns in the range, like:

 

0.9   0.72    0.504
0.84  0.63    0.3465
0.77  0.5082  0.421806
0.82  0.6806  0.585316

Thanks a lot in advance.

thepushkarsingh
Quartz | Level 8
Great thanks! Wish I could think like you! 🙂 Elegant!!
Rick_SAS
SAS Super FREQ

Where are you getting those numbers? Those are not the results of the program I sent.

 

Anyway, to answer your question, it sounds like you want the matrix whose columns contain the cumulative products of 1 - p_Die.

 

proc iml;
p_Race = {0.26, 0.35, 0.23, 0.16};
p_Die = {
0.1	0.2	0.3,
0.16	0.25	0.45,
0.23	0.34	0.17,
0.18	0.17	0.14 };

/* form matrix of cumulative products of columns */
p_cumul = 1 - p_Die;
do j = 2 to ncol(p_cumul);
   p_cumul[,j] = p_cumul[,j] # p_cumul[,j-1]; 
end;

M = p_Race # p_cumul;
print M;
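
If the goal is the survival probability of a random person after each year, one option is to sum the columns of M. The following is a sketch under that assumption, not part of the original reply; the name p_byYear is made up for illustration.

```sas
proc iml;
p_Race = {0.26, 0.35, 0.23, 0.16};
p_Die  = {0.10 0.20 0.30,
          0.16 0.25 0.45,
          0.23 0.34 0.17,
          0.18 0.17 0.14};

/* cumulative products along each row, built one column at a time */
p_cumul = 1 - p_Die;
do j = 2 to ncol(p_cumul);
   p_cumul[, j] = p_cumul[, j] # p_cumul[, j-1];
end;

M = p_Race # p_cumul;   /* each row scaled by its race probability */
p_byYear = M[+, ];      /* column sums (p_byYear is a hypothetical name) */
print p_byYear;
quit;
```

The result is a row vector with one entry per year. Its last entry is the same quantity as PROB_A*SURV3_A + ... + PROB_D*SURV3_D from the question.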

 

 

thepushkarsingh
Quartz | Level 8

Sorry, I only gave the calculations for the (1-X) part. The other parts were working properly, so I figured that once I had the solution to this one, the whole thing would be solved. My apologies.
