Solved: Re: Easiest Way to Sum over Product on 2D Data

thepushkarsingh · Posted 01-29-2019 08:12 AM

I need to find the fastest, most efficient way to compute:

{sum over t} ({product over j} (X[j-1])*Y[t])

For large enough j and t, and with a couple more variables in the equation, this requires creating a lot of variables (columns), which seems inefficient. I was thinking of defining arrays with macro variables for X(j), Y(t), and similar variables, but I don't know whether that is the fastest approach or whether there is a better way with PROC IML or some other trick. Any suggestions?

Thanks a lot in advance.

P.S. X and Y are numeric variables in (0,1). Also, 't' varies from 1 to n and 'j' varies from 1 to 't'.

PeterClemmensen · Posted 01-29-2019 10:34 AM

What do X and Y look like here?

thepushkarsingh · Posted 01-29-2019 11:03 AM

X and Y are numeric variables in (0,1). Also, 't' varies from 1 to n and 'j' varies from 1 to 't'.

Rick_SAS · Posted 01-29-2019 11:12 AM

It would be best to provide example data and the result you expect.

When t=1, do you skip the product? If so, then t ranges from 2 to N.

Rick_SAS · Posted 01-29-2019 11:49 AM

I'm assuming that the outer summation will go from t=2..N. Otherwise, you can modify the code accordingly. The key to an efficient implementation is to recognize that the formula is separable and factors into a cumulative product and the powers of the elements of Y:

proc iml;
x = (1:4)`/5;
y = (2:5)`/5;

N = nrow(y);

/* naive loop: sum of product.
   For testing purposes only! */
s = 0;
do t = 2 to N;
   p = 1;
   do j = 2 to t;
      p = p* (x[j-1]#y[t]);
   end;
   s = s + p;
end;
print s;

/* efficient computation: each term factors into a cumulative
   product of X times a power of Y; then sum the terms */
xT = cuprod(x);              /* cumulative product */
yT = y##(0:N-1)`;           /* powers of Y */
v = xT[1:N-1] # yT[2:N]; /* elementwise product */
s2 = sum(v);
print s2;
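
To see why the two computations agree, the inner product can be expanded as follows. This is only a sketch under the same assumption as the code (the outer sum runs over t = 2..N); x_j and y_t denote the elements of the vectors x and y.

```latex
\[
\prod_{j=2}^{t} \left( x_{j-1}\, y_t \right)
  = y_t^{\,t-1} \prod_{j=1}^{t-1} x_j ,
\qquad\text{so}\qquad
s = \sum_{t=2}^{N} \Bigl( \prod_{j=1}^{t-1} x_j \Bigr)\, y_t^{\,t-1} .
\]
```

The first factor is element t-1 of cuprod(x) and the second is element t of y##(0:N-1)`. For the example data the three terms are 0.12, 0.0512, and 0.048, so both s and s2 should come out to 0.2192.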

thepushkarsingh · Posted 01-29-2019 12:12 PM

Thanks, Rick. I have read so many of your papers (and still haven't learned enough). Thank you so much. I have one additional request, though. Once CUPROD reaches the end of a row, I want it to stop and start over on the next row instead of carrying over the product from the previous row. Any quick tips?

Rick_SAS · Posted 01-29-2019 12:38 PM

If you provide data and the expected results, I and other experts will think about it.

thepushkarsingh · Posted 01-29-2019 02:45 PM

Suppose a person can belong to any of the RACEs with probability PROB_RACE, and the marginal probability of dying in each of years 1 to 3 is given (the marginal probability of survival is 1-PROB_DYING):

RACE	PROB_RACE	PROB_DYING_YR1	PROB_DYING_YR2	PROB_DYING_YR3
A	0.26	0.1	0.2	0.3
B	0.35	0.16	0.25	0.45
C	0.23	0.23	0.34	0.17
D	0.16	0.18	0.17	0.14

So a person belonging to RACE A has a probability of surviving 3 years of SURV3_A = (1-0.1)*(1-0.2)*(1-0.3).

Similarly, SURV3_B = (1-0.16)*(1-0.25)*(1-0.45), and so on.

A random person then has a probability of surviving 3 years of PROB_A*SURV3_A + PROB_B*SURV3_B + PROB_C*SURV3_C + PROB_D*SURV3_D.

I would like to generalize this to calculate the probability that a random person survives 'T' years when there are 'N' possible RACEs.

I can think of taking one row at a time, using CUPROD, and then summing across columns, but that seems inefficient. Is there an easier way?

Rick_SAS · Posted 01-29-2019 03:17 PM

Aha! Now we see what you are trying to do! Much clearer and much simpler. I think the "trick" you are looking for is to use a subscript reduction operator for each row and across the columns:

proc iml;
p_Race = {0.26, 0.35, 0.23, 0.16};
p_Die = {
0.1	0.2	0.3,
0.16	0.25	0.45,
0.23	0.34	0.17,
0.18	0.17	0.14 };

p_surv = p_Race # (1 - p_Die)[, #];
print p_surv;
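
The formula for a random person given earlier in the thread (PROB_A*SURV3_A + PROB_B*SURV3_B + ...) is the sum of the elements of p_surv. A minimal sketch, assuming the same example data; the names surv3 and p_total are illustrative and do not appear elsewhere in the thread:

```sas
proc iml;
p_Race = {0.26, 0.35, 0.23, 0.16};
p_Die  = {0.10 0.20 0.30,
          0.16 0.25 0.45,
          0.23 0.34 0.17,
          0.18 0.17 0.14};

/* row-wise product of the yearly survival probabilities:
   one 3-year survival probability per race */
surv3 = (1 - p_Die)[, #];

/* weight each race by its probability and add up */
p_total = sum(p_Race # surv3);
print surv3 p_total;
quit;
```

Working the arithmetic by hand for these numbers gives a total of about 0.443.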

thepushkarsingh · Posted 01-30-2019 04:49 AM

Hi @Rick_SAS, sorry for being super excited yesterday and not verifying everything. The trick you suggested gives only the final column:

0.504
0.3465
0.421806
0.585316

Is there a way to get all the columns in the range, like:

0.9	0.72	0.504
0.84	0.63	0.3465
0.77	0.5082	0.421806
0.82	0.6806	0.585316

Thanks a lot in advance.

thepushkarsingh · Posted 01-29-2019 03:20 PM

Great thanks! Wish I could think like you! 🙂 Elegant!!

Rick_SAS · Posted 01-30-2019 05:35 AM

This same question is cross-posted.

Rick_SAS · Posted 01-30-2019 05:30 AM

Where are you getting those numbers? Those are not the results of the program I sent.

Anyway, to answer your question, it sounds like you want the matrix whose columns contain the cumulative products of 1 - p_Die.

proc iml;
p_Race = {0.26, 0.35, 0.23, 0.16};
p_Die = {
0.1	0.2	0.3,
0.16	0.25	0.45,
0.23	0.34	0.17,
0.18	0.17	0.14 };

/* form the cumulative product across the columns of each row */
p_cumul = 1 - p_Die;
do j = 2 to ncol(p_cumul);
   p_cumul[,j] = p_cumul[,j] # p_cumul[,j-1]; 
end;

M = p_Race # p_cumul;
print M;
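
If the goal is the survival probability of a random person after each year (the generalization to any 'T' asked about earlier in the thread), one option is to sum each column of the weighted matrix with the `+` subscript reduction operator. This is a hedged sketch that repeats the loop above on the same example data; the name p_year is made up for illustration:

```sas
proc iml;
p_Race = {0.26, 0.35, 0.23, 0.16};
p_Die  = {0.10 0.20 0.30,
          0.16 0.25 0.45,
          0.23 0.34 0.17,
          0.18 0.17 0.14};

/* cumulative survival through year j, computed within each row */
p_cumul = 1 - p_Die;
do j = 2 to ncol(p_cumul);
   p_cumul[,j] = p_cumul[,j] # p_cumul[,j-1];
end;

/* weight each row by the race probability, then sum down each column:
   element j of p_year is the probability that a random person
   survives through year j */
M = p_Race # p_cumul;
p_year = M[+, ];
print p_year;
quit;
```

The code does not depend on the number of races or years, so it should carry over to 'N' races and 'T' years as long as p_Race has one row per row of p_Die. For the example data, hand calculation gives approximately 0.8363, 0.6335, and 0.4430 for years 1 to 3.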

thepushkarsingh · Posted 01-30-2019 05:39 AM

Sorry, I only gave the calculations for the (1-X) part. The other parts were working properly, so I figured that a solution to this part would solve the whole problem. My apologies.
