Quartz | Level 8

Efficient Cumulative Sum over Cumulative Product

Suppose a person can belong to any of the RACEs with the probability of PROB_RACE. And marginal probability of dying in years 1 to 3 is given(marginal survival will be 1-PROB_DYING):

 RACE PROB_RACE PROB_DYING_YR1 PROB_DYING_YR2 PROB_DYING_YR3 A 0.26 0.1 0.2 0.3 B 0.35 0.16 0.25 0.45 C 0.23 0.23 0.34 0.17 D 0.16 0.18 0.17 0.14

So the person belonging to RACE A will have a probability of survival after 3 years, call it SURV3_A = (1-0.1)*(1-0.2)*(1-0.3).

Similarly, SURV3_B=(1-0.16)*(1-0.25)*(1-0.45) and so on...

And a random person will have a probability of survival after 3 years as : PROB_A*SURV3_A+PROB_B*SURV3_B+PROB_C*SURV3_C+PROB_D*SURV3_D.

I was thinking of a generalization to calculate probability of survival of any random person after 'T' years when there can be 'N' possible RACEs.

I can think of taking one row at a time,using CUPROD, then summing across columns which seem inefficient, so wondering if any easy way?

1 ACCEPTED SOLUTION

Accepted Solutions
SAS Super FREQ

Re: Efficient Cumulative Sum over Cumulative Product

Just want to add that it is considered bad etiquette to post the same question twice and solicit advice in two separate threads.

8 REPLIES 8
Super User

Re: Efficient Cumulative Sum over Cumulative Product

Implementing your base formula is trivial in a data step. Can you show what you expect as the final, final output, assuming this is the input? I want to ensure there's nothing I'm missing before taking a further look or providing any code.

@thepushkarsingh wrote:

Suppose a person can belong to any of the RACEs with the probability of PROB_RACE. And marginal probability of dying in years 1 to 3 is given(marginal survival will be 1-PROB_DYING):

 RACE PROB_RACE PROB_DYING_YR1 PROB_DYING_YR2 PROB_DYING_YR3 A 0.26 0.1 0.2 0.3 B 0.35 0.16 0.25 0.45 C 0.23 0.23 0.34 0.17 D 0.16 0.18 0.17 0.14

So the person belonging to RACE A will have a probability of survival after 3 years, call it SURV3_A = (1-0.1)*(1-0.2)*(1-0.3).

Similarly, SURV3_B=(1-0.16)*(1-0.25)*(1-0.45) and so on...

And a random person will have a probability of survival after 3 years as : PROB_A*SURV3_A+PROB_B*SURV3_B+PROB_C*SURV3_C+PROB_D*SURV3_D.

I was thinking of a generalization to calculate probability of survival of any random person after 'T' years when there can be 'N' possible RACEs.

I can think of taking one row at a time,using CUPROD, then summing across columns which seem inefficient, so wondering if any easy way?

Quartz | Level 8

Re: Efficient Cumulative Sum over Cumulative Product

I understand creating 3 more variables(cumulative product, so the 1st column will remain same, 2nd will be product till 2 years, 3rd will be for 3 years) will not be an issue and with another proc sql or retain statement sum of column and final sum will also be easy. But it'll take a lot of text and more than one data/proc steps(for me). For significantly large number of years, I was thinking of any more efficient way.
Super User

Re: Efficient Cumulative Sum over Cumulative Product

Still need the expected output to help you here. Or if you already have a solution and need it more efficient, perhaps post the code you have already.
Quartz | Level 8

Re: Efficient Cumulative Sum over Cumulative Product

Hi @Reeza, apologies for earlier responses. I have following matrix, X:

 0.1 0.2 0.3 0.16 0.25 0.45 0.23 0.34 0.17 0.18 0.17 0.14

(1-X) will be:

 0.9 0.8 0.7 0.84 0.75 0.55 0.77 0.66 0.83 0.82 0.83 0.86

And I need cumulative product at each row like:

 0.9 0.72 0.504 0.84 0.63 0.3465 0.77 0.5082 0.421806 0.82 0.6806 0.585316

Then I need column wise sum like:

 3.33 2.5388 1.85762

And in the end, a row sum :

 7.72642

I was superexcited with @Rick_SAS's solution which suggested use of (1-X)[,#], but it only gives me final column. Any suggestion?

SAS Super FREQ

Re: Efficient Cumulative Sum over Cumulative Product

Just want to add that it is considered bad etiquette to post the same question twice and solicit advice in two separate threads.

Quartz | Level 8

Re: Efficient Cumulative Sum over Cumulative Product

I am very sorry, while I was posting further information in your solution, I saw the section where it said 'If you need further help post a new question', so I marked the original post as solved and posted it again separately. I thought it would be disrespectful if I don't tell you about my inquiries, so I posted there too. I didn't mean to disrespect, which seemingly I did eventually.

Apologies to both @Rick_SAS and @Reeza.

SAS Super FREQ

Re: Efficient Cumulative Sum over Cumulative Product

I accept your apology, but let me clarify that the issue isn't that cross-posting is disrespectful. When there are two threads, it is hard for experts to work from the same information and to build a program that solves your problem. It will also be difficult for someone who has a similar question in the future to solve their problem by consulting these pages. There are ways to merge the two threads into one, but they have diverged so much that I'm not sure it would be helpful.  I have a hard time understanding these threads even though I was involved in writing one of them.

I would like to encourage you to revisit the "accepted solution" that you marked in the other thread. After you marked my initial response as correct, you provided extra information and I posted additional programs. I don't know whether the later solutions are better, but please mark the answer that best resolves your problem.

Quartz | Level 8

Re: Efficient Cumulative Sum over Cumulative Product

Thanks for understanding. I'll not repeat same mistake.

From The DO Loop
Discussion stats
• 8 replies
• 836 views
• 2 likes
• 3 in conversation