Solved: Re: How do I create a cumulative total column subset by another variab...

PQK · Posted 06-23-2012 11:42 PM

Dear Forum,

Although conceptually easy to grasp, I have yet been able to devise a working piece of code that will create a cumulative total column that resets at each break in another variable. For example, given the following sample data set.

ID AMT

10 150

10 100

25 150

30 600

30 300

The result I would like to achieve is a cumulative total column for each distinct ID value (as follows):

ID AMT SumbyID

10 150 150

10 100 250

25 150 150

25 150 300

25 150 450

30 600 600

30 300 900

Thus far I have attempted various sum statement and do loop combinations using the last.<variable> as a marker, however I would be interest to know if their is a better method using PROC SQL.

Thank you in advance for your help.

Sincerely,

PQK

Linlin · Posted 06-24-2012 12:33 AM

how about:

data have;

input ID AMT;

cards;

10 150

10 100

25 150

30 600

30 300

;

data want;

set have;

by id notsorted;

if first.id then sumbyid=0;

sumbyid+amt;

run;

proc print;run;

Obs ID AMT sumbyid

1 10 150 150

2 10 100 250

3 25 150 150

4 25 150 300

5 25 150 450

6 30 600 600

7 30 300 900

Linlin

View solution in original post

Linlin · Posted 06-24-2012 12:33 AM

how about:

data have;

input ID AMT;

cards;

10 150

10 100

25 150

30 600

30 300

;

data want;

set have;

by id notsorted;

if first.id then sumbyid=0;

sumbyid+amt;

run;

proc print;run;

Obs ID AMT sumbyid

1 10 150 150

2 10 100 250

3 25 150 150

4 25 150 300

5 25 150 450

6 30 600 600

7 30 300 900

Linlin

PQK · Posted 06-25-2012 07:34 PM

Linlin,

As I suspected, there was an easier way that I completely overlooked. Thank you for the tip!

I forgot to mention that I would need to create a cumulative total column subset by more than one variable. It slipped my mind at the time, but fortunately I resolved the problem by taking the two subsetting variables and concatenating them together to create a new temporary variable. After that, is was just a matter of using the same code example you suggested above.

Thanks again.

PQK

MikeZdeb · Posted 06-25-2012 10:14 PM

hi ... if I understand what you mean by a "cumulative total column subset by more than one variable" you can try this too ...

data x;

input a b c;

datalines;

1 1 10

1 2 20

1 3 30

2 1 100

2 2 50

2 2 60

;

run;

data y;

set x;

by a b;

cumab + (-first.b * cumab) + c;

run;

a b c cumab

1 1 10 10

1 1 10 20

1 1 10 30

1 2 20 20

1 3 30 30

1 3 30 60

2 1 100 100

2 1 100 200

2 2 50 50

2 2 60 110

jimbobob · Posted 09-05-2014 05:57 PM

Mike awesome code can you explain how this is interpreted by SAS:

cumab + (-first.b * cumab) + c;

MikeZdeb · Posted 09-08-2014 04:56 PM

hi ... I added the value of FIRST.B to the output to help the explanation ...

Obs a b c cumab first_b

1 1 1 10 10 1

2 1 1 10 20 0

3 1 1 10 30 0

4 1 2 20 20 1

5 1 3 30 30 1

6 1 3 30 60 0

7 2 1 100 100 1

8 2 1 100 200 0

9 2 2 50 50 1

10 2 2 60 110 0

values of the variable C are accumulated within values of the two variables A and B

when then first observation is read, the value of CUMAB is zero since that variable is an accumulator variable in a SUM statement ... so for the first observation, in this statement

cumab + (-first.b * cumab) + c

the value within the parentheses is zero and it just adds the value of variable C to CUMAB ... the value of FIRST.B is zero for all the remaining calculations within a group, so the value of the term within parentheses is once again zero ... you are just adding C to CUMAB

when you arrive at the first observation for the next group, the value of FIRST.B is 1 ... you add C to CUMAB, but you also subtract the LAST value of CUMAB from the total, so you are starting at zero again for the next group

that make sense ?

jimbobob · Posted 09-08-2014 05:23 PM

Thanks Mike for explanation

sin_golbarg · Posted 09-28-2014 04:03 PM

Hi linlin I think without this syntex it's working, so we don't need this syntax.

if first.id then sumbyid=0;

retheesh_analytic_gmail_com · Posted 11-06-2014 05:23 AM

Hi,

I have a similar scenario

data x;

input a b c;

datalines;

1 1 10

2 1 10

3 1 10

4 2 20

5 3 30

6 3 30

7 1 100

8 1 100

9 2 50

;

run;

I want the result as follows

a b c cumab

1 1 10 10

2 1 10 20

3 1 10 30

4 2 20 20

5 3 30 30

6 3 30 60

7 1 100 100

8 1 20 120

9 2 50 50

The column "a" keeps the same order and I want the sum of the column "c" based on the values in the column "b".

Could you help my to calculate the column"cumab" as in the above table?

Thanks and Regards,

Retheesh Ravi.

Ksharp · Posted 11-06-2014 07:32 AM


data x;
input a b c;
datalines;
1 1 10
2 1 10
3 1 10
4 2 20
5 3 30
6 3 30
7 1 100
8 1 100
9 2 50
;
run;
data want;
 set x;
 by b notsorted;
 if first.b then cumab=0;
 cumab+c;
run;

Xia Keshan

MikeZdeb · Posted 11-10-2014 12:50 PM

hi ... another way (same result) ...

data y;

set x;

by b notsorted;

cumab + (-first.b * cumab) + c;

run;

vijaylaxmi · Posted 04-16-2016 10:25 PM

I am very new to SAS. Base SAS we can do with datastep. But how can we do it with Proc SQL??

BLarson · Posted 09-16-2015 02:11 PM

Thank you for the examples. I am a fairly new SAS user and this was very timely. Is it possible to take this one step further, where the cumulative total is a running total of the previous 5 observations, all under the same subsets? Thank you.

Lamia · Posted 04-06-2017 08:34 AM

Good Day every one,

I am new with SAS and i am having a problem in calculating the cumulative sum of values, my table looks like this one

I used this code to calculate the cumulative sum in column C, by respect to id

data Ratings1;
Set Ratings;
retain c;
if first.id then c=0;
by id notsorted;
c+j1;
run;

the problem i have is that it gives me as result the same values as in column j1 as if the addition is not working.

Can you help me please , where is the problem in this code

thank you in advance

vijaylaxmi · Posted 04-06-2017 10:32 AM

Hi Lamia,

This is quite interestting. To me it looks like as C variable is in orignal data set, so code that you have written is not working. Everytime row is getting read of rating dataset C is getting assigned 0. Here is how we can get around it.

data ratings1;
set ratings(drop=c);
retain c;
if first.id then c=0;
put first.id last.id;
by id notsorted;
c+j1;
run;

How do I create a cumulative total column subset by another variable?

Re: How do I create a cumulative total column subset by another variable?

Re: How do I create a cumulative total column subset by another variable?

Re: How do I create a cumulative total column subset by another variable?

Re: How do I create a cumulative total column subset by another variable?

Re: How do I create a cumulative total column subset by another variable?

Re: How do I create a cumulative total column subset by another variable?

Re: How do I create a cumulative total column subset by another variable?

Re: How do I create a cumulative total column subset by another variable?

Re: How do I create a cumulative total column subset by another variable?

Re: How do I create a cumulative total column subset by another variable?

Re: How do I create a cumulative total column subset by another variable?

Re: How do I create a cumulative total column subset by another variable?

Re: How do I create a cumulative total column subset by another variable?

Re: How do I create a cumulative total column subset by another variable?

Re: How do I create a cumulative total column subset by another variable?

The 2025 SAS Hackathon has begun!

SAS Training: Just a Click Away