Dear Forum,
Although conceptually easy to grasp, I have yet been able to devise a working piece of code that will create a cumulative total column that resets at each break in another variable. For example, given the following sample data set.
ID AMT
10 150
10 100
25 150
25 150
25 150
30 600
30 300
The result I would like to achieve is a cumulative total column for each distinct ID value (as follows):
ID AMT SumbyID
10 150 150
10 100 250
25 150 150
25 150 300
25 150 450
30 600 600
30 300 900
Thus far I have attempted various sum statement and do loop combinations using the last.<variable> as a marker, however I would be interest to know if their is a better method using PROC SQL.
Thank you in advance for your help.
Sincerely,
PQK
how about:
data have;
input ID AMT;
cards;
10 150
10 100
25 150
25 150
25 150
30 600
30 300
;
data want;
set have;
by id notsorted;
if first.id then sumbyid=0;
sumbyid+amt;
run;
proc print;run;
Obs ID AMT sumbyid
1 10 150 150
2 10 100 250
3 25 150 150
4 25 150 300
5 25 150 450
6 30 600 600
7 30 300 900
Linlin
how about:
data have;
input ID AMT;
cards;
10 150
10 100
25 150
25 150
25 150
30 600
30 300
;
data want;
set have;
by id notsorted;
if first.id then sumbyid=0;
sumbyid+amt;
run;
proc print;run;
Obs ID AMT sumbyid
1 10 150 150
2 10 100 250
3 25 150 150
4 25 150 300
5 25 150 450
6 30 600 600
7 30 300 900
Linlin
Linlin,
As I suspected, there was an easier way that I completely overlooked. Thank you for the tip!
I forgot to mention that I would need to create a cumulative total column subset by more than one variable. It slipped my mind at the time, but fortunately I resolved the problem by taking the two subsetting variables and concatenating them together to create a new temporary variable. After that, is was just a matter of using the same code example you suggested above.
Thanks again.
PQK
hi ... if I understand what you mean by a "cumulative total column subset by more than one variable" you can try this too ...
data x;
input a b c;
datalines;
1 1 10
1 1 10
1 1 10
1 2 20
1 3 30
1 3 30
2 1 100
2 1 100
2 2 50
2 2 60
;
run;
data y;
set x;
by a b;
cumab + (-first.b * cumab) + c;
run;
a b c cumab
1 1 10 10
1 1 10 20
1 1 10 30
1 2 20 20
1 3 30 30
1 3 30 60
2 1 100 100
2 1 100 200
2 2 50 50
2 2 60 110
Mike awesome code can you explain how this is interpreted by SAS:
cumab + (-first.b * cumab) + c;
hi ... I added the value of FIRST.B to the output to help the explanation ...
Obs a b c cumab first_b
1 1 1 10 10 1
2 1 1 10 20 0
3 1 1 10 30 0
4 1 2 20 20 1
5 1 3 30 30 1
6 1 3 30 60 0
7 2 1 100 100 1
8 2 1 100 200 0
9 2 2 50 50 1
10 2 2 60 110 0
values of the variable C are accumulated within values of the two variables A and B
when then first observation is read, the value of CUMAB is zero since that variable is an accumulator variable in a SUM statement ... so for the first observation, in this statement
cumab + (-first.b * cumab) + c
the value within the parentheses is zero and it just adds the value of variable C to CUMAB ... the value of FIRST.B is zero for all the remaining calculations within a group, so the value of the term within parentheses is once again zero ... you are just adding C to CUMAB
when you arrive at the first observation for the next group, the value of FIRST.B is 1 ... you add C to CUMAB, but you also subtract the LAST value of CUMAB from the total, so you are starting at zero again for the next group
that make sense ?
Thanks Mike for explanation
Hi linlin I think without this syntex it's working, so we don't need this syntax.
if first.id then sumbyid=0;
Hi,
I have a similar scenario
data x;
input a b c;
datalines;
1 1 10
2 1 10
3 1 10
4 2 20
5 3 30
6 3 30
7 1 100
8 1 100
9 2 50
;
run;
I want the result as follows
a b c cumab
1 1 10 10
2 1 10 20
3 1 10 30
4 2 20 20
5 3 30 30
6 3 30 60
7 1 100 100
8 1 20 120
9 2 50 50
The column "a" keeps the same order and I want the sum of the column "c" based on the values in the column "b".
Could you help my to calculate the column"cumab" as in the above table?
Thanks and Regards,
Retheesh Ravi.
data x; input a b c; datalines; 1 1 10 2 1 10 3 1 10 4 2 20 5 3 30 6 3 30 7 1 100 8 1 100 9 2 50 ; run; data want; set x; by b notsorted; if first.b then cumab=0; cumab+c; run;
Xia Keshan
hi ... another way (same result) ...
data y;
set x;
by b notsorted;
cumab + (-first.b * cumab) + c;
run;
I am very new to SAS. Base SAS we can do with datastep. But how can we do it with Proc SQL??
Thank you for the examples. I am a fairly new SAS user and this was very timely. Is it possible to take this one step further, where the cumulative total is a running total of the previous 5 observations, all under the same subsets? Thank you.
Good Day every one,
I am new with SAS and i am having a problem in calculating the cumulative sum of values, my table looks like this one
I used this code to calculate the cumulative sum in column C, by respect to id
data Ratings1; Set Ratings; retain c; if first.id then c=0; by id notsorted; c+j1; run;
the problem i have is that it gives me as result the same values as in column j1 as if the addition is not working.
Can you help me please , where is the problem in this code
thank you in advance
Hi Lamia,
This is quite interestting. To me it looks like as C variable is in orignal data set, so code that you have written is not working. Everytime row is getting read of rating dataset C is getting assigned 0. Here is how we can get around it.
data ratings1;
set ratings(drop=c);
retain c;
if first.id then c=0;
put first.id last.id;
by id notsorted;
c+j1;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.