Hi all,
Please help me with the following :
Employee | Salary |
emp1 | 100000 |
emp3 | 700000 |
emp4 | 400000 |
How to calculate cumulative sum of given dataset values using SQL.
Thanks!
data have;
input Employee $ Salary;
cards;
emp1
100000
emp3
700000
emp4
400000
;
data _have;
set have curobs=k;
rn=k;
run;
proc sql;
create table want(drop=rn) as
select a.*,sum(b.salary) as cumulative_sum
from _have a inner join _have b
on b.rn<=a.rn
group by a.rn,a.employee,a.salary;
quit;
Employee | Salary | cumulative_sum |
---|---|---|
emp1 | 100000 | 100000 |
emp3 | 700000 | 800000 |
emp4 | 400000 | 1200000 |
Why?
Just use a data step.
data want;
set have;
cum_salary+salary;
run;
Why do you want to do this in SQL?
data have;
input Employee $ Salary;
cards;
emp1
100000
emp3
700000
emp4
400000
;
data _have;
set have curobs=k;
rn=k;
run;
proc sql;
create table want(drop=rn) as
select a.*,sum(b.salary) as cumulative_sum
from _have a inner join _have b
on b.rn<=a.rn
group by a.rn,a.employee,a.salary;
quit;
Employee | Salary | cumulative_sum |
---|---|---|
emp1 | 100000 | 100000 |
emp3 | 700000 | 800000 |
emp4 | 400000 | 1200000 |
With presented setup the `_have` is not necessary:
proc sql;
create table want as
select a.*,sum(b.salary) as cumulative_sum
from have a inner join have b
on b.employee<=a.employee
group by a.employee,a.salary;
quit;
All the best
Bart
I agree. However my thought was "what if" the order of employeeid's are not in sequence or in some random order for the group by to safely do the internal sort. Therefore I wanted that extra safety net. 🙂
If you want to do advanced SQL processing, you should buy and read Joe Celko's book SQL For Smarties, available in hardcopy and on Kindle. Your task is advanced because it requires imposing an order on table rows.
Here's an example from his book of how you would calculate a median:
* From SQL For Smarties, Second Edition, by Joe Celko. ; * If there is no statistical median, the mean of the ; * value just below and the value just above is used. ; data parts; input @1 pno $2. @4 pname $5. @10 color $5. @16 weight 2. @19 city $6.; cards; p1 Nut Red 12 London p2 Bolt Green 17 Paris p3 Cam Blue 12 Paris p4 Screw Red 14 London p5 Cam Blue 12 Paris p6 Cog Red 19 London ;;;; *****; run; proc sql; select avg(distinct weight) from (select f1.weight from parts as f1, parts as f2 group by f1.pno, f1.weight having sum(case when f2.weight = f1.weight then 1 else 0 end) >= abs(sum(case when f2.weight < f1.weight then 1 when f2.weight > f1.weight then -1 else 0 end))); *****; quit;
The answer returned is 13.
Tasks requiring sequencing are often less efficient in a descriptive language like SQL than they are in a procedure language like the data step, and frequently harder to understand. (Median is a function in SAS SQL now, but I'm not certain that it always has been.)
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.