- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi all,
Please help me with the following :
Employee | Salary |
emp1 | 100000 |
emp3 | 700000 |
emp4 | 400000 |
How to calculate cumulative sum of given dataset values using SQL.
Thanks!
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data have;
input Employee $ Salary;
cards;
emp1
100000
emp3
700000
emp4
400000
;
data _have;
set have curobs=k;
rn=k;
run;
proc sql;
create table want(drop=rn) as
select a.*,sum(b.salary) as cumulative_sum
from _have a inner join _have b
on b.rn<=a.rn
group by a.rn,a.employee,a.salary;
quit;
Employee | Salary | cumulative_sum |
---|---|---|
emp1 | 100000 | 100000 |
emp3 | 700000 | 800000 |
emp4 | 400000 | 1200000 |
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Why?
Just use a data step.
data want;
set have;
cum_salary+salary;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Why do you want to do this in SQL?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data have;
input Employee $ Salary;
cards;
emp1
100000
emp3
700000
emp4
400000
;
data _have;
set have curobs=k;
rn=k;
run;
proc sql;
create table want(drop=rn) as
select a.*,sum(b.salary) as cumulative_sum
from _have a inner join _have b
on b.rn<=a.rn
group by a.rn,a.employee,a.salary;
quit;
Employee | Salary | cumulative_sum |
---|---|---|
emp1 | 100000 | 100000 |
emp3 | 700000 | 800000 |
emp4 | 400000 | 1200000 |
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
With presented setup the `_have` is not necessary:
proc sql;
create table want as
select a.*,sum(b.salary) as cumulative_sum
from have a inner join have b
on b.employee<=a.employee
group by a.employee,a.salary;
quit;
All the best
Bart
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug
"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings
SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I agree. However my thought was "what if" the order of employeeid's are not in sequence or in some random order for the group by to safely do the internal sort. Therefore I wanted that extra safety net. 🙂
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If you want to do advanced SQL processing, you should buy and read Joe Celko's book SQL For Smarties, available in hardcopy and on Kindle. Your task is advanced because it requires imposing an order on table rows.
Here's an example from his book of how you would calculate a median:
* From SQL For Smarties, Second Edition, by Joe Celko. ; * If there is no statistical median, the mean of the ; * value just below and the value just above is used. ; data parts; input @1 pno $2. @4 pname $5. @10 color $5. @16 weight 2. @19 city $6.; cards; p1 Nut Red 12 London p2 Bolt Green 17 Paris p3 Cam Blue 12 Paris p4 Screw Red 14 London p5 Cam Blue 12 Paris p6 Cog Red 19 London ;;;; *****; run; proc sql; select avg(distinct weight) from (select f1.weight from parts as f1, parts as f2 group by f1.pno, f1.weight having sum(case when f2.weight = f1.weight then 1 else 0 end) >= abs(sum(case when f2.weight < f1.weight then 1 when f2.weight > f1.weight then -1 else 0 end))); *****; quit;
The answer returned is 13.
Tasks requiring sequencing are often less efficient in a descriptive language like SQL than they are in a procedure language like the data step, and frequently harder to understand. (Median is a function in SAS SQL now, but I'm not certain that it always has been.)