BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
kashun
Obsidian | Level 7

I am trying to find a way to create cumulative sum without changing positions of observations.

Have.

Observation

Name

Amount

1

John

10

2

Mark

20

3

Mark

10

4

John

40

5

John

30

6

Mark

20

7

John

10

 

Want

Observation

Name

Amount

Cummulative Sum

1

John

10

10

2

Mark

20

20

3

Mark

10

30

4

John

40

50

5

John

30

80

6

Mark

20

50

7

John

10

90

 

I tried by Name notsorted and did not work since the notsorted option group name into 5 groups.

Will be very grateful if someone can assist me on how to go about this or refer me to a link that can help.

 

Thanks. 

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hi @kashun,

 

Here's another hash object solution:

data want;
if _n_=1 then do;
  dcl hash h(suminc:'amount');
  h.definekey('name');
  h.definedone();
end;
set have;
h.ref();
h.sum(sum:Cumulative_Sum);
run;

View solution in original post

12 REPLIES 12
PaigeMiller
Diamond | Level 26

Sort by name, compute cumulative sums, un-sort back to the original order.

--
Paige Miller
novinosrin
Tourmaline | Level 20

Hi @kashun  FWIW-

 



data have;
input Observation	Name $	Amount;
cards;
1	John	10
2	Mark	20
3	Mark	10
4	John	40
5	John	30
6	Mark	20
7	John	10
;

data want;
 set have;
 if _n_=1 then do;
   dcl hash H () ;
   h.definekey  ("name") ;
   h.definedata ("Cummulative_Sum") ;
   h.definedone () ;
 end;
 if h.find() ne 0 then Cummulative_Sum=amount;
 else Cummulative_Sum=sum(Cummulative_Sum,amount);
 h.replace();
run;
Reeza
Super User
How big will that name list get? There are ways but they're all more work than just sorting/unsorting.
kashun
Obsidian | Level 7

I am working with about 12+ GB of data with different names. Distinct count of names is more than 100 million.

FreelanceReinh
Jade | Level 19

Hi @kashun,

 

Here's another hash object solution:

data want;
if _n_=1 then do;
  dcl hash h(suminc:'amount');
  h.definekey('name');
  h.definedone();
end;
set have;
h.ref();
h.sum(sum:Cumulative_Sum);
run;
novinosrin
Tourmaline | Level 20

Sir @FreelanceReinh That's what separated a genius from mere mortal. Brilliant thinking!!! Kudos!

FreelanceReinh
Jade | Level 19

@novinosrin: Thanks. 🙂 I just thought this might be one of the "very limited and special cases" (The book, p. 219) where the SUM method proves useful.

kashun
Obsidian | Level 7

@ FreelanceReinhard. This is awesome. I might be wrong but I know hash objects stores a temporary data into memory.  Could there be another approach without using hash objects? 

FreelanceReinh
Jade | Level 19

@kashun wrote:

@ FreelanceReinhard. This is awesome. I might be wrong but I know hash objects stores a temporary data into memory.  Could there be another approach without using hash objects? 


It's true that the hash object would occupy a considerable amount of memory. Another common approach is to create an index on dataset HAVE, but the computation of Cumulative_Sum BY Name (using the index) would result in a sorted dataset.

 

I would try to implement the hash object approach, first on a smaller (but not too small) subset of HAVE. Then you could estimate the amount of memory needed for the full dataset. Depending on the length of variable Name it might be possible to reduce the memory footprint. (Note that in my code Name is also used as a data item, but this could be changed.) If it still exceeds the available RAM, maybe there's a possibility to split dataset HAVE or to take advantage of known characteristics of the dataset structure. For example, if a name is known to occur only up to a certain observation, its hash entry could be removed once that observation is reached.

CurtisMackWSIPP
Lapis Lazuli | Level 10

This would do it, but I suspect your case is more complicated some how.

 

data want;
  set have;
  retain Cummulative_Sum;
  label Cummulative_Sum = "Cummulative Sum";
  Cummulative_Sum = sum(Cummulative_Sum,Amount);
run;
kashun
Obsidian | Level 7

Yes. It looks like this is not taking into account name

RichardDeVen
Barite | Level 11

You can use several step operations to compute the cumulative sum and restore the original order

 

Example:

data have_v / view=have_v;
  set have;
  rownum = _n_;
run;

proc sort data=have_v out=have_ord;
  by name rownum;
run;

data want_v / view=want_v;
  set have_ord;
  by name;
  if first.name then cusum = 0;
  cusum + amount;
run;

proc sort data=want_v out=want(drop=rownum);
  by rownum;  
run;

proc sql;
drop table have, have_ord;
drop view have_v, want_v;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 3638 views
  • 17 likes
  • 7 in conversation