Help using Base SAS procedures

How to create cumulative sums

Reply
N/A
Posts: 1

How to create cumulative sums

How are you able to sum values cumulatively, so that all previous values are added into the most recent value?

For example, I have the following values:

Storm     Value

XXX        1

AAA        2

CCC        3

BBB         4

However, I want the 'value' section to be added cumulatively to get this instead:

Storm     Value

XXX        1

AAA        3

CCC        6

BBB        10

Super Contributor
Posts: 543

Re: How to create cumulative sums

Hi.

How about this:

data in;

  input Storm $     Value;

  cards;

VVV        1

AAA        2

CCC        3

BBB         4

;

data want;

set in;

if value ne ' ';

sum+value;

run;

Smiley Happy

Frequent Contributor
Posts: 97

Re: How to create cumulative sums

HI,

First you should sort by  "Value".  Then in a data step do something like:

retain cum_value;

if first.id then cum_value=0;

cum_value=cum_value+value;

*not tested

Allu

Super User
Posts: 5,513

Re: How to create cumulative sums

You should be able to do this by adding one statement:

data want;

set have;

cum_value + value;

run;

Note that VALUE will not be affected, but CUM_VALUE will be added containing the values you are looking for.

Good luck.

Super Contributor
Posts: 339

Re: How to create cumulative sums

Alternatively, you can take a look at lag<n>(var) functions. It allows you to access the nth previous row value of var effectively allowing you to run rolling totals like last 7 days or last month total etc. but also to achieve your desired result without the need of an additionnal column. Syntax is slightly less natural though since you need to do a case for _N_ LE n

e.g.

data want;

     set have;

     value=value+coalesce(lag1(value),0)+coalesce(lag1(value),0);

run;

This effectively uses your previously rolling total in variable Value (as replaced at each data step iteration) and adds it to the current row "Value" and then replaces it.

Note that the do; end; is not necessary as this is a single line statement but I put it for readability

Also note that lag1(value) is equivalent to lag(value) but again, its best for readability.

Vincent

*edited after Astounding's comment. Doing clever use of lag1 pile effect to achieve the desired result. I strongly recommend against it though honestly as it's easy to get lost in piles when there are easier ways around like creating a new variable and dropping the old one.

Super User
Posts: 5,513

Re: How to create cumulative sums

Posted in reply to Vince28_Statcan

Vince,

Unfortunately, the LAG function is much more complex than that.  It does not retrieve the VALUE from the previous observation.  Rather, it retrieves VALUE from the last time that the LAG function executed.  This will lead to complications, such as:

  • The first time LAG executes (on observation #2), LAG retrieves a missing value.
  • Subsequent to the second observation, LAG retrieves VALUE as it came in from the SET statement on the previous observation, before it has been incremented.

If you need to reuse the variable name VALUE, it would be simpler to go through a combination of:

drop value;

rename cum_value=value;

Beware of LAG!  It can be the right tool for some jobs, but it is extremely tricky.

Super Contributor
Posts: 339

Re: How to create cumulative sums

Posted in reply to Astounding

I stand corrected. I've edited my above example although I recommend not using it. It is a not so trivial way around "pile" effect of lag function. It works though!

Thanks

Vince

Here's the approach I would probably use similar to others above although saving the conditional if statement in favor of the coalesce function.

data want;

     set have;

     total=value+coalesce(total,0);

     retain total;

run;

Ask a Question
Discussion stats
  • 6 replies
  • 290 views
  • 1 like
  • 5 in conversation