How are you able to sum values cumulatively, so that all previous values are added into the most recent value?
For example, I have the following values:
Storm Value
XXX 1
AAA 2
CCC 3
BBB 4
However, I want the 'value' section to be added cumulatively to get this instead:
Storm Value
XXX 1
AAA 3
CCC 6
BBB 10
Hi.
How about this:
data in;
input Storm $ Value;
cards;
VVV 1
AAA 2
CCC 3
BBB 4
;
data want;
set in;
if value ne ' ';
sum+value;
run;
HI,
First you should sort by "Value". Then in a data step do something like:
retain cum_value;
if first.id then cum_value=0;
cum_value=cum_value+value;
*not tested
Allu
You should be able to do this by adding one statement:
data want;
set have;
cum_value + value;
run;
Note that VALUE will not be affected, but CUM_VALUE will be added containing the values you are looking for.
Good luck.
Alternatively, you can take a look at lag<n>(var) functions. It allows you to access the nth previous row value of var effectively allowing you to run rolling totals like last 7 days or last month total etc. but also to achieve your desired result without the need of an additionnal column. Syntax is slightly less natural though since you need to do a case for _N_ LE n
e.g.
data want;
set have;
value=value+coalesce(lag1(value),0)+coalesce(lag1(value),0);
run;
This effectively uses your previously rolling total in variable Value (as replaced at each data step iteration) and adds it to the current row "Value" and then replaces it.
Note that the do; end; is not necessary as this is a single line statement but I put it for readability
Also note that lag1(value) is equivalent to lag(value) but again, its best for readability.
Vincent
*edited after Astounding's comment. Doing clever use of lag1 pile effect to achieve the desired result. I strongly recommend against it though honestly as it's easy to get lost in piles when there are easier ways around like creating a new variable and dropping the old one.
Vince,
Unfortunately, the LAG function is much more complex than that. It does not retrieve the VALUE from the previous observation. Rather, it retrieves VALUE from the last time that the LAG function executed. This will lead to complications, such as:
If you need to reuse the variable name VALUE, it would be simpler to go through a combination of:
drop value;
rename cum_value=value;
Beware of LAG! It can be the right tool for some jobs, but it is extremely tricky.
I stand corrected. I've edited my above example although I recommend not using it. It is a not so trivial way around "pile" effect of lag function. It works though!
Thanks
Vince
Here's the approach I would probably use similar to others above although saving the conditional if statement in favor of the coalesce function.
data want;
set have;
total=value+coalesce(total,0);
retain total;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.