Hi. In my EG project, I have Dates, Values, and running CUM values.
I need another column for the cumulative value for the past 3 months.
I thought of doing it by either adding up the last 3 months values or subtracting this month's CUM value
from the CUM value 4 months ago or ??
What is the most efficient way to do this? Thanks!
MonthDate Value Overall CUM Value WANT=> Running 3 month CUM
06/2018 3 3 3
07/2018 5 8 8
08/2018 7 15 15
09/2018 4 19 16
10/2018 5 24 16
11/2018 2 26 11
data have;
input MonthDate  $          Value       Overall_CUM_Value    ;
cards;
06/2018                 3                   3                                               
07/2018                 5                   8                                               
08/2018                 7                   15                                           
09/2018                 4                   19                                           
10/2018                 5                   24                                           
11/2018                 2                    26  
;
data want;
set have;
lag1=lag1(value);
lag2=lag2(value);
want=sum(value,of lag:);
drop lag:;
run;
I would do GUI but I expect I have to run a separate program. (That's how I got the running CUM).
You mean this:
For Base SAS Software ...
Custom version information: 9.4_M4
Image version information: 9.04.01M4P110916
For SAS/CONNECT ...
Custom version information: 9.4_M4
For SAS Integration Technologies ...
Custom version information: 9.4_M4
For SAS/Secure 168-bit ...
Custom version information: 9.41_M1
For SAS/ACCESS Interface to Oracle ...
Custom version information: 9.41
For SAS/ACCESS Interface to R/3 ...
Custom version information: 4.4_M4
For SAS/ACCESS Interface to Teradata ...
Custom version information: 9.44
For SAS/ACCESS Interface to Microsoft SQL Server ...
Custom version information: 9.42
You don't have SAS ETS so you're best of doing a SQL Join where the condition is that the date is between the date and date-3 so you get the two previous values. You can do that via the GUI.
A data step approach is as follows. I assume you'll need to reset for ID. You may need to change the boundaries (<= to < or such) to get exactly what you want but this should get you started.
data want;
set have;
by id;
if first.id then count=0;
count+1;
_value1=lag(value);
_value2 = lag2(value);
_value3 = lag3(value);
if count <3 then call missing(_value1, _value2, _value3);
else cum_sum = sum(_value1, _value2, _value3);
run;
Thanks. I will unpack this tomorrow!
Your GUI advice is to Join two identical tables but offset one set of Dates by 3 months to get Value and Value-3 months
horizontal to each other? (clever approach to keep it simple).
LAG looks to be the function I was looking for.
Would ID be the MonthDate here?
So for 6 months, I would use 6 LAG statements...
On a quick look: Is the CALL MISSING statement used to fill in missing data and keep the calculation from blowing up?
From your example, it seems you want to keep the cumulative sums for the first couple of months as well, but there isn't an ID variable. Here's a matching approach:
data want;
set have;
running_3mo_cum = sum(running_3mo_cum, value, -lag3(value) ) ;
retain running_3mo_cum;
run;
data have;
input MonthDate  $          Value       Overall_CUM_Value    ;
cards;
06/2018                 3                   3                                               
07/2018                 5                   8                                               
08/2018                 7                   15                                           
09/2018                 4                   19                                           
10/2018                 5                   24                                           
11/2018                 2                    26  
;
data want;
set have;
lag1=lag1(value);
lag2=lag2(value);
want=sum(value,of lag:);
drop lag:;
run;
data have;
input MonthDate  $          Value       Overall_CUM_Value    ;
cards;
06/2018                 3                   3                                               
07/2018                 5                   8                                               
08/2018                 7                   15                                           
09/2018                 4                   19                                           
10/2018                 5                   24                                           
11/2018                 2                    26  
;
data want;
do _n_=1 by 1 until(lr);
set have end=lr;
array t(9999) _temporary_;
t(_n_)=value;
if _n_>3 then call missing(t(whichn(coalesce(of t(*)),of t(*))));
want=sum(of t(*));
output;
end;
run;
And if there are multiple Id's
/*If there are multiple ID's*/
data want;
array t(9999) _temporary_;
call missing(of t(*));
do _n_=1 by 1 until(last.id);
set have ;
by id;
t(_n_)=value;
if _n_>3 then call missing(t(whichn(coalesce(of t(*)),of t(*))));
want=sum(of t(*));
output;
end;
run;
Three successful solutions. Thanks! You folks save me so much time.
I added a tweak to Novinosrin's to get the three
previous months sum (to ignore an incomplete current month). (I put it here for the next
person who needs it 🙂 ).
....lag3=lag3(value)
cum_sum=sum(value,of lag:) - value;
cum_sum
06/2018    3    3    0
07/2018    5    8    3
08/2018    7  15    8
09/2018   4   19   15
10/2018   5   24   16
11/2018   2   26   16
12/2018   5   31   11
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Check out this tutorial series to learn how to build your own steps in SAS Studio.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
