I have a dataset with a list of facilities and their number of cases by day. I know how calculate cumulative sums over the entire time frame (day 1 to present). I have been asked to calculate a moving 2-week cumulative sum for the last 3 2-week periods. Let's say today is April 28. I would need the cumulative sum between April 15-28 (CumSum1), April 8-21 (CumSum2), and April 1-14 (CumSum3). I have reviewed examples of PROC EXPAND, but I am having a hard time following the syntax. Any help would be greatly appreciated.
Have:
FacName | DtCreated | CntCase |
FACILITY1 | 04012020 | 0 |
FACILITY1 | 04022020 | 0 |
FACILITY1 | 04032020 | 0 |
FACILITY1 | 04042020 | 0 |
FACILITY1 | 04052020 | 0 |
FACILITY1 | 04062020 | 0 |
FACILITY1 | 04072020 | 0 |
FACILITY1 | 04082020 | 0 |
FACILITY1 | 04092020 | 0 |
FACILITY1 | 04102020 | 0 |
FACILITY1 | 04112020 | 0 |
FACILITY1 | 04122020 | 0 |
FACILITY1 | 04132020 | 0 |
FACILITY1 | 04142020 | 0 |
FACILITY1 | 04152020 | 0 |
FACILITY1 | 04162020 | 0 |
FACILITY1 | 04172020 | 0 |
FACILITY1 | 04182020 | 0 |
FACILITY1 | 04192020 | 0 |
FACILITY1 | 04202020 | 0 |
FACILITY1 | 04212020 | 0 |
FACILITY1 | 04222020 | 0 |
FACILITY1 | 04232020 | 0 |
FACILITY1 | 04242020 | 1 |
FACILITY1 | 04252020 | 0 |
FACILITY1 | 04262020 | 0 |
FACILITY1 | 04272020 | 0 |
FACILITY1 | 04282020 | 0 |
FACILITY2 | 04012020 | 1 |
FACILITY2 | 04022020 | 0 |
FACILITY2 | 04032020 | 1 |
FACILITY2 | 04042020 | 1 |
FACILITY2 | 04052020 | 1 |
FACILITY2 | 04062020 | 1 |
FACILITY2 | 04072020 | 8 |
FACILITY2 | 04082020 | 31 |
FACILITY2 | 04092020 | 5 |
FACILITY2 | 04102020 | 1 |
FACILITY2 | 04112020 | 0 |
FACILITY2 | 04122020 | 6 |
FACILITY2 | 04132020 | 0 |
FACILITY2 | 04142020 | 3 |
FACILITY2 | 04152020 | 2 |
FACILITY2 | 04162020 | 4 |
FACILITY2 | 04172020 | 0 |
FACILITY2 | 04182020 | 3 |
FACILITY2 | 04192020 | 0 |
FACILITY2 | 04202020 | 3 |
FACILITY2 | 04212020 | 3 |
FACILITY2 | 04222020 | 2 |
FACILITY2 | 04232020 | 1 |
FACILITY2 | 04242020 | 1 |
FACILITY2 | 04252020 | 1 |
FACILITY2 | 04262020 | 0 |
FACILITY2 | 04272020 | 1 |
FACILITY2 | 04282020 | 0 |
Want:
Facility | CumSum1 | CumSum2 | CumSum3 |
FACILITY1 | 1 | 0 | 0 |
FACILITY2 | 21 | 61 | 0 |
If the ending date is fixed across facilities, you could do:
*%let td=%sysfunc(date());
/* Or */
%let td=%sysevalf('24apr2020'd);
proc expand data=have out=temp (drop=cntcase where=(dtback in (0,7,14))) ;
by facname;
id dtcreated;
convert cntcase=sum14 / transformin=(movsum 14);
convert dtcreated=dtback / transformout=(neg + &td);
run;
proc transpose data=temp out=want(drop=_name_) prefix=x_;
by facname;
var sum14;
id dtback;
format dtback best.;
run;
Why not create formats? Since the periods overlap you must remember to use the multilabel option.
PROC EXPAND can easily generate rolling two week sums. The "trick" is to keep only the most recent two week sum. If all your facilities have the same date range (i.e. all end on '28apr2020'd), and they are all recorded daily, then this would work:
proc expand data=have out=want (drop=cntcase where=(dtcreated>='28apr2020'd)) ;
by facname;
id dtcreated;
convert cntcase=sum14 / transformin=(movsum 14);
run;
If the ending date varies between facilities, then a data step might be better:
data want2 (keep=facname dtcreated sum14);
set have;
by facname;
array days14 {0:13} _temporary_;
if first.facname then call missing(of days14{*});
days14{mod(_n_,14)}=cntcase;
if last.facname;
if n(of days14{*}) then sum14=sum(of days14{*});
run;
If the ending date is fixed across facilities, you could do:
*%let td=%sysfunc(date());
/* Or */
%let td=%sysevalf('24apr2020'd);
proc expand data=have out=temp (drop=cntcase where=(dtback in (0,7,14))) ;
by facname;
id dtcreated;
convert cntcase=sum14 / transformin=(movsum 14);
convert dtcreated=dtback / transformout=(neg + &td);
run;
proc transpose data=temp out=want(drop=_name_) prefix=x_;
by facname;
var sum14;
id dtback;
format dtback best.;
run;
@PGStats The end date will always be the same date. I have also used PROC TIMESERIES to fill in the missing dates and assign a value of 0.
This code works great. What is the word "neg" doing in the transformout segment? Another question, do you know why a sum would come out to be -0? Not all of my zeros are coming out this way, just on occasion.
FacName | X_14 | X_7 | X_0 |
Facility1 | 0 | 2 | 2 |
Facility2 | 1 | -0 | 0 |
Facility3 | -0 | -0 | 2 |
Thanks for everyone's help.
NEG replaces x with -x . Those minus zeros are certainly small numbers that are rounded to zero for display. This is something that I noticed before, Proc expand is very fast but it doesn't seem to use the same precautions as other SAS procs to prevent rounding error propagation.
Hi @gdaymte,
How about conditional aggregation with SQL and little %MACRO? (btw. shouldn't CumSum3 for "Fac2" be 59?)
data have;
input FacName : $ 12. DtCreated : mmddyy8. CntCase;
format DtCreated mmddyy10.;
cards4;
FACILITY1 04012020 0
FACILITY1 04022020 0
FACILITY1 04032020 0
FACILITY1 04042020 0
FACILITY1 04052020 0
FACILITY1 04062020 0
FACILITY1 04072020 0
FACILITY1 04082020 0
FACILITY1 04092020 0
FACILITY1 04102020 0
FACILITY1 04112020 0
FACILITY1 04122020 0
FACILITY1 04132020 0
FACILITY1 04142020 0
FACILITY1 04152020 0
FACILITY1 04162020 0
FACILITY1 04172020 0
FACILITY1 04182020 0
FACILITY1 04192020 0
FACILITY1 04202020 0
FACILITY1 04212020 0
FACILITY1 04222020 0
FACILITY1 04232020 0
FACILITY1 04242020 1
FACILITY1 04252020 0
FACILITY1 04262020 0
FACILITY1 04272020 0
FACILITY1 04282020 0
FACILITY2 04012020 1
FACILITY2 04022020 0
FACILITY2 04032020 1
FACILITY2 04042020 1
FACILITY2 04052020 1
FACILITY2 04062020 1
FACILITY2 04072020 8
FACILITY2 04082020 31
FACILITY2 04092020 5
FACILITY2 04102020 1
FACILITY2 04112020 0
FACILITY2 04122020 6
FACILITY2 04132020 0
FACILITY2 04142020 3
FACILITY2 04152020 2
FACILITY2 04162020 4
FACILITY2 04172020 0
FACILITY2 04182020 3
FACILITY2 04192020 0
FACILITY2 04202020 3
FACILITY2 04212020 3
FACILITY2 04222020 2
FACILITY2 04232020 1
FACILITY2 04242020 1
FACILITY2 04252020 1
FACILITY2 04262020 0
FACILITY2 04272020 1
FACILITY2 04282020 0
;;;;
run;
options MPRINT;
proc sql;
select
FacName
%macro intoThePast(periodsNum, start, date, aggr);
%do i = 1 %to &periodsNum.;
%local condition;
%let condition = "%sysfunc(intnx(days, &start., 7*( 1-&i.)),date9.)"d
>= DtCreated >
"%sysfunc(intnx(days, &start., 7*(-1-&i.)),date9.)"d; /* make it -2 for 3 week periods, etc. */
,sum(case when &condition. then &aggr.
else 0
end) as CumSum&i.
%end;
%mend intoThePast;
%intoThePast(3, '28Apr2020'd, DtCreated, CntCase)
from
have
group by
FacName
;
quit;
All the best
Bart
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.