I have a dataset that contains the columns: Period, Segment and Sum.
Period | Segment | Sum |
jan-22 | A | 100 |
jan-22 | B | 50 |
jan-22 | C | 40 |
feb-22 | A | 80 |
feb-22 | B | 60 |
feb-22 | C | 30 |
mar-22 | A | 130 |
mar-22 | B | 110 |
mar-22 | C | 100 |
apr-22 | A | 200 |
apr-22 | B | 150 |
apr-22 | C | 40 |
maj-22 | A | 100 |
maj-22 | B | 80 |
maj-22 | C | 60 |
jun-22 | A | 200 |
jun-22 | B | 150 |
jun-22 | C | 40 |
jul-22 | A | 500 |
jul-22 | B | 100 |
jul-22 | C | 50 |
aug-22 | A | 400 |
aug-22 | B | 200 |
aug-22 | C | 100 |
sep-22 | A | 200 |
sep-22 | B | 100 |
sep-22 | C | 30 |
okt-22 | A | 400 |
okt-22 | B | 300 |
okt-22 | C | 200 |
nov-22 | A | 300 |
nov-22 | B | 100 |
nov-22 | C | 50 |
dec-22 | A | 40 |
dec-22 | B | 20 |
dec-22 | C | 10 |
Now I would like to create a new column that is the moving/rolling sum for each segment over the previous 6 periods.
In total this would give me the following table:
Period | Segment | Sum | summary_rolling_6months_per_segment |
jan-22 | A | 100 | |
jan-22 | B | 50 | |
jan-22 | C | 40 | |
feb-22 | A | 80 | |
feb-22 | B | 60 | |
feb-22 | C | 30 | |
mar-22 | A | 130 | |
mar-22 | B | 110 | |
mar-22 | C | 100 | |
apr-22 | A | 200 | |
apr-22 | B | 150 | |
apr-22 | C | 40 | |
maj-22 | A | 100 | |
maj-22 | B | 80 | |
maj-22 | C | 60 | |
jun-22 | A | 200 | |
jun-22 | B | 150 | |
jun-22 | C | 40 | |
jul-22 | A | 500 | 1310 |
jul-22 | B | 100 | 700 |
jul-22 | C | 50 | 360 |
aug-22 | A | 400 | 1610 |
aug-22 | B | 200 | 850 |
aug-22 | C | 100 | 420 |
sep-22 | A | 200 | 1730 |
sep-22 | B | 100 | 890 |
sep-22 | C | 30 | 420 |
okt-22 | A | 400 | 2000 |
okt-22 | B | 300 | 1080 |
okt-22 | C | 200 | 520 |
nov-22 | A | 300 | 2100 |
nov-22 | B | 100 | 1030 |
nov-22 | C | 50 | 530 |
dec-22 | A | 40 | 2040 |
dec-22 | B | 20 | 970 |
dec-22 | C | 10 | 480 |
To simplify for you when creating the sas table:
data have;
input Period $ Segment $ Sum $ sum_rolling_6_mths $
;
datalines;
jan-22 A 100 .
jan-22 B 50 .
jan-22 C 40 .
feb-22 A 80 .
feb-22 B 60 .
feb-22 C 30 .
mar-22 A 130 .
mar-22 B 110 .
mar-22 C 100 .
apr-22 A 200 .
apr-22 B 150 .
apr-22 C 100 .
maj-22 A 100 .
maj-22 B 80 .
maj-22 C 60 .
jun-22 A 200 .
jun-22 B 150 .
jun-22 C 40 .
jul-22 A 500 1310
jul-22 B 100 700
jul-22 C 50 360
aug-22 A 400 1610
aug-22 B 200 850
aug-22 C 100 420
sep-22 A 200 1730
sep-22 B 100 890
sep-22 C 30 420
okt-22 A 400 2000
okt-22 B 300 1080
okt-22 C 200 520
nov-22 A 300 2100
nov-22 B 100 1030
nov-22 C 50 530
dec-22 A 40 2040
dec-22 B 20 970
dec-22 C 10 480
;
run;
Answer by @PaigeMiller [Had to remove original post since accepted the answer by misstake].
First, do not try to work with dates/months as character strings, because these cannot be sorted. (If you did sort them, then April would be the first month, and I'm guessing you don't want that). Work with numeric date values, as shown below in variable month. Then sort by segment and month, then compute the moving averages using PROC EXPAND (if you have a license for SAS/ETS)
data have;
input Period $ Segment $ Sum;
month=input(compress(period,'-'),monyy.);
format month monyy7.;
datalines;
jan-22 A 100 .
jan-22 B 50 .
jan-22 C 40 .
feb-22 A 80 .
feb-22 B 60 .
feb-22 C 30 .
mar-22 A 130 .
mar-22 B 110 .
mar-22 C 100 .
apr-22 A 200 .
apr-22 B 150 .
apr-22 C 100 .
may-22 A 100 .
may-22 B 80 .
may-22 C 60 .
jun-22 A 200 .
jun-22 B 150 .
jun-22 C 40 .
jul-22 A 500 1310
jul-22 B 100 700
jul-22 C 50 360
aug-22 A 400 1610
aug-22 B 200 850
aug-22 C 100 420
sep-22 A 200 1730
sep-22 B 100 890
sep-22 C 30 420
oct-22 A 400 2000
oct-22 B 300 1080
oct-22 C 200 520
nov-22 A 300 2100
nov-22 B 100 1030
nov-22 C 50 530
dec-22 A 40 2040
dec-22 B 20 970
dec-22 C 10 480
;
proc sort data=have;
by segment month;
run;
proc expand data=have out=want;
by segment;
convert sum=sum_rolling_6_mths / transformout=(movsum 6 trimleft 5);
run;
Answer by @ballardw [Had to remove the original post due to accepting a solution by misstake].
Correctly describe the values summed. You are apparently adding the CURRENT to the previous 6 months, not a six month rolling sum.
Does each "month" always have exactly 3 observations?
Why if you want to "sum" a value are you making it character in your input statement?????
IF there are exactly 3 records for each month and in segment order with in month the following works WHEN SUM is numeric without needing any other modules.
This actually takes advantage of the + operator returning missing values when any are missing. The Lag function will return missing if that number of previous record does not have a value.
Caution: This assumes that the Sum variable is not ever missing. If you have missing values then you need to provide such an example. And better rules for what the rolling sum would be in that case.
data want; set have; rollingsum = lag3(sum)+ lag6(sum)+ lag9(sum)+ lag12(sum)+ lag15(sum)+ lag18(sum) +sum ; run;
How sure are you that 1310 is correct for the first Jul-22? Your data step has
apr-22 C 100 .
but your "want" table has which is a different value.
apr-22 | C | 40 |
When I add the data step values for Jan through Jun I get 1780, not 1310.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.