I have a dataset that contains the columns: Period, Segment and Sum.
Period | Segment | Sum |
jan-22 | A | 100 |
jan-22 | B | 50 |
jan-22 | C | 40 |
feb-22 | A | 80 |
feb-22 | B | 60 |
feb-22 | C | 30 |
mar-22 | A | 130 |
mar-22 | B | 110 |
mar-22 | C | 100 |
apr-22 | A | 200 |
apr-22 | B | 150 |
apr-22 | C | 40 |
maj-22 | A | 100 |
maj-22 | B | 80 |
maj-22 | C | 60 |
jun-22 | A | 200 |
jun-22 | B | 150 |
jun-22 | C | 40 |
jul-22 | A | 500 |
jul-22 | B | 100 |
jul-22 | C | 50 |
aug-22 | A | 400 |
aug-22 | B | 200 |
aug-22 | C | 100 |
sep-22 | A | 200 |
sep-22 | B | 100 |
sep-22 | C | 30 |
okt-22 | A | 400 |
okt-22 | B | 300 |
okt-22 | C | 200 |
nov-22 | A | 300 |
nov-22 | B | 100 |
nov-22 | C | 50 |
dec-22 | A | 40 |
dec-22 | B | 20 |
dec-22 | C | 10 |
Now I would like to create a new column that is the moving/rolling sum for each segment over the previous 6 periods.
In total this would give me the following table:
Period | Segment | Sum | summary_rolling_6months_per_segment |
jan-22 | A | 100 | |
jan-22 | B | 50 | |
jan-22 | C | 40 | |
feb-22 | A | 80 | |
feb-22 | B | 60 | |
feb-22 | C | 30 | |
mar-22 | A | 130 | |
mar-22 | B | 110 | |
mar-22 | C | 100 | |
apr-22 | A | 200 | |
apr-22 | B | 150 | |
apr-22 | C | 40 | |
maj-22 | A | 100 | |
maj-22 | B | 80 | |
maj-22 | C | 60 | |
jun-22 | A | 200 | |
jun-22 | B | 150 | |
jun-22 | C | 40 | |
jul-22 | A | 500 | 1310 |
jul-22 | B | 100 | 700 |
jul-22 | C | 50 | 360 |
aug-22 | A | 400 | 1610 |
aug-22 | B | 200 | 850 |
aug-22 | C | 100 | 420 |
sep-22 | A | 200 | 1730 |
sep-22 | B | 100 | 890 |
sep-22 | C | 30 | 420 |
okt-22 | A | 400 | 2000 |
okt-22 | B | 300 | 1080 |
okt-22 | C | 200 | 520 |
nov-22 | A | 300 | 2100 |
nov-22 | B | 100 | 1030 |
nov-22 | C | 50 | 530 |
dec-22 | A | 40 | 2040 |
dec-22 | B | 20 | 970 |
dec-22 | C | 10 | 480 |
To simplify for you when creating the sas table:
data have;
input Period $ Segment $ Sum $ sum_rolling_6_mths $
;
datalines;
jan-22 A 100 .
jan-22 B 50 .
jan-22 C 40 .
feb-22 A 80 .
feb-22 B 60 .
feb-22 C 30 .
mar-22 A 130 .
mar-22 B 110 .
mar-22 C 100 .
apr-22 A 200 .
apr-22 B 150 .
apr-22 C 100 .
maj-22 A 100 .
maj-22 B 80 .
maj-22 C 60 .
jun-22 A 200 .
jun-22 B 150 .
jun-22 C 40 .
jul-22 A 500 1310
jul-22 B 100 700
jul-22 C 50 360
aug-22 A 400 1610
aug-22 B 200 850
aug-22 C 100 420
sep-22 A 200 1730
sep-22 B 100 890
sep-22 C 30 420
okt-22 A 400 2000
okt-22 B 300 1080
okt-22 C 200 520
nov-22 A 300 2100
nov-22 B 100 1030
nov-22 C 50 530
dec-22 A 40 2040
dec-22 B 20 970
dec-22 C 10 480
;
run;
First, do not try to work with dates/months as character strings, because these cannot be sorted. (If you did sort them, then April would be the first month, and I'm guessing you don't want that). Work with numeric date values, as shown below in variable month. Then sort by segment and month, then compute the moving averages using PROC EXPAND (if you have a license for SAS/ETS)
data have;
input Period $ Segment $ Sum;
month=input(compress(period,'-'),monyy.);
format month monyy7.;
datalines;
jan-22 A 100 .
jan-22 B 50 .
jan-22 C 40 .
feb-22 A 80 .
feb-22 B 60 .
feb-22 C 30 .
mar-22 A 130 .
mar-22 B 110 .
mar-22 C 100 .
apr-22 A 200 .
apr-22 B 150 .
apr-22 C 100 .
may-22 A 100 .
may-22 B 80 .
may-22 C 60 .
jun-22 A 200 .
jun-22 B 150 .
jun-22 C 40 .
jul-22 A 500 1310
jul-22 B 100 700
jul-22 C 50 360
aug-22 A 400 1610
aug-22 B 200 850
aug-22 C 100 420
sep-22 A 200 1730
sep-22 B 100 890
sep-22 C 30 420
oct-22 A 400 2000
oct-22 B 300 1080
oct-22 C 200 520
nov-22 A 300 2100
nov-22 B 100 1030
nov-22 C 50 530
dec-22 A 40 2040
dec-22 B 20 970
dec-22 C 10 480
;
proc sort data=have;
by segment month;
run;
proc expand data=have out=want;
by segment;
convert sum=sum_rolling_6_mths / transformout=(movsum 6 trimleft 5);
run;
First, do not try to work with dates/months as character strings, because these cannot be sorted. (If you did sort them, then April would be the first month, and I'm guessing you don't want that). Work with numeric date values, as shown below in variable month. Then sort by segment and month, then compute the moving averages using PROC EXPAND (if you have a license for SAS/ETS)
data have;
input Period $ Segment $ Sum;
month=input(compress(period,'-'),monyy.);
format month monyy7.;
datalines;
jan-22 A 100 .
jan-22 B 50 .
jan-22 C 40 .
feb-22 A 80 .
feb-22 B 60 .
feb-22 C 30 .
mar-22 A 130 .
mar-22 B 110 .
mar-22 C 100 .
apr-22 A 200 .
apr-22 B 150 .
apr-22 C 100 .
may-22 A 100 .
may-22 B 80 .
may-22 C 60 .
jun-22 A 200 .
jun-22 B 150 .
jun-22 C 40 .
jul-22 A 500 1310
jul-22 B 100 700
jul-22 C 50 360
aug-22 A 400 1610
aug-22 B 200 850
aug-22 C 100 420
sep-22 A 200 1730
sep-22 B 100 890
sep-22 C 30 420
oct-22 A 400 2000
oct-22 B 300 1080
oct-22 C 200 520
nov-22 A 300 2100
nov-22 B 100 1030
nov-22 C 50 530
dec-22 A 40 2040
dec-22 B 20 970
dec-22 C 10 480
;
proc sort data=have;
by segment month;
run;
proc expand data=have out=want;
by segment;
convert sum=sum_rolling_6_mths / transformout=(movsum 6 trimleft 5);
run;
data have;
set have;
period_sas_date = input(period, anydtdte.);
format period_sas_date monyy5.;
run;
proc sort data=have;
by segment period_sas_date;
run;
data want;
array p{0:5} _temporary_;
set have; by segment;
if first.segment then do; call missing(of p{*}); counter=0;
end;
counter+1;
p{mod(_n_,6)} = sum;
if counter >5 then rolling_sum = sum(of p{*});
run;
proc sort data=want;
by period_sas_date segment;
run;
You can use temporary arrays instead - you need to make sure you have a license for proc expand.
Are you sure the numbers shown are correct? They seem to be 7 months moving sum?
https://communities.sas.com/message/244232
From now on, @SasStatistics you can leave this part out
and just provide the SAS code that contains the data. We don't use screen captures of data, we do use SAS data step code.
As far as not having PROC EXPAND, searching the forums for moving average or moving sum should turn up lots of ideas, such as this: https://communities.sas.com/t5/SAS-Programming/Computing-moving-average-by-group-and-year/m-p/824977
Does each "month" always have exactly 3 observations?
Why if you want to "sum" a value are you making it character in your input statement?????
How sure are you that 1310 is correct for the first Jul-22? Your data step has
apr-22 C 100 .
but your "want" table has which is a different value.
apr-22 | C | 40 |
When I add the data step values for Jan through Jun I get 1780, not 1310.
IF there are exactly 3 records for each month the following works WHEN SUM is numeric without needing any other modules. If you actually have date values then perhaps Proc Expand would do this as well.
data want; set have; rollingsum = lag1(sum)+lag2(sum)+lag3(sum)+ lag4(sum)+lag5(sum)+lag6(sum)+ lag7(sum)+lag8(sum)+lag9(sum)+ lag10(sum)+lag11(sum)+lag12(sum)+ lag13(sum)+lag14(sum)+lag15(sum)+ lag16(sum)+lag17(sum)+lag18(sum) ; run;
This actually takes advantage of the + operator returning missing values when any are missing. The Lag function will return missing if that number of previous record does not have a value.
Caution: This assumes that the Sum variable is not ever missing. If you have missing values then you need to provide such an example.
@SasStatistics wrote:
I have a dataset that contains the columns: Period, Segment and Sum.
Period Segment Sum jan-22 A 100 jan-22 B 50 jan-22 C 40 feb-22 A 80 feb-22 B 60 feb-22 C 30 mar-22 A 130 mar-22 B 110 mar-22 C 100 apr-22 A 200 apr-22 B 150 apr-22 C 40 maj-22 A 100 maj-22 B 80 maj-22 C 60 jun-22 A 200 jun-22 B 150 jun-22 C 40 jul-22 A 500 jul-22 B 100 jul-22 C 50 aug-22 A 400 aug-22 B 200 aug-22 C 100 sep-22 A 200 sep-22 B 100 sep-22 C 30 okt-22 A 400 okt-22 B 300 okt-22 C 200 nov-22 A 300 nov-22 B 100 nov-22 C 50 dec-22 A 40 dec-22 B 20 dec-22 C 10
Now I would like to create a new column that is the moving/rolling sum for each segment over the previous 6 periods.
In total this would give me the following table:
Period Segment Sum summary_rolling_6months_per_segment jan-22 A 100 jan-22 B 50 jan-22 C 40 feb-22 A 80 feb-22 B 60 feb-22 C 30 mar-22 A 130 mar-22 B 110 mar-22 C 100 apr-22 A 200 apr-22 B 150 apr-22 C 40 maj-22 A 100 maj-22 B 80 maj-22 C 60 jun-22 A 200 jun-22 B 150 jun-22 C 40 jul-22 A 500 1310 jul-22 B 100 700 jul-22 C 50 360 aug-22 A 400 1610 aug-22 B 200 850 aug-22 C 100 420 sep-22 A 200 1730 sep-22 B 100 890 sep-22 C 30 420 okt-22 A 400 2000 okt-22 B 300 1080 okt-22 C 200 520 nov-22 A 300 2100 nov-22 B 100 1030 nov-22 C 50 530 dec-22 A 40 2040 dec-22 B 20 970 dec-22 C 10 480
To simplify for you when creating the sas table:
data have; input Period $ Segment $ Sum $ sum_rolling_6_mths $ ; datalines; jan-22 A 100 . jan-22 B 50 . jan-22 C 40 . feb-22 A 80 . feb-22 B 60 . feb-22 C 30 . mar-22 A 130 . mar-22 B 110 . mar-22 C 100 . apr-22 A 200 . apr-22 B 150 . apr-22 C 100 . maj-22 A 100 . maj-22 B 80 . maj-22 C 60 . jun-22 A 200 . jun-22 B 150 . jun-22 C 40 . jul-22 A 500 1310 jul-22 B 100 700 jul-22 C 50 360 aug-22 A 400 1610 aug-22 B 200 850 aug-22 C 100 420 sep-22 A 200 1730 sep-22 B 100 890 sep-22 C 30 420 okt-22 A 400 2000 okt-22 B 300 1080 okt-22 C 200 520 nov-22 A 300 2100 nov-22 B 100 1030 nov-22 C 50 530 dec-22 A 40 2040 dec-22 B 20 970 dec-22 C 10 480 ; run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.