BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
SasStatistics
Pyrite | Level 9

I have a dataset that contains the columns: Period, Segment and Sum. 

Period Segment Sum
jan-22 A 100
jan-22 B 50
jan-22 C 40
feb-22 A 80
feb-22 B 60
feb-22 C 30
mar-22 A 130
mar-22 B 110
mar-22 C 100
apr-22 A 200
apr-22 B 150
apr-22 C 40
maj-22 A 100
maj-22 B 80
maj-22 C 60
jun-22 A 200
jun-22 B 150
jun-22 C 40
jul-22 A 500
jul-22 B 100
jul-22 C 50
aug-22 A 400
aug-22 B 200
aug-22 C 100
sep-22 A 200
sep-22 B 100
sep-22 C 30
okt-22 A 400
okt-22 B 300
okt-22 C 200
nov-22 A 300
nov-22 B 100
nov-22 C 50
dec-22 A 40
dec-22 B 20
dec-22 C 10


Now I would like to create a new column that is the moving/rolling sum for each segment over the previous 6 periods. 

In total this would give me the following table: 

Period Segment Sum summary_rolling_6months_per_segment
jan-22 A 100  
jan-22 B 50  
jan-22 C 40  
feb-22 A 80  
feb-22 B 60  
feb-22 C 30  
mar-22 A 130  
mar-22 B 110  
mar-22 C 100  
apr-22 A 200  
apr-22 B 150  
apr-22 C 40  
maj-22 A 100  
maj-22 B 80  
maj-22 C 60  
jun-22 A 200  
jun-22 B 150  
jun-22 C 40  
jul-22 A 500 1310
jul-22 B 100 700
jul-22 C 50 360
aug-22 A 400 1610
aug-22 B 200 850
aug-22 C 100 420
sep-22 A 200 1730
sep-22 B 100 890
sep-22 C 30 420
okt-22 A 400 2000
okt-22 B 300 1080
okt-22 C 200 520
nov-22 A 300 2100
nov-22 B 100 1030
nov-22 C 50 530
dec-22 A 40 2040
dec-22 B 20 970
dec-22 C 10 480

 

To simplify for you when creating the sas table: 

data have;
input Period $ Segment $ Sum $ sum_rolling_6_mths $ 
;
datalines;
jan-22	 A	 100 .
jan-22	 B	 50 .
jan-22	 C	 40 .
feb-22	 A	 80 .
feb-22	 B	 60 .
feb-22	 C   30 . 
mar-22	 A	 130 .
mar-22	 B	 110 .
mar-22	 C   100 . 
apr-22	 A	 200 .
apr-22	 B	 150 .
apr-22	 C   100 .
maj-22	 A	 100 .
maj-22	 B	 80 .
maj-22	 C	 60 . 
jun-22	 A	 200 .
jun-22	 B	 150 .
jun-22	 C	 40 .
jul-22	 A	 500 1310
jul-22	 B	 100 700
jul-22	 C	 50 360
aug-22	 A	 400 1610
aug-22	 B	 200 850
aug-22	 C	 100 420
sep-22	 A	 200 1730
sep-22	 B	 100 890
sep-22	 C	 30 420
okt-22	 A	 400 2000
okt-22	 B	 300 1080
okt-22	 C	 200 520
nov-22	 A	 300 2100
nov-22	 B	 100 1030
nov-22	 C	 50 530
dec-22	 A   40 2040
dec-22	 B	 20 970
dec-22	 C   10 480
;
run;
1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

First, do not try to work with dates/months as character strings, because these cannot be sorted. (If you did sort them, then April would be the first month, and I'm guessing you don't want that). Work with numeric date values, as shown below in variable month. Then sort by segment and month, then compute the moving averages using PROC EXPAND (if you have a license for SAS/ETS)

 

data have;
input Period $ Segment $ Sum;
month=input(compress(period,'-'),monyy.);
format month monyy7.;
datalines;
jan-22	 A	 100 .
jan-22	 B	 50 .
jan-22	 C	 40 .
feb-22	 A	 80 .
feb-22	 B	 60 .
feb-22	 C   30 . 
mar-22	 A	 130 .
mar-22	 B	 110 .
mar-22	 C   100 . 
apr-22	 A	 200 .
apr-22	 B	 150 .
apr-22	 C   100 .
may-22	 A	 100 .
may-22	 B	 80 .
may-22	 C	 60 . 
jun-22	 A	 200 .
jun-22	 B	 150 .
jun-22	 C	 40 .
jul-22	 A	 500 1310
jul-22	 B	 100 700
jul-22	 C	 50 360
aug-22	 A	 400 1610
aug-22	 B	 200 850
aug-22	 C	 100 420
sep-22	 A	 200 1730
sep-22	 B	 100 890
sep-22	 C	 30 420
oct-22	 A	 400 2000
oct-22	 B	 300 1080
oct-22	 C	 200 520
nov-22	 A	 300 2100
nov-22	 B	 100 1030
nov-22	 C	 50 530
dec-22	 A   40 2040
dec-22	 B	 20 970
dec-22	 C   10 480
;

proc sort data=have;
    by segment month;
run;
proc expand data=have out=want;
    by segment;
    convert sum=sum_rolling_6_mths / transformout=(movsum 6 trimleft 5);
run;
--
Paige Miller

View solution in original post

7 REPLIES 7
PaigeMiller
Diamond | Level 26

First, do not try to work with dates/months as character strings, because these cannot be sorted. (If you did sort them, then April would be the first month, and I'm guessing you don't want that). Work with numeric date values, as shown below in variable month. Then sort by segment and month, then compute the moving averages using PROC EXPAND (if you have a license for SAS/ETS)

 

data have;
input Period $ Segment $ Sum;
month=input(compress(period,'-'),monyy.);
format month monyy7.;
datalines;
jan-22	 A	 100 .
jan-22	 B	 50 .
jan-22	 C	 40 .
feb-22	 A	 80 .
feb-22	 B	 60 .
feb-22	 C   30 . 
mar-22	 A	 130 .
mar-22	 B	 110 .
mar-22	 C   100 . 
apr-22	 A	 200 .
apr-22	 B	 150 .
apr-22	 C   100 .
may-22	 A	 100 .
may-22	 B	 80 .
may-22	 C	 60 . 
jun-22	 A	 200 .
jun-22	 B	 150 .
jun-22	 C	 40 .
jul-22	 A	 500 1310
jul-22	 B	 100 700
jul-22	 C	 50 360
aug-22	 A	 400 1610
aug-22	 B	 200 850
aug-22	 C	 100 420
sep-22	 A	 200 1730
sep-22	 B	 100 890
sep-22	 C	 30 420
oct-22	 A	 400 2000
oct-22	 B	 300 1080
oct-22	 C	 200 520
nov-22	 A	 300 2100
nov-22	 B	 100 1030
nov-22	 C	 50 530
dec-22	 A   40 2040
dec-22	 B	 20 970
dec-22	 C   10 480
;

proc sort data=have;
    by segment month;
run;
proc expand data=have out=want;
    by segment;
    convert sum=sum_rolling_6_mths / transformout=(movsum 6 trimleft 5);
run;
--
Paige Miller
SasStatistics
Pyrite | Level 9
Unfortunately do not have the license, can I remove this as the accepted solution? (Accepted it by misstake).

Do you know any other way?
Reeza
Super User
data have;
set have;
period_sas_date = input(period, anydtdte.);
format period_sas_date monyy5.;
run;


proc sort data=have;
by segment period_sas_date;
run;

data want;
array p{0:5} _temporary_;
set have; by segment;
if first.segment then do; call missing(of p{*}); counter=0;
end;
counter+1;
p{mod(_n_,6)} = sum;
if counter >5 then rolling_sum = sum(of p{*});

run;

proc sort data=want;
by period_sas_date segment;
run;

You can use temporary arrays instead - you need to make sure you have a license for proc expand. 

 

Are you sure the numbers shown are correct? They seem to be 7 months moving sum?

 

https://communities.sas.com/message/244232

 

SasStatistics
Pyrite | Level 9
Exactly this is what I was looking for. I made an minimalistic example but I have modofied your code and use that. Thanks alot.

One question, what is the purpose of this code: p{mod(_n_,6)} = sum; ?
Reeza
Super User
It assigns the value of the observation to the temporary array. The position in the array is determined by the row (_n_) and size of array (6). This allows each value to be uniquely placed in the array and overwritten once more than 6 are encountered.
PaigeMiller
Diamond | Level 26

From now on, @SasStatistics you can leave this part out

 

PaigeMiller_0-1672851321106.png

 

and just provide the SAS code that contains the data. We don't use screen captures of data, we do use SAS data step code.

 

As far as not having PROC EXPAND, searching the forums for moving average or moving sum should turn up lots of ideas, such as this: https://communities.sas.com/t5/SAS-Programming/Computing-moving-average-by-group-and-year/m-p/824977

--
Paige Miller
ballardw
Super User

Does each "month" always have exactly 3 observations?

Why if you want to "sum" a value are you making it character in your input statement?????

How sure are you that 1310 is correct for the first Jul-22? Your data step has

apr-22	 C   100 .

but your "want" table has which is a different value.

apr-22 C 40  

When I add the data step values for Jan through Jun I get 1780, not 1310.

 

IF there are exactly 3 records for each month the following works WHEN SUM is numeric without needing any other modules. If you actually have date values then perhaps Proc Expand would do this as well.

data want;
   set have;
   rollingsum = lag1(sum)+lag2(sum)+lag3(sum)+
                lag4(sum)+lag5(sum)+lag6(sum)+
                lag7(sum)+lag8(sum)+lag9(sum)+
                lag10(sum)+lag11(sum)+lag12(sum)+
                lag13(sum)+lag14(sum)+lag15(sum)+
                lag16(sum)+lag17(sum)+lag18(sum)
   ;
run;

This actually takes advantage of the + operator returning missing values when any are missing. The Lag function will return missing if that number of previous record does not have a value.

Caution: This assumes that the Sum variable is not ever missing. If you have missing values then you need to provide such an example.

 


@SasStatistics wrote:

I have a dataset that contains the columns: Period, Segment and Sum. 

Period Segment Sum
jan-22 A 100
jan-22 B 50
jan-22 C 40
feb-22 A 80
feb-22 B 60
feb-22 C 30
mar-22 A 130
mar-22 B 110
mar-22 C 100
apr-22 A 200
apr-22 B 150
apr-22 C 40
maj-22 A 100
maj-22 B 80
maj-22 C 60
jun-22 A 200
jun-22 B 150
jun-22 C 40
jul-22 A 500
jul-22 B 100
jul-22 C 50
aug-22 A 400
aug-22 B 200
aug-22 C 100
sep-22 A 200
sep-22 B 100
sep-22 C 30
okt-22 A 400
okt-22 B 300
okt-22 C 200
nov-22 A 300
nov-22 B 100
nov-22 C 50
dec-22 A 40
dec-22 B 20
dec-22 C 10


Now I would like to create a new column that is the moving/rolling sum for each segment over the previous 6 periods. 

In total this would give me the following table: 

Period Segment Sum summary_rolling_6months_per_segment
jan-22 A 100  
jan-22 B 50  
jan-22 C 40  
feb-22 A 80  
feb-22 B 60  
feb-22 C 30  
mar-22 A 130  
mar-22 B 110  
mar-22 C 100  
apr-22 A 200  
apr-22 B 150  
apr-22 C 40  
maj-22 A 100  
maj-22 B 80  
maj-22 C 60  
jun-22 A 200  
jun-22 B 150  
jun-22 C 40  
jul-22 A 500 1310
jul-22 B 100 700
jul-22 C 50 360
aug-22 A 400 1610
aug-22 B 200 850
aug-22 C 100 420
sep-22 A 200 1730
sep-22 B 100 890
sep-22 C 30 420
okt-22 A 400 2000
okt-22 B 300 1080
okt-22 C 200 520
nov-22 A 300 2100
nov-22 B 100 1030
nov-22 C 50 530
dec-22 A 40 2040
dec-22 B 20 970
dec-22 C 10 480

 

To simplify for you when creating the sas table: 

data have;
input Period $ Segment $ Sum $ sum_rolling_6_mths $ 
;
datalines;
jan-22	 A	 100 .
jan-22	 B	 50 .
jan-22	 C	 40 .
feb-22	 A	 80 .
feb-22	 B	 60 .
feb-22	 C   30 . 
mar-22	 A	 130 .
mar-22	 B	 110 .
mar-22	 C   100 . 
apr-22	 A	 200 .
apr-22	 B	 150 .
apr-22	 C   100 .
maj-22	 A	 100 .
maj-22	 B	 80 .
maj-22	 C	 60 . 
jun-22	 A	 200 .
jun-22	 B	 150 .
jun-22	 C	 40 .
jul-22	 A	 500 1310
jul-22	 B	 100 700
jul-22	 C	 50 360
aug-22	 A	 400 1610
aug-22	 B	 200 850
aug-22	 C	 100 420
sep-22	 A	 200 1730
sep-22	 B	 100 890
sep-22	 C	 30 420
okt-22	 A	 400 2000
okt-22	 B	 300 1080
okt-22	 C	 200 520
nov-22	 A	 300 2100
nov-22	 B	 100 1030
nov-22	 C	 50 530
dec-22	 A   40 2040
dec-22	 B	 20 970
dec-22	 C   10 480
;
run;

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 2608 views
  • 8 likes
  • 4 in conversation