BookmarkSubscribeRSS Feed
paul1034
Calcite | Level 5

Hi,

I am somewhat new at using and writing SAS code, and I'm trying to create a moving average calculation.  I've found examples and based on what I have read and tried, I can't seem to get my data to produce the desired results.  Here is an example of my data.

Year     Month     Sales     Running Sum

2011     Jan          100          100

2011     Feb          250          350

2011     March      200          550

2012     Jan          175          175

2012     Feb          300          475

2012     March      225          700

2013     Jan          150          150

2013     Feb          275          425

2013     March      250          675

I have Year, Month, Sales, and a Running Sum of the monthly sales for each year.  I would like to be able to compare the Running Sum of sales through the current month and compare it to the average of the past years through that same month.  I've tried using PROC EXPAND, and that seems to work, but only when my dataset has Year and Sales.

Thanks for your help.  It is much appreciated.

11 REPLIES 11
Reeza
Super User

paul1034 wrote:

  I've found examples and based on what I have read and tried, I can't seem to get my data to produce the desired results. 


I've tried using PROC EXPAND, and that seems to work, but only when my dataset has Year and Sales.

1. What is your desired output? This is a YTD year over year comparison?

2. Why can't you include years and sales in PROC EXPAND? I think any solution will require those variables, because you're calculating across years so SAS needs to know the years and it uses the Sales raw data not the running sum to do the calculations.

paul1034
Calcite | Level 5

Reeza,

I apologize for the confusion.  Here is what I am trying to calculate.

Year     Month     Sales     Running Sum     Average

2011     Jan          100          100                    .

2011     Feb          250          350                    .

2011     March      200          550                    .

2012     Jan          175          175                  100

2012     Feb          300          475                  350

2012     March      225          700                  550

2013     Jan          150          150                  138  ((175+100) / 2)

2013     Feb          275          425                  413  ((475+350) / 2)

2013     March      250          675                  625  ((700+550) / 2)

The Average column calculates the Running Sum through each month and divides that figure by the number of prior periods.  See the calculations in parentheses.  I'm actually trying to calculate a four year average, but I know I did not include enough data in my sample to calculate the full four years.

I am able to use PROC EXPAND to calculate a moving average, but I only get the code to work when I had the Year and Sales (at cumulative total) columns in my dataset.  Sales were for the full year, not broken out by month.  For 2011, sales were 550.  For 2012, sales were 700.

So, how do I calculate the moving average with more than two columns in my table?

I hope this explanation helps answer your questions.  Thanks again.

KachiM
Rhodochrosite | Level 12

I presume that for the moving average, you want a window size of 3.

The 2 used  for average is, I suppose, is one less than the window size.

What is the big picture?

By a data step, Moving averages can be found provided you have no fear to use Array.

Try to explain your bigger problem. Someone here would help you.

Reeza
Super User

i'm not following. What does your data look like and what do you want the output to look like.

paul1034
Calcite | Level 5

My table consists of Year, Month, Sales, and Running Sum (of Sales).  I want to calculate the moving average of the Running Sum (of Sales).

From my sample data provided, I know the Running Sum (of Sales) through Jan 2013 is 150, Feb 2013 is 425, and March 2013 is 675.  How do I write the code in EG to calculate the moving average to get 138 for Jan, 413 for Feb, and 625 for March?

From the research I have done, it appears I should use PROC EXPAND to write the code.  If there is another method I should use, please explain.  Down the road, I may need to include additional columns to my data and perform the calculations at that level of detail.

billfish
Quartz | Level 8

A solution (amongst others) using plain DATA STEP. I added years 2014 and 2015 to see if I understood the question.


data t_have;
  input year month $ sales;
cards;
2011     Jan     100
2011     Feb     250
2011     Mar     200
2012     Jan     175
2012     Feb     300
2012     Mar     225
2013     Jan     150
2013     Feb     275
2013     Mar     250
2014     Jan     125
2014     Feb     200
2014     Mar     175
2015     Jan     105
2015     Feb     210
2015     Mar     275
;


data t_want(keep=year month sales R_sum R_avg);
  set t_have;
  length R_sum R_avg 8.;
  retain R_sum R_avg S_sum S_i C_row;
  array zMth(3);
  by year;
 
  if first.year then do;
     C_row=0;
     S_i+1;
     R_sum=0;    
  end;
     C_row+1;
     R_sum+Sales;
     if (S_i > 1) then R_avg = round(zMth(C_row)/(S_i-1),0.01); else R_avg=.;
     zMth(C_row)+R_sum;

run;

proc print data=t_want; run;

table t_want is:
==============================================================================
Obs    year    month    sales    R_sum     R_avg

  1    2011     Jan      100      100        .
  2    2011     Feb      250      350        .
  3    2011     Mar      200      550        .

  4    2012     Jan      175      175     100.00
  5    2012     Feb      300      475     350.00
  6    2012     Mar      225      700     550.00

  7    2013     Jan      150      150     137.50
  8    2013     Feb      275      425     412.50
  9    2013     Mar      250      675     625.00

10    2014     Jan      125      125     141.67 (100+175+150)/3
11    2014     Feb      200      325     416.67 (350+475+425)/3
12    2014     Mar      175      500     641.67 (550+700+675)/3

13    2015     Jan      105      105     137.50 (100+175+150+125)/4
14    2015     Feb      210      315     393.75 (350+475+425+325)/4
15    2015     Mar      275      590     606.25 (550+700+675+500)/4
==============================================================================

paul1034
Calcite | Level 5

I think you got it!  Thank you!!!  I'll let you know once I try it with my actual data.  Thanks again!

KachiM
Rhodochrosite | Level 12

You have got an Array solution. In the following statement:

"Down the road, I may need to include additional columns to my data and perform the calculations at that level of detail."


do you mean additional months or years or both? Or any other variable?


In the solution by BILLFISH, the denominators vary.

For the year 2014, it is 3 and for 2015 it is 4. The rationale is that there are that many non-missing values. But I remember for Moving Averages, a fixed window-size is used. What is your view on this?

Your answers and clarification can find a suitable solution to meet your future needs.

paul1034
Calcite | Level 5

Datasp,

When I made the statement to include additional columns, I meant I may need to add other variables to my data, not just including more months and years.

After I tested with the solution provided by BILLFISH, I realized that, as you said, the denominators vary.  I need a fixed window-size, such as only looking back at the previous four years from the current year or year in question.  So, 2015 needs to be compared to the average of 2014 to 2011, 2014 needs to be compared to 2013 to 2010, and so on.

Yes, BILLFISH gave a solution to what I asked.  It is my fault that I didn't clarify enough in writing that I want a moving average calculation, but only have the average calculate for the prior four years from a given year.  I'm hoping based on the solutions I have been given that a line or two of code can be added or modified for them to work in the manner I intended.

Thank you for following up and being patient with me on this.

KachiM
Rhodochrosite | Level 12

You have 3 months(Jan to Mar). It is logical to choose 3 as the size rather than 4. If it is 4 why not 5 or any other number? Think.

Ksharp
Super User

I am a little surprise . Since you can do it via proc expand , why would you need data step ?


data have;
  input year month $ sales RunningSum;
cards;
2011     Jan     100 100
2011     Feb     250 350
2011     Mar     200 550
2012     Jan     175 175
2012     Feb     300 475
2012     Mar     225 700
2013     Jan     150 150
2013     Feb     275 425
2013     Mar     250 675
;
run;
proc sort data=have out=temp;by month year;run;
data temp;
 set temp;
 by month;
 if first.month then do;n=0; sum_sales=0;end;
 R_avg=divide(sum_sales,n);
 n+1; sum_sales+RunningSum;
 drop n  sum_sales;
 run;
proc sort data=temp out=want;by  year;run;

Xia Keshan

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 4034 views
  • 0 likes
  • 5 in conversation