02-10-2015 02:11 PM
I am somewhat new at using and writing SAS code, and I'm trying to create a moving average calculation. I've found examples and based on what I have read and tried, I can't seem to get my data to produce the desired results. Here is an example of my data.
Year Month Sales Running Sum
2011 Jan 100 100
2011 Feb 250 350
2011 March 200 550
2012 Jan 175 175
2012 Feb 300 475
2012 March 225 700
2013 Jan 150 150
2013 Feb 275 425
2013 March 250 675
I have Year, Month, Sales, and a Running Sum of the monthly sales for each year. I would like to be able to compare the Running Sum of sales through the current month and compare it to the average of the past years through that same month. I've tried using PROC EXPAND, and that seems to work, but only when my dataset has Year and Sales.
Thanks for your help. It is much appreciated.
02-10-2015 02:51 PM
I've found examples and based on what I have read and tried, I can't seem to get my data to produce the desired results.
I've tried using PROC EXPAND, and that seems to work, but only when my dataset has Year and Sales.
1. What is your desired output? This is a YTD year over year comparison?
2. Why can't you include years and sales in PROC EXPAND? I think any solution will require those variables, because you're calculating across years so SAS needs to know the years and it uses the Sales raw data not the running sum to do the calculations.
02-10-2015 04:28 PM
I apologize for the confusion. Here is what I am trying to calculate.
Year Month Sales Running Sum Average
2011 Jan 100 100 .
2011 Feb 250 350 .
2011 March 200 550 .
2012 Jan 175 175 100
2012 Feb 300 475 350
2012 March 225 700 550
2013 Jan 150 150 138 ((175+100) / 2)
2013 Feb 275 425 413 ((475+350) / 2)
2013 March 250 675 625 ((700+550) / 2)
The Average column calculates the Running Sum through each month and divides that figure by the number of prior periods. See the calculations in parentheses. I'm actually trying to calculate a four year average, but I know I did not include enough data in my sample to calculate the full four years.
I am able to use PROC EXPAND to calculate a moving average, but I only get the code to work when I had the Year and Sales (at cumulative total) columns in my dataset. Sales were for the full year, not broken out by month. For 2011, sales were 550. For 2012, sales were 700.
So, how do I calculate the moving average with more than two columns in my table?
I hope this explanation helps answer your questions. Thanks again.
02-11-2015 09:41 AM
I presume that for the moving average, you want a window size of 3.
The 2 used for average is, I suppose, is one less than the window size.
What is the big picture?
By a data step, Moving averages can be found provided you have no fear to use Array.
Try to explain your bigger problem. Someone here would help you.
02-11-2015 12:26 PM
My table consists of Year, Month, Sales, and Running Sum (of Sales). I want to calculate the moving average of the Running Sum (of Sales).
From my sample data provided, I know the Running Sum (of Sales) through Jan 2013 is 150, Feb 2013 is 425, and March 2013 is 675. How do I write the code in EG to calculate the moving average to get 138 for Jan, 413 for Feb, and 625 for March?
From the research I have done, it appears I should use PROC EXPAND to write the code. If there is another method I should use, please explain. Down the road, I may need to include additional columns to my data and perform the calculations at that level of detail.
02-11-2015 02:02 PM
A solution (amongst others) using plain DATA STEP. I added years 2014 and 2015 to see if I understood the question.
input year month $ sales;
2011 Jan 100
2011 Feb 250
2011 Mar 200
2012 Jan 175
2012 Feb 300
2012 Mar 225
2013 Jan 150
2013 Feb 275
2013 Mar 250
2014 Jan 125
2014 Feb 200
2014 Mar 175
2015 Jan 105
2015 Feb 210
2015 Mar 275
data t_want(keep=year month sales R_sum R_avg);
length R_sum R_avg 8.;
retain R_sum R_avg S_sum S_i C_row;
if first.year then do;
if (S_i > 1) then R_avg = round(zMth(C_row)/(S_i-1),0.01); else R_avg=.;
proc print data=t_want; run;
table t_want is:
Obs year month sales R_sum R_avg
1 2011 Jan 100 100 .
2 2011 Feb 250 350 .
3 2011 Mar 200 550 .
4 2012 Jan 175 175 100.00
5 2012 Feb 300 475 350.00
6 2012 Mar 225 700 550.00
7 2013 Jan 150 150 137.50
8 2013 Feb 275 425 412.50
9 2013 Mar 250 675 625.00
10 2014 Jan 125 125 141.67 (100+175+150)/3
11 2014 Feb 200 325 416.67 (350+475+425)/3
12 2014 Mar 175 500 641.67 (550+700+675)/3
13 2015 Jan 105 105 137.50 (100+175+150+125)/4
14 2015 Feb 210 315 393.75 (350+475+425+325)/4
15 2015 Mar 275 590 606.25 (550+700+675+500)/4
02-12-2015 04:17 AM
You have got an Array solution. In the following statement:
"Down the road, I may need to include additional columns to my data and perform the calculations at that level of detail."
do you mean additional months or years or both? Or any other variable?
In the solution by BILLFISH, the denominators vary.
For the year 2014, it is 3 and for 2015 it is 4. The rationale is that there are that many non-missing values. But I remember for Moving Averages, a fixed window-size is used. What is your view on this?
Your answers and clarification can find a suitable solution to meet your future needs.
02-12-2015 09:52 AM
When I made the statement to include additional columns, I meant I may need to add other variables to my data, not just including more months and years.
After I tested with the solution provided by BILLFISH, I realized that, as you said, the denominators vary. I need a fixed window-size, such as only looking back at the previous four years from the current year or year in question. So, 2015 needs to be compared to the average of 2014 to 2011, 2014 needs to be compared to 2013 to 2010, and so on.
Yes, BILLFISH gave a solution to what I asked. It is my fault that I didn't clarify enough in writing that I want a moving average calculation, but only have the average calculate for the prior four years from a given year. I'm hoping based on the solutions I have been given that a line or two of code can be added or modified for them to work in the manner I intended.
Thank you for following up and being patient with me on this.
02-12-2015 05:09 AM
I am a little surprise . Since you can do it via proc expand , why would you need data step ?
data have; input year month $ sales RunningSum; cards; 2011 Jan 100 100 2011 Feb 250 350 2011 Mar 200 550 2012 Jan 175 175 2012 Feb 300 475 2012 Mar 225 700 2013 Jan 150 150 2013 Feb 275 425 2013 Mar 250 675 ; run; proc sort data=have out=temp;by month year;run; data temp; set temp; by month; if first.month then do;n=0; sum_sales=0;end; R_avg=divide(sum_sales,n); n+1; sum_sales+RunningSum; drop n sum_sales; run; proc sort data=temp out=want;by year;run;