Solved: Re: How to compute a moving average within a by group?

pecon1 · Posted 11-20-2012 09:41 AM

I am trying to compute quarterly moving averages within a by group; however, there are different numbers of observations within each group. I have seen some examples, but they assume the same number of observations need to be averaged throughout. Here is an example:

Items a and d are purchased in each quarter, but item b is only purchased in the second and third quarters, while item c is only purchased in quarter 4.

So, the by group would be "by item quarter"

item quarter quantity

a 1 2

a 2 1

a 3 4

a 4 3

b 2 25

b 3 4

c 4 15

d 1 2

d 2 1

d 3 4

d 4 3

The example above is sort of a watered down version. I am using time series data and even though an item is purchased every quarter in one year, it may not be purchased every quarter in another year, or it may not be purchased at all for a particular year.

Is there a simple way to compute moving averages for this type of data without interpolation? If possible, can you also include an example with interpolation?

Any help would be appreciated.

Thanks!

cau83 · Posted 11-28-2012 10:06 AM

create a data set with observations that have your start and end date. We have the same problem and so we make a data set that has a record where date = today(), then append that to our data, then do proc expand and it will expand out to that (and if you put a start date, it will start there and zero-fill until your first true observation).

View solution in original post

LinusH · Posted 11-20-2012 09:53 AM

One "easy" way to do it is to create a multilabel format that groups your moving quarters.

Then use that format together with PROC MEANS, using the MEAN statistic.

http://support.sas.com/documentation/cdl/en/proc/65145/HTML/default/viewer.htm#n13dhme6o4ut3en1u8e58...

Data never sleeps

PaigeMiller · Posted 11-20-2012 01:01 PM

LinusH wrote:

One "easy" way to do it is to create a multilabel format that groups your moving quarters.

Then use that format together with PROC MEANS, using the MEAN statistic.

http://support.sas.com/documentation/cdl/en/proc/65145/HTML/default/viewer.htm#n13dhme6o4ut3en1u8e58...

Unless I am missing something, this seems like an awfully work-intensive way to get moving averages, despite your use of the word "easy". Don't you have to pre-define the quarters as overlapping formats in a PROC FORMAT to make this work? Which implies in advance that you have to know exactly what quarters are in the database, and code that up into PROC FORMAT. If you have 50 quarters, that's a lot of PROC FORMAT coding, and the slightest error in there invalidates the results.

Am I right, or did I miss something?

--
Paige Miller

PaigeMiller · Posted 11-20-2012 09:55 AM

I'm not going to write code for you, but your table needs to be augmented with rows that contain 0 for the quarters where the quantity is zero. Otherwise, you will never get the proper moving averages.

PROC MACONTROL will do the calculations for you, even allows for BY groups, but I think first you must augment the table with zero values. If you don't want the control charts that PROC MACONTROL produces, you can ignore/eliminate those and use the output data set (OUTHISTORY=) which contains your moving averages.

--
Paige Miller

Haikuo · Posted 11-20-2012 01:30 PM

Unless you have SAS/ETS, you can't talk about 'easy'. First, using proc timeseries to fix data (like Paigemiller suggested), and then using proc expand to calculate the moving average. In case you don't really care those missing qtrs, then you may deploy proc expand directly.

however, if you don't have ETS, then you probably need some array(). I have never used proc macontrol before, so can't comment on that.

Hakuo

PaigeMiller · Posted 11-20-2012 01:35 PM

Actually, my suggestion to expand the dataset so it contains appropriate rows with 0, can be done via a combination of DATA steps and PROC SQL. The computation of the moving average could be done via LAG commands in a DATA STEP, but it is easier to use PROC MACONTROL (if you have SAS/QC)

--
Paige Miller

Haikuo · Posted 11-20-2012 01:51 PM

I know what you mean. proc timeseries is exactly capable of doing just that.

Haikuo

PGStats · Posted 11-20-2012 01:53 PM

Seems fairly simple with SQL :

data have;
input item $ quarter quantity;
year = 2000;
datalines;
a          1               2
a          2               1
a          3               4
a          4               3
b          2               25
b          3               4
c          4               15
d          1               2
d          2               1
d          3               4
d          4               3
;

proc sql;
create table comb as
select I.item, Q.year, Q.quarter
from
     (select distinct item from have) as I
     cross join
     (select distinct year, quarter from have) as Q
order by item, year, quarter;
create table want as
select C.item, C.year, C.quarter, coalesce(sum(quantity), 0) as totalQuantity
from comb as C left join have as H
     on C.item=H.item and C.year=H.year and C.quarter=H.quarter
group by C.item, C.year, C.quarter;
drop table comb;
quit;

PG

PaigeMiller · Posted 11-21-2012 10:18 AM

Nice job with the SQL code, PGStats!

However, "fairly simple" to you is not fairly simple to the many of us.

--
Paige Miller

pecon1 · Posted 11-27-2012 09:42 AM

I have tried to expand the data to include missing values so that the moving averages may be computed as paigemiller has suggested. I am working with SAS/ETS by the way. I am using monthly data and the date variable is formatted as monyy. Here is the code that I am using to expand the data:

proc sort data=fisher3; by class item date; run;

proc expand data=fisher3 out=fshrexp from=month method=none;

by class item; *do not include date in by group here or it won't work;

id date;

run;

Question: I thought this solved my missing values problem; however when there are missing values at the beginning/end of the series (ex. values for a variable start at March instead of January and may end at October instead of December), these values remain missing (do not show up in the expanded data set) even though the rest of the series is expanded. Is there a way, or an option in proc expand, to get SAS to indicate that these values are missing as well?

Also, here is the basic code that I am using to compute moving averages. Although I need to correct the missing values problem, this code works quite nicely. You can convert multiple vars at once and use multiple convert statements to create different moving averages at once for those vars.

proc sort data=old; by class item date; run;

proc expand data=old out=new;

by class item;

convert var1=movingaverage / method=none transformout=(movave 3);

run;

cau83 · Posted 11-28-2012 10:06 AM

create a data set with observations that have your start and end date. We have the same problem and so we make a data set that has a record where date = today(), then append that to our data, then do proc expand and it will expand out to that (and if you put a start date, it will start there and zero-fill until your first true observation).

pecon1 · Posted 12-06-2012 09:58 AM

Thanks! I had not thought about doing that, and it did solve the problem. I created a separate dataset with beginning and ending series values, merged the new and old data sets, then used proc expand to generate the remaining missing values.

oloolo · Posted 11-29-2012 10:47 AM

PROC FREQ with SPARSE option can do the expanding for you even if you don't license ETS module. Here is a soluation using SAS/BASE only. Calculating moving average is a straight forward programming task.

proc freq data=test noprint;

table item*quarter /sparse out=test2(rename=(count=quantity) drop=percent);

weight quantity;

run;

%let k=2; /* length of your window */

data test_ma;

set test2; by item;

array _x{0:%eval(&k-1)} _temporary_;

if first.item then do;

_x[0]=quantity;

do i=1 to (dim(_x)-1); _x=0; end;

seq=1;

end;

else do;

_x[mod(seq, &k)]=quantity; seq+1;

end;

if seq>=&k then do;

ma=mean(of _x

);

if seq>&k then do;

_x[mod(seq, &k)]=quantity;

end;

drop i seq;

run;

pecon1 · Posted 12-13-2012 10:42 AM

Now another question has come up: some of my data are purposefully excluded for entire years. I am using the proc expand convert option to compute the moving averages as indicated previously, but how do I get the procedure to start over when years are not consecutive?

I thought about creating a counter variable but I can't do an if statement in proc expand. Here is a brief example of what I mean.

item year month value

a 2000 1 #

... ... .. ...

a 2000 12 #
a 2007 1 # *Here is the problem: item is only purchased for these two years but I need moving average to start over here.

... ... .. ...

a 2007 12 #

Any suggestions would be greatly appreciated. I'll keep working on figuring this out as well.

cau83 · Posted 12-13-2012 11:11 AM

This might not be practical if you have many different series of data with different years, but, you could include a where statement that says to do the expand where year = 2000 or year = 2007. Depending on which moving average method you're using, if it's in a data step, then have an if that says if year=2007 and month=1 then start the moving average over, or something to that effect.

But if different series have different years, you would either have to start running them separately ( a pain) or perhaps you could build a data set that you merge in that provides that info (one column would indicate the item, another would have the year(s) that are relevant with an additional record for each year).

Then do proc expand/Moving Avg on each, then append those data sets. But again, that would be no fun if you have different series with different years/ranges of years.

Registration is open

SAS Training: Just a Click Away