BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
pecon1
Fluorite | Level 6

I am trying to compute quarterly moving averages within a by group; however, there are different numbers of observations within each group. I have seen some examples, but they assume the same number of observations need to be averaged throughout. Here is an example:

Items a and d are purchased in each quarter, but item b is only purchased in the second and third quarters, while item c is only purchased in quarter 4.

So, the by group would be "by item quarter"

item     quarter     quantity

a          1               2

a          2               1

a          3               4

a          4               3

b          2               25

b          3               4

c          4               15

d          1               2

d          2               1

d          3               4

d          4               3

The example above is sort of a watered down version. I am using time series data and even though an item is purchased every quarter in one year, it may not be purchased every quarter in another year, or it may not be purchased at all for a particular year.

Is there a simple way to compute moving averages for this type of data without interpolation? If possible, can you also include an example with interpolation?

Any help would be appreciated.

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
cau83
Pyrite | Level 9

create a data set with observations that have your start and end date. We have the same problem and so we make a data set that has a record where date = today(), then append that to our data, then do proc expand and it will expand out to that (and if you put a start date, it will start there and zero-fill until your first true observation).

View solution in original post

14 REPLIES 14
LinusH
Tourmaline | Level 20

One "easy" way to do it is to create a multilabel format that groups your moving quarters.

Then use that format together with PROC MEANS, using the MEAN statistic.

http://support.sas.com/documentation/cdl/en/proc/65145/HTML/default/viewer.htm#n13dhme6o4ut3en1u8e58...

Data never sleeps
PaigeMiller
Diamond | Level 26

LinusH wrote:

One "easy" way to do it is to create a multilabel format that groups your moving quarters.

Then use that format together with PROC MEANS, using the MEAN statistic.

http://support.sas.com/documentation/cdl/en/proc/65145/HTML/default/viewer.htm#n13dhme6o4ut3en1u8e58...

Unless I am missing something, this seems like an awfully work-intensive way to get moving averages, despite your use of the word "easy". Don't you have to pre-define the quarters as overlapping formats in a PROC FORMAT to make this work? Which implies in advance that you have to know exactly what quarters are in the database, and code that up into PROC FORMAT. If you have 50 quarters, that's a lot of PROC FORMAT coding, and the slightest error in there invalidates the results.

Am I right, or did I miss something?

--
Paige Miller
PaigeMiller
Diamond | Level 26

I'm not going to write code for you, but your table needs to be augmented with rows that contain 0 for the quarters where the quantity is zero. Otherwise, you will never get the proper moving averages.

PROC MACONTROL will do the calculations for you, even allows for BY groups, but I think first you must augment the table with zero values. If you don't want the control charts that PROC MACONTROL produces, you can ignore/eliminate those and use the output data set (OUTHISTORY=) which contains your moving averages.

--
Paige Miller
Haikuo
Onyx | Level 15

Unless you have SAS/ETS, you can't talk about 'easy'. First, using proc timeseries to fix data (like Paigemiller suggested), and then using proc expand to calculate the moving average. In case you don't really care those missing qtrs, then you may deploy proc expand directly.

however, if you don't have ETS, then you probably need some array(). I have never used proc macontrol before, so can't comment on that.

Hakuo

PaigeMiller
Diamond | Level 26

Actually, my suggestion to expand the dataset so it contains appropriate rows with 0, can be done via a combination of DATA steps and PROC SQL. The computation of the moving average could be done via LAG commands in a DATA STEP, but it is easier to use PROC MACONTROL (if you have SAS/QC)

--
Paige Miller
Haikuo
Onyx | Level 15

I know what you mean. proc timeseries is exactly capable of doing just that.

Haikuo

PGStats
Opal | Level 21

Seems fairly simple with SQL :

data have;
input item $ quarter quantity;
year = 2000;
datalines;
a          1               2
a          2               1
a          3               4
a          4               3
b          2               25
b          3               4
c          4               15
d          1               2
d          2               1
d          3               4
d          4               3
;

proc sql;
create table comb as
select I.item, Q.year, Q.quarter
from
     (select distinct item from have) as I
     cross join
     (select distinct year, quarter from have) as Q
order by item, year, quarter;
create table want as
select C.item, C.year, C.quarter, coalesce(sum(quantity), 0) as totalQuantity
from comb as C left join have as H
     on C.item=H.item and C.year=H.year and C.quarter=H.quarter
group by C.item, C.year, C.quarter;
drop table comb;
quit;

PG

PG
PaigeMiller
Diamond | Level 26

Nice job with the SQL code, PGStats!

However, "fairly simple" to you is not fairly simple to the many of us. Smiley Wink

--
Paige Miller
pecon1
Fluorite | Level 6

I have tried to expand the data to include missing values so that the moving averages may be computed as paigemiller has suggested. I am working with SAS/ETS by the way. I am using monthly data and the date variable is formatted as monyy. Here is the code that I am using to expand the data:

proc sort data=fisher3; by class item date; run;

proc expand data=fisher3 out=fshrexp from=month method=none;

by class item; *do not include date in by group here or it won't work;

id date;

run;

Question: I thought this solved my missing values problem; however when there are missing values at the beginning/end of the series (ex. values for a variable start at March instead of January and may end at October instead of December), these values remain missing (do not show up in the expanded data set) even though the rest of the series is expanded. Is there a way, or an option in proc expand, to get SAS to indicate that these values are missing as well?

Also, here is the basic code that I am using to compute moving averages. Although I need to correct the missing values problem, this code works quite nicely. You can convert multiple vars at once and use multiple convert statements to create different moving averages at once for those vars.

proc sort data=old; by class item date; run;

proc expand data=old out=new;

     by class item;

     convert var1=movingaverage / method=none transformout=(movave 3);

run;

cau83
Pyrite | Level 9

create a data set with observations that have your start and end date. We have the same problem and so we make a data set that has a record where date = today(), then append that to our data, then do proc expand and it will expand out to that (and if you put a start date, it will start there and zero-fill until your first true observation).

pecon1
Fluorite | Level 6

Thanks! I had not thought about doing that, and it did solve the problem. I created a separate dataset with beginning and ending series values, merged the new and old data sets, then used proc expand to generate the remaining missing values.

oloolo
Fluorite | Level 6

PROC FREQ with SPARSE option can do the expanding for you even if you don't license ETS module. Here is a soluation using SAS/BASE only. Calculating moving average is a straight forward programming task.

proc freq data=test  noprint;

         table item*quarter /sparse  out=test2(rename=(count=quantity)  drop=percent);

        weight quantity;

run;

%let k=2; /* length of your window */

data test_ma;

        set test2; by item;

        array _x{0:%eval(&k-1)} _temporary_;

        if first.item then do;

           _x[0]=quantity;

          do i=1 to (dim(_x)-1); _x=0; end;

          seq=1;

        end;

        else do;

             _x[mod(seq, &k)]=quantity; seq+1;

        end;

        if seq>=&k then do;

           ma=mean(of _x

  • );
  •            if seq>&k then do;

                   _x[mod(seq, &k)]=quantity;

               end;

             end;

              drop i seq;

    run;


    pecon1
    Fluorite | Level 6

    Now another question has come up: some of my data are purposefully excluded for entire years. I am using the proc expand convert option to compute the moving averages as indicated previously, but how do I get the procedure to start over when years are not consecutive?

    I thought about creating a counter variable but I can't do an if statement in proc expand. Here is a brief example of what I mean.

    item     year     month     value

    a          2000     1               #

    ...          ...          ..          ...

    a          2000     12             #
    a          2007      1             #          *Here is the problem: item is only purchased for these two years but I need moving average to start over here.

    ...          ...          ..          ...

    a          2007      12           #        

    Any suggestions would be greatly appreciated. I'll keep working on figuring this out as well.

    cau83
    Pyrite | Level 9

    This might not be practical if you have many different series of data with different years, but, you could include a where statement that says to do the expand where year = 2000 or year = 2007. Depending on which moving average method you're using, if it's in a data step, then have an if that says if year=2007 and month=1 then start the moving average over, or something to that effect.

    But if different series have different years, you would either have to start running them separately ( a pain) or perhaps you could build a data set that you merge in that provides that info (one column would indicate the item, another would have the year(s) that are relevant with an additional record for each year).

    Then do proc expand/Moving Avg on each, then append those data sets. But again, that would be no fun if you have different series with different years/ranges of years.

    sas-innovate-2024.png

    Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

    Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

     

    Register now!

    How to Concatenate Values

    Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

    Find more tutorials on the SAS Users YouTube channel.

    Click image to register for webinarClick image to register for webinar

    Classroom Training Available!

    Select SAS Training centers are offering in-person courses. View upcoming courses for:

    View all other training opportunities.

    Discussion stats
    • 14 replies
    • 10251 views
    • 5 likes
    • 7 in conversation