Contributor
Posts: 46

# Rolling 5 Month Average of Central 3 Values By Group

[ Edited ]

I have data sorted by ascending GROUP INFORCE and LAG, as shown below, and I'm interested in calculating a kind of "5 period rolling average" for Cum_Lag_Incrs. The way I need to compute my rolling average is by removing the Max and Min relevant observation and dividing by 3 (e.g. the "average" of the set {1,2,3,4,5} would be 3).

In addition to the average being based on 3 central observations of 5, it needs to come from the correct subset of data.

To be specific, for example: For GROUP=AA and LAG=1, Roll_Avg will be based only on the previous 5 GROUP=AA and LAG=1 values of Cum_Lag_Incrs; for GROUP=AB and LAG=2, Roll_Avg will be based only on the previous 5 GROUP=AB and LAG=2 values of Cum_Lag_Incrs; etc.

Note also that in data below I've shown Roll_Avg as either blank or "Calculate": I've done this because if there aren't 5 previous month of data available, then no calculation should be made.

 Group INFORCE Lag Cum_Lag_Incrs Roll_Avg AA 201301 0 AA 201301 1 3.038434616 AA 201301 2 1.13597727 AA 201301 3 1.088584718 AA 201302 0 AA 201302 1 3.322640299 AA 201302 2 1.182485247 AA 201302 3 1.036479476 AA 201303 0 AA 201303 1 3.413788991 AA 201303 2 1.120095681 AA 201303 3 1.047491904 AA 201304 0 AA 201304 1 2.696971256 AA 201304 2 1.110429149 AA 201304 3 1.046630304 AA 201305 0 AA 201305 1 2.700299318 AA 201305 2 1.111799418 AA 201305 3 1.045353415 AA 201306 0 Calculate AA 201306 1 3.156808222 Calculate AA 201306 2 1.213467751 Calculate AA 201306 3 1.152644537 Calculate AA 201307 0 Calculate AA 201307 1 3.149903686 Calculate AA 201307 2 1.236895962 Calculate AA 201307 3 1.038549449 Calculate AA 201308 0 Calculate AA 201308 1 3.388799982 Calculate AA 201308 2 1.17331458 Calculate AA 201308 3 1.037821673 Calculate AA 201309 0 Calculate AA 201309 1 2.223705515 Calculate AA 201309 2 1.107696717 Calculate AA 201309 3 1.067502797 Calculate AA 201310 0 Calculate AA 201310 1 2.598374951 Calculate AA 201310 2 1.183557366 Calculate AA 201310 3 1.068765806 Calculate AA 201311 0 Calculate AA 201311 1 2.684124925 Calculate AA 201311 2 1.882073772 Calculate AA 201311 3 1.639452593 Calculate AB 201301 0 AB 201301 1 3.034393497 AB 201301 2 1.13446642 AB 201301 3 1.087136901 AB 201302 0 AB 201302 1 3.326727146 AB 201302 2 1.183939704 AB 201302 3 1.035100958 AB 201303 0 AB 201303 1 3.409248651 AB 201303 2 1.121473399 AB 201303 3 1.04609874 AB 201304 0 AB 201304 1 2.693384284 AB 201304 2 1.108952278 AB 201304 3 1.047917659 AB 201305 0 AB 201305 1 2.703620687 AB 201305 2 1.113166932 AB 201305 3 1.046639199 AB 201306 0 Calculate AB 201306 1 3.160691097 Calculate AB 201306 2 1.214960317 Calculate AB 201306 3 1.15111152 Calculate AB 201307 0 Calculate AB 201307 1 3.145714314 Calculate AB 201307 2 1.238417344 Calculate AB 201307 3 1.037168178 Calculate AB 201308 0 Calculate AB 201308 1 3.384292878 Calculate AB 201308 2 1.171754072 Calculate AB 201308 3 1.039098194 Calculate AB 201309 0 Calculate AB 201309 1 2.226440673 Calculate AB 201309 2 1.106223481 Calculate AB 201309 3 1.068815825 Calculate AB 201310 0 Calculate AB 201310 1 2.601570952 Calculate AB 201310 2 1.181983235 Calculate AB 201310 3 1.070080388 Calculate AB 201311 0 Calculate AB 201311 1 2.680555039 Calculate AB 201311 2 1.879570614 Calculate AB 201311 3 1.64146912 Calculate

For clarification, note the following: Cum_Lag_Incrs is the ratio between a cumulative total (not shown here) in Lag N+1 and Lag N where N=0,1,2,3. So Cum_Lag_Incrs is always 0 for LAG=0 since there is no cumulative total to speak of before LAG 0. It's like if you sold a guy a car on a payment plan and you tracked the ratio between cumulative payments in this month and cumulative payments in last month; in the base month (the month you sold the car) there would be no cumulative total for this guy through the previous month.

Also, as an example, here's how I'd calculate the Roll_Avg for GROUP=AA, INFORCE=201306, Lag=0,1,2,3. Note that the previous 5 relevant data points are the previous 5 relevant data points from which the Roll_Avg will be calculated (e.g. for GROUP=AA, INFORCE=201306, Lag=1, the relevant values are Cum_Lag_Incrs from 201301-201305).

 Group INFORCE Lag Cum_Lag_Incrs Roll_Avg Previous 5 Relevant Data Points AA 201306 0 0 0 0 0 0 0 AA 201306 1 3.156808222 3.020458078 3.038434616 3.322640299 3.413788991 2.696971256 2.700299318 AA 201306 2 1.213467751 =(1.13597727+1.182485247+1.120095681+1.110429149+1.111799418-1.110429149-1.182485247)/3 1.13597727 1.182485247 1.120095681 1.110429149 1.111799418 AA 201306 3 1.152644537 1.046491874 1.088584718 1.036479476 1.047491904 1.046630304 1.045353415
Super User
Posts: 2,059

## Re: Rolling 5 Month Average of Central 3 Values By Group

Hi, I am little slow to grasp. May I ask a couple of questions plz-

1.  Can you explain the blanks in the Cum_Lag_Incrs?

2. Can you manually show an example of filling the 4 or 5 records of the calculate from the sample data or in other words, what will be the value of the calculate(roll avg) for the following records-

 AA 201306 0 Calculate AA 201306 1 3.156808222 Calculate AA 201306 2 1.213467751 Calculate AA 201306 3 1.152644537 Calculate
Contributor
Posts: 46

## Re: Rolling 5 Month Average of Central 3 Values By Group

Thanks for taking an interest in this question.  See my updated post above!

Super User
Posts: 2,059

## Re: Rolling 5 Month Average of Central 3 Values By Group

[ Edited ]

Hi, I have got the solution below that is close, however I still need help on dividing by 3 . Please test and let me know where I haven't understood the convert logic. I will rework and fix it properly. It should be a minor fix

data have;

infile datalines expandtabs truncover;

input Group \$   INFORCE    Lag  Cum_Lag_Incrs   ;

datalines;

AA   201301     0

AA   201301     1    3.038434616

AA   201301     2    1.13597727

AA   201301     3    1.088584718

AA   201302     0

AA   201302     1    3.322640299

AA   201302     2    1.182485247

AA   201302     3    1.036479476

AA   201303     0

AA   201303     1    3.413788991

AA   201303     2    1.120095681

AA   201303     3    1.047491904

AA   201304     0

AA   201304     1    2.696971256

AA   201304     2    1.110429149

AA   201304     3    1.046630304

AA   201305     0

AA   201305     1    2.700299318

AA   201305     2    1.111799418

AA   201305     3    1.045353415

AA   201306     0                   Calculate

AA   201306     1    3.156808222     Calculate

AA   201306     2    1.213467751     Calculate

AA   201306     3    1.152644537     Calculate

AA   201307     0                   Calculate

AA   201307     1    3.149903686     Calculate

AA   201307     2    1.236895962     Calculate

AA   201307     3    1.038549449     Calculate

AA   201308     0                   Calculate

AA   201308     1    3.388799982     Calculate

AA   201308     2    1.17331458 Calculate

AA   201308     3    1.037821673     Calculate

AA   201309     0                   Calculate

AA   201309     1    2.223705515     Calculate

AA   201309     2    1.107696717     Calculate

AA   201309     3    1.067502797     Calculate

AA   201310     0                   Calculate

AA   201310     1    2.598374951     Calculate

AA   201310     2    1.183557366     Calculate

AA   201310     3    1.068765806     Calculate

AA   201311     0                   Calculate

AA   201311     1    2.684124925     Calculate

AA   201311     2    1.882073772     Calculate

AA   201311     3    1.639452593     Calculate

AB   201301     0

AB   201301     1    3.034393497

AB   201301     2    1.13446642

AB   201301     3    1.087136901

AB   201302     0

AB   201302     1    3.326727146

AB   201302     2    1.183939704

AB   201302     3    1.035100958

AB   201303     0

AB   201303     1    3.409248651

AB   201303     2    1.121473399

AB   201303     3    1.04609874

AB   201304     0

AB   201304     1    2.693384284

AB   201304     2    1.108952278

AB   201304     3    1.047917659

AB   201305     0

AB   201305     1    2.703620687

AB   201305     2    1.113166932

AB   201305     3    1.046639199

AB   201306     0                   Calculate

AB   201306     1    3.160691097     Calculate

AB   201306     2    1.214960317     Calculate

AB   201306     3    1.15111152 Calculate

AB   201307     0                   Calculate

AB   201307     1    3.145714314     Calculate

AB   201307     2    1.238417344     Calculate

AB   201307     3    1.037168178     Calculate

AB   201308     0                   Calculate

AB   201308     1    3.384292878     Calculate

AB   201308     2    1.171754072     Calculate

AB   201308     3    1.039098194     Calculate

AB   201309     0                   Calculate

AB   201309     1    2.226440673     Calculate

AB   201309     2    1.106223481     Calculate

AB   201309     3    1.068815825     Calculate

AB   201310     0                   Calculate

AB   201310     1    2.601570952     Calculate

AB   201310     2    1.181983235     Calculate

AB   201310     3    1.070080388     Calculate

AB   201311     0                   Calculate

AB   201311     1    2.680555039     Calculate

AB   201311     2    1.879570614     Calculate

AB   201311     3    1.64146912 Calculate

;

proc sort data= have out=have1;

by group lag;

run;

data temp;

do until(last.group);

do _n_=1 by 1 until(last.lag);

set have1;

by  group lag;

if lag=0 then do;

output;

leave;

end;

l1=lag(cum_lag_incrs);

l2=lag2(cum_lag_incrs);

l3=lag3(cum_lag_incrs);

l4=lag4(cum_lag_incrs);

l5=lag5(cum_lag_incrs);

if _n_=1 then call missing(rollavg,l1,l2,l3,l4,l5);

if _n_>=6 then rollavg=mean(l1,l2,l3,l4,l5);

output;

end;

end;

drop l1-l5;

run;

proc sort data= temp out=want;

by group inforce lag;

run;

Regards,

Naveen Srinivasan

PS I assumed you are taking one year data of 2013 and your sample data resembles very much of the actual. Assumptions might make me lose track but I love these kinda problems,so I am in it for your feedback and improvment. Thanks!

Super User
Posts: 24,000

## Re: Rolling 5 Month Average of Central 3 Values By Group

Do you have SAS/ETS?

If so, I recommend using PROC TIMESERIES To fill in the dataset so you have an entry for every 'period' required.

https://gist.github.com/statgeek/07a3708dee1225ceb9d4aa75daab2c52

Because of the Max/Min options, you're best off cacluating the mean using the array method that's illustrated here

https://gist.github.com/statgeek/27e23c015eae7953eff2

Contributor
Posts: 46

## Re: Rolling 5 Month Average of Central 3 Values By Group

I do have SAS/ETS. Thanks for the tips.
Contributor
Posts: 46

## Re: Rolling 5 Month Average of Central 3 Values By Group

Using the min/max array link you provided, I'm able to achieve what I want, but in a convoluted way.

The biggeest issue I have is that I don't want to compute the trimmed average until there are 5 previous datapoints to base my calculation off of.  So I'm numbering the observation from oldest to newest within each grouping of LAGs from 1 and up, and then after the calculations have all occurred I'm deleting the rows on which calculations ocurred, but shouldn't have ocurred.

Any suggestions for how to make this code a little better?

``````data want_00;
set have;
BY Group LAG INFORCE;
if first.LAG then count=0;
count+1;
run;

data want_01;
array p{0:4} _temporary_;
set want_00;
if first.LAG then call missing(of p{*});
lowest = min(of p{*});
highest = max(of p{*});
sum = sum(of p{*});
p{mod(_n_,5)} = Cum_Lag_Incrs;
if count< 5 then highest ="";
if count<5 then lowest ="";
if count<5 then sum="";
average = (sum-highest-lowest)/3;
run;
``````
Super User
Posts: 24,000

## Re: Rolling 5 Month Average of Central 3 Values By Group

``if count < 5 then call missing(highest, lowest, sum);``

You can use CALL MISSING to set multiple values to missing in one step. It doesn't matter if they're numeric or character.

Just a note that you were setting the missing to "" which is missing for a character variable not a numeric variable.

Missing for numerics are denoted with a period (.) and it should have been:

``if count < 5 then highest = .;``

I went with your current variable names, but SAS has FUNCTIONS with the names SUM and COUNT so I would consider using other variable names instead, if possible.

Contributor
Posts: 46

## Re: Rolling 5 Month Average of Central 3 Values By Group

[ Edited ]

@novinosrin I am away from SAS until this afternoon, so I can't test until then.  But thank you for your reply!

We divide by 3 because I want the average of the 5 relevant data points AFTER I throw out the max and min observation.

For example(s) (here I use integers for simplicity):

If relevant 5 Cum_Lag_Incrs was {1,3,5,7,11} then I throw out 11 and 1 and I take average of the remaining set {3,5,7} and I get (3+5+7)/3=5

{5,33,9,8,22} would have average (9+8+22)/3=13

{2,4,5,68,,8} would have average (4+5+8)/3=5.666

Super User
Posts: 24,000

## Re: Rolling 5 Month Average of Central 3 Values By Group

@acemanhattan this is known as a trimmed mean. You can also use PROC EXPAND to see if can do a Trimmed Mean.

PROC Star
Posts: 629

## Re: Rolling 5 Month Average of Central 3 Values By Group

[ Edited ]

Hi,

Transposing the dataset for making the calculation easy solves the complexicity. Understand my code for your solution. Validated the results, they are exactly matching as you required.

``````DATA WORK.Sample_Test;
LENGTH Group \$2 INFORCE 8 Lag 8 Cum_Lag_Incrs \$ 11 ;
FORMAT Group \$CHAR2. INFORCE BEST12. Lag BEST12. Cum_Lag_Incrs \$CHAR11.;
INFORMAT Group \$CHAR2. INFORCE BEST12. Lag BEST12. Cum_Lag_Incrs \$CHAR11. ;
INFILE DATALINES4 DLM="," MISSOVER DSD;
INPUT Group: \$CHAR2. INFORCE: BEST32. Lag: BEST32. Cum_Lag_Incrs: \$CHAR11. ;
DATALINES4;
AA,201301,0,
AA,201301,1,3.038434616
AA,201301,2,1.13597727
AA,201301,3,1.088584718
AA,201302,0,
AA,201302,1,3.322640299
AA,201302,2,1.182485247
AA,201302,3,1.036479476
AA,201303,0,
AA,201303,1,3.413788991
AA,201303,2,1.120095681
AA,201303,3,1.047491904
AA,201304,0,
AA,201304,1,2.696971256
AA,201304,2,1.110429149
AA,201304,3,1.046630304
AA,201305,0,
AA,201305,1,2.700299318
AA,201305,2,1.111799418
AA,201305,3,1.045353415
AA,201306,0,
AA,201306,1,3.156808222
AA,201306,2,1.213467751
AA,201306,3,1.152644537
AA,201307,0,
AA,201307,1,3.149903686
AA,201307,2,1.236895962
AA,201307,3,1.038549449
AA,201308,0,
AA,201308,1,3.388799982
AA,201308,2,1.17331458
AA,201308,3,1.037821673
AA,201309,0,
AA,201309,1,2.223705515
AA,201309,2,1.107696717
AA,201309,3,1.067502797
AA,201310,0,
AA,201310,1,2.598374951
AA,201310,2,1.183557366
AA,201310,3,1.068765806
AA,201311,0,
AA,201311,1,2.684124925
AA,201311,2,1.882073772
AA,201311,3,1.639452593
AB,201301,0,
AB,201301,1,3.034393497
AB,201301,2,1.13446642
AB,201301,3,1.087136901
AB,201302,0,
AB,201302,1,3.326727146
AB,201302,2,1.183939704
AB,201302,3,1.035100958
AB,201303,0,
AB,201303,1,3.409248651
AB,201303,2,1.121473399
AB,201303,3,1.04609874
AB,201304,0,
AB,201304,1,2.693384284
AB,201304,2,1.108952278
AB,201304,3,1.047917659
AB,201305,0,
AB,201305,1,2.703620687
AB,201305,2,1.113166932
AB,201305,3,1.046639199
AB,201306,0,
AB,201306,1,3.160691097
AB,201306,2,1.214960317
AB,201306,3,1.15111152
AB,201307,0,
AB,201307,1,3.145714314
AB,201307,2,1.238417344
AB,201307,3,1.037168178
AB,201308,0,
AB,201308,1,3.384292878
AB,201308,2,1.171754072
AB,201308,3,1.039098194
AB,201309,0,
AB,201309,1,2.226440673
AB,201309,2,1.106223481
AB,201309,3,1.068815825
AB,201310,0,
AB,201310,1,2.601570952
AB,201310,2,1.181983235
AB,201310,3,1.070080388
AB,201311,0,
AB,201311,1,2.680555039
AB,201311,2,1.879570614
AB,201311,3,1.64146912
;;;;

Proc sort data=sample_test;
by Group lag inforce Cum_Lag_Incrs;
run;
data test;
set sample_test;
Cum_Lag=INPUT(Cum_Lag_Incrs,12.6);
IF MISSING(Cum_Lag) then Cum_Lag=0;
run;

PROC TRANSPOSE DATA=test prefix=INFORCE_ out=test2(DROP=_NAME_);
BY Group lag;
ID inforce;
VAR Cum_Lag;
run;

PROC SQL;
SELECT NAME INTO: variables separated by " "
FROM DICTIONARY.COLUMNS
WHERE LIBNAME="WORK" AND MEMNAME="TEST2" AND NAME LIKE "INFORCE%";
QUIT;
OPTIONS SYMBOLGEN Mprint MLOGIC;

%MACRO TEST();
DATA max_Val (DROP=&variables);
SET test2;
%DO I=1 %to 6;
%LET Start_Value=%SCAN("&variables",&i," ");
%LET end_Value  =%SCAN("&variables",%EVAL(&i+4)," ");
%LET New_Value  =%SCAN("&variables",%EVAL(&i+5)," ");

Required_&New_Value=(SUM(OF &Start_Value.-&end_Value.)-MAX(OF &Start_Value.-&end_Value.)-Min(OF &Start_Value.-&end_Value.))/3;
%END;
RUN;
%MEND;
%TEST();

PROC TRANSPOSE DATA=max_Val out=required(rename=(_Name_=INFORCE_ COL1=Roll_Avg));
BY Group lag;
RUN;

DATA pre(Drop=Inforce_);
set required;
INFORCE=INPUT(SCAN(INFORCE_,3,"_"),8.);
run;
proc sort data=sample_test;
by Group Lag Inforce;
Proc sort data=pre;
by Group lag inforce ;
run;
data need;
Merge sample_test pre;
by Group Lag Inforce;
RUN;
PROC SORT DATA=need;
BY Group INFORCE;
run;``````

Thanks,

Suryakiran

Thanks,
Suryakiran
PROC Star
Posts: 629