I have data sorted by ascending GROUP INFORCE and LAG, as shown below, and I'm interested in calculating a kind of "5 period rolling average" for Cum_Lag_Incrs. The way I need to compute my rolling average is by removing the Max and Min relevant observation and dividing by 3 (e.g. the "average" of the set {1,2,3,4,5} would be 3).
In addition to the average being based on 3 central observations of 5, it needs to come from the correct subset of data.
To be specific, for example: For GROUP=AA and LAG=1, Roll_Avg will be based only on the previous 5 GROUP=AA and LAG=1 values of Cum_Lag_Incrs; for GROUP=AB and LAG=2, Roll_Avg will be based only on the previous 5 GROUP=AB and LAG=2 values of Cum_Lag_Incrs; etc.
Note also that in data below I've shown Roll_Avg as either blank or "Calculate": I've done this because if there aren't 5 previous month of data available, then no calculation should be made.
Group | INFORCE | Lag | Cum_Lag_Incrs | Roll_Avg |
AA | 201301 | 0 | ||
AA | 201301 | 1 | 3.038434616 | |
AA | 201301 | 2 | 1.13597727 | |
AA | 201301 | 3 | 1.088584718 | |
AA | 201302 | 0 | ||
AA | 201302 | 1 | 3.322640299 | |
AA | 201302 | 2 | 1.182485247 | |
AA | 201302 | 3 | 1.036479476 | |
AA | 201303 | 0 | ||
AA | 201303 | 1 | 3.413788991 | |
AA | 201303 | 2 | 1.120095681 | |
AA | 201303 | 3 | 1.047491904 | |
AA | 201304 | 0 | ||
AA | 201304 | 1 | 2.696971256 | |
AA | 201304 | 2 | 1.110429149 | |
AA | 201304 | 3 | 1.046630304 | |
AA | 201305 | 0 | ||
AA | 201305 | 1 | 2.700299318 | |
AA | 201305 | 2 | 1.111799418 | |
AA | 201305 | 3 | 1.045353415 | |
AA | 201306 | 0 | Calculate | |
AA | 201306 | 1 | 3.156808222 | Calculate |
AA | 201306 | 2 | 1.213467751 | Calculate |
AA | 201306 | 3 | 1.152644537 | Calculate |
AA | 201307 | 0 | Calculate | |
AA | 201307 | 1 | 3.149903686 | Calculate |
AA | 201307 | 2 | 1.236895962 | Calculate |
AA | 201307 | 3 | 1.038549449 | Calculate |
AA | 201308 | 0 | Calculate | |
AA | 201308 | 1 | 3.388799982 | Calculate |
AA | 201308 | 2 | 1.17331458 | Calculate |
AA | 201308 | 3 | 1.037821673 | Calculate |
AA | 201309 | 0 | Calculate | |
AA | 201309 | 1 | 2.223705515 | Calculate |
AA | 201309 | 2 | 1.107696717 | Calculate |
AA | 201309 | 3 | 1.067502797 | Calculate |
AA | 201310 | 0 | Calculate | |
AA | 201310 | 1 | 2.598374951 | Calculate |
AA | 201310 | 2 | 1.183557366 | Calculate |
AA | 201310 | 3 | 1.068765806 | Calculate |
AA | 201311 | 0 | Calculate | |
AA | 201311 | 1 | 2.684124925 | Calculate |
AA | 201311 | 2 | 1.882073772 | Calculate |
AA | 201311 | 3 | 1.639452593 | Calculate |
AB | 201301 | 0 | ||
AB | 201301 | 1 | 3.034393497 | |
AB | 201301 | 2 | 1.13446642 | |
AB | 201301 | 3 | 1.087136901 | |
AB | 201302 | 0 | ||
AB | 201302 | 1 | 3.326727146 | |
AB | 201302 | 2 | 1.183939704 | |
AB | 201302 | 3 | 1.035100958 | |
AB | 201303 | 0 | ||
AB | 201303 | 1 | 3.409248651 | |
AB | 201303 | 2 | 1.121473399 | |
AB | 201303 | 3 | 1.04609874 | |
AB | 201304 | 0 | ||
AB | 201304 | 1 | 2.693384284 | |
AB | 201304 | 2 | 1.108952278 | |
AB | 201304 | 3 | 1.047917659 | |
AB | 201305 | 0 | ||
AB | 201305 | 1 | 2.703620687 | |
AB | 201305 | 2 | 1.113166932 | |
AB | 201305 | 3 | 1.046639199 | |
AB | 201306 | 0 | Calculate | |
AB | 201306 | 1 | 3.160691097 | Calculate |
AB | 201306 | 2 | 1.214960317 | Calculate |
AB | 201306 | 3 | 1.15111152 | Calculate |
AB | 201307 | 0 | Calculate | |
AB | 201307 | 1 | 3.145714314 | Calculate |
AB | 201307 | 2 | 1.238417344 | Calculate |
AB | 201307 | 3 | 1.037168178 | Calculate |
AB | 201308 | 0 | Calculate | |
AB | 201308 | 1 | 3.384292878 | Calculate |
AB | 201308 | 2 | 1.171754072 | Calculate |
AB | 201308 | 3 | 1.039098194 | Calculate |
AB | 201309 | 0 | Calculate | |
AB | 201309 | 1 | 2.226440673 | Calculate |
AB | 201309 | 2 | 1.106223481 | Calculate |
AB | 201309 | 3 | 1.068815825 | Calculate |
AB | 201310 | 0 | Calculate | |
AB | 201310 | 1 | 2.601570952 | Calculate |
AB | 201310 | 2 | 1.181983235 | Calculate |
AB | 201310 | 3 | 1.070080388 | Calculate |
AB | 201311 | 0 | Calculate | |
AB | 201311 | 1 | 2.680555039 | Calculate |
AB | 201311 | 2 | 1.879570614 | Calculate |
AB | 201311 | 3 | 1.64146912 | Calculate |
For clarification, note the following: Cum_Lag_Incrs is the ratio between a cumulative total (not shown here) in Lag N+1 and Lag N where N=0,1,2,3. So Cum_Lag_Incrs is always 0 for LAG=0 since there is no cumulative total to speak of before LAG 0. It's like if you sold a guy a car on a payment plan and you tracked the ratio between cumulative payments in this month and cumulative payments in last month; in the base month (the month you sold the car) there would be no cumulative total for this guy through the previous month.
Also, as an example, here's how I'd calculate the Roll_Avg for GROUP=AA, INFORCE=201306, Lag=0,1,2,3. Note that the previous 5 relevant data points are the previous 5 relevant data points from which the Roll_Avg will be calculated (e.g. for GROUP=AA, INFORCE=201306, Lag=1, the relevant values are Cum_Lag_Incrs from 201301-201305).
Group | INFORCE | Lag | Cum_Lag_Incrs | Roll_Avg | Previous 5 Relevant Data Points | ||||
AA | 201306 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
AA | 201306 | 1 | 3.156808222 | 3.020458078 | 3.038434616 | 3.322640299 | 3.413788991 | 2.696971256 | 2.700299318 |
AA | 201306 | 2 | 1.213467751 | =(1.13597727+1.182485247+1.120095681+1.110429149+1.111799418-1.110429149-1.182485247)/3 | 1.13597727 | 1.182485247 | 1.120095681 | 1.110429149 | 1.111799418 |
AA | 201306 | 3 | 1.152644537 | 1.046491874 | 1.088584718 | 1.036479476 | 1.047491904 | 1.046630304 | 1.045353415 |
Hi, I am little slow to grasp. May I ask a couple of questions plz-
1. Can you explain the blanks in the Cum_Lag_Incrs?
2. Can you manually show an example of filling the 4 or 5 records of the calculate from the sample data or in other words, what will be the value of the calculate(roll avg) for the following records-
AA | 201306 | 0 | Calculate | |
AA | 201306 | 1 | 3.156808222 | Calculate |
AA | 201306 | 2 | 1.213467751 | Calculate |
AA | 201306 | 3 | 1.152644537 | Calculate |
Hi, I have got the solution below that is close, however I still need help on dividing by 3 . Please test and let me know where I haven't understood the convert logic. I will rework and fix it properly. It should be a minor fix
data have;
infile datalines expandtabs truncover;
input Group $ INFORCE Lag Cum_Lag_Incrs ;
datalines;
AA 201301 0
AA 201301 1 3.038434616
AA 201301 2 1.13597727
AA 201301 3 1.088584718
AA 201302 0
AA 201302 1 3.322640299
AA 201302 2 1.182485247
AA 201302 3 1.036479476
AA 201303 0
AA 201303 1 3.413788991
AA 201303 2 1.120095681
AA 201303 3 1.047491904
AA 201304 0
AA 201304 1 2.696971256
AA 201304 2 1.110429149
AA 201304 3 1.046630304
AA 201305 0
AA 201305 1 2.700299318
AA 201305 2 1.111799418
AA 201305 3 1.045353415
AA 201306 0 Calculate
AA 201306 1 3.156808222 Calculate
AA 201306 2 1.213467751 Calculate
AA 201306 3 1.152644537 Calculate
AA 201307 0 Calculate
AA 201307 1 3.149903686 Calculate
AA 201307 2 1.236895962 Calculate
AA 201307 3 1.038549449 Calculate
AA 201308 0 Calculate
AA 201308 1 3.388799982 Calculate
AA 201308 2 1.17331458 Calculate
AA 201308 3 1.037821673 Calculate
AA 201309 0 Calculate
AA 201309 1 2.223705515 Calculate
AA 201309 2 1.107696717 Calculate
AA 201309 3 1.067502797 Calculate
AA 201310 0 Calculate
AA 201310 1 2.598374951 Calculate
AA 201310 2 1.183557366 Calculate
AA 201310 3 1.068765806 Calculate
AA 201311 0 Calculate
AA 201311 1 2.684124925 Calculate
AA 201311 2 1.882073772 Calculate
AA 201311 3 1.639452593 Calculate
AB 201301 0
AB 201301 1 3.034393497
AB 201301 2 1.13446642
AB 201301 3 1.087136901
AB 201302 0
AB 201302 1 3.326727146
AB 201302 2 1.183939704
AB 201302 3 1.035100958
AB 201303 0
AB 201303 1 3.409248651
AB 201303 2 1.121473399
AB 201303 3 1.04609874
AB 201304 0
AB 201304 1 2.693384284
AB 201304 2 1.108952278
AB 201304 3 1.047917659
AB 201305 0
AB 201305 1 2.703620687
AB 201305 2 1.113166932
AB 201305 3 1.046639199
AB 201306 0 Calculate
AB 201306 1 3.160691097 Calculate
AB 201306 2 1.214960317 Calculate
AB 201306 3 1.15111152 Calculate
AB 201307 0 Calculate
AB 201307 1 3.145714314 Calculate
AB 201307 2 1.238417344 Calculate
AB 201307 3 1.037168178 Calculate
AB 201308 0 Calculate
AB 201308 1 3.384292878 Calculate
AB 201308 2 1.171754072 Calculate
AB 201308 3 1.039098194 Calculate
AB 201309 0 Calculate
AB 201309 1 2.226440673 Calculate
AB 201309 2 1.106223481 Calculate
AB 201309 3 1.068815825 Calculate
AB 201310 0 Calculate
AB 201310 1 2.601570952 Calculate
AB 201310 2 1.181983235 Calculate
AB 201310 3 1.070080388 Calculate
AB 201311 0 Calculate
AB 201311 1 2.680555039 Calculate
AB 201311 2 1.879570614 Calculate
AB 201311 3 1.64146912 Calculate
;
proc sort data= have out=have1;
by group lag;
run;
data temp;
do until(last.group);
do _n_=1 by 1 until(last.lag);
set have1;
by group lag;
if lag=0 then do;
output;
leave;
end;
l1=lag(cum_lag_incrs);
l2=lag2(cum_lag_incrs);
l3=lag3(cum_lag_incrs);
l4=lag4(cum_lag_incrs);
l5=lag5(cum_lag_incrs);
if _n_=1 then call missing(rollavg,l1,l2,l3,l4,l5);
if _n_>=6 then rollavg=mean(l1,l2,l3,l4,l5);
output;
end;
end;
drop l1-l5;
run;
proc sort data= temp out=want;
by group inforce lag;
run;
Regards,
Naveen Srinivasan
PS I assumed you are taking one year data of 2013 and your sample data resembles very much of the actual. Assumptions might make me lose track but I love these kinda problems,so I am in it for your feedback and improvment. Thanks!
Do you have SAS/ETS?
If so, I recommend using PROC TIMESERIES To fill in the dataset so you have an entry for every 'period' required.
https://gist.github.com/statgeek/07a3708dee1225ceb9d4aa75daab2c52
Because of the Max/Min options, you're best off cacluating the mean using the array method that's illustrated here
https://gist.github.com/statgeek/27e23c015eae7953eff2
Using the min/max array link you provided, I'm able to achieve what I want, but in a convoluted way.
The biggeest issue I have is that I don't want to compute the trimmed average until there are 5 previous datapoints to base my calculation off of. So I'm numbering the observation from oldest to newest within each grouping of LAGs from 1 and up, and then after the calculations have all occurred I'm deleting the rows on which calculations ocurred, but shouldn't have ocurred.
Any suggestions for how to make this code a little better?
data want_00;
set have;
BY Group LAG INFORCE;
if first.LAG then count=0;
count+1;
run;
data want_01;
array p{0:4} _temporary_;
set want_00;
if first.LAG then call missing(of p{*});
lowest = min(of p{*});
highest = max(of p{*});
sum = sum(of p{*});
p{mod(_n_,5)} = Cum_Lag_Incrs;
if count< 5 then highest ="";
if count<5 then lowest ="";
if count<5 then sum="";
average = (sum-highest-lowest)/3;
run;
if count < 5 then call missing(highest, lowest, sum);
You can use CALL MISSING to set multiple values to missing in one step. It doesn't matter if they're numeric or character.
Just a note that you were setting the missing to "" which is missing for a character variable not a numeric variable.
Missing for numerics are denoted with a period (.) and it should have been:
if count < 5 then highest = .;
I went with your current variable names, but SAS has FUNCTIONS with the names SUM and COUNT so I would consider using other variable names instead, if possible.
@novinosrin I am away from SAS until this afternoon, so I can't test until then. But thank you for your reply!
We divide by 3 because I want the average of the 5 relevant data points AFTER I throw out the max and min observation.
For example(s) (here I use integers for simplicity):
If relevant 5 Cum_Lag_Incrs was {1,3,5,7,11} then I throw out 11 and 1 and I take average of the remaining set {3,5,7} and I get (3+5+7)/3=5
{5,33,9,8,22} would have average (9+8+22)/3=13
{2,4,5,68,,8} would have average (4+5+8)/3=5.666
@acemanhattan this is known as a trimmed mean. You can also use PROC EXPAND to see if can do a Trimmed Mean.
Hi,
Transposing the dataset for making the calculation easy solves the complexicity. Understand my code for your solution. Validated the results, they are exactly matching as you required.
DATA WORK.Sample_Test;
LENGTH Group $2 INFORCE 8 Lag 8 Cum_Lag_Incrs $ 11 ;
FORMAT Group $CHAR2. INFORCE BEST12. Lag BEST12. Cum_Lag_Incrs $CHAR11.;
INFORMAT Group $CHAR2. INFORCE BEST12. Lag BEST12. Cum_Lag_Incrs $CHAR11. ;
INFILE DATALINES4 DLM="," MISSOVER DSD;
INPUT Group: $CHAR2. INFORCE: BEST32. Lag: BEST32. Cum_Lag_Incrs: $CHAR11. ;
DATALINES4;
AA,201301,0,
AA,201301,1,3.038434616
AA,201301,2,1.13597727
AA,201301,3,1.088584718
AA,201302,0,
AA,201302,1,3.322640299
AA,201302,2,1.182485247
AA,201302,3,1.036479476
AA,201303,0,
AA,201303,1,3.413788991
AA,201303,2,1.120095681
AA,201303,3,1.047491904
AA,201304,0,
AA,201304,1,2.696971256
AA,201304,2,1.110429149
AA,201304,3,1.046630304
AA,201305,0,
AA,201305,1,2.700299318
AA,201305,2,1.111799418
AA,201305,3,1.045353415
AA,201306,0,
AA,201306,1,3.156808222
AA,201306,2,1.213467751
AA,201306,3,1.152644537
AA,201307,0,
AA,201307,1,3.149903686
AA,201307,2,1.236895962
AA,201307,3,1.038549449
AA,201308,0,
AA,201308,1,3.388799982
AA,201308,2,1.17331458
AA,201308,3,1.037821673
AA,201309,0,
AA,201309,1,2.223705515
AA,201309,2,1.107696717
AA,201309,3,1.067502797
AA,201310,0,
AA,201310,1,2.598374951
AA,201310,2,1.183557366
AA,201310,3,1.068765806
AA,201311,0,
AA,201311,1,2.684124925
AA,201311,2,1.882073772
AA,201311,3,1.639452593
AB,201301,0,
AB,201301,1,3.034393497
AB,201301,2,1.13446642
AB,201301,3,1.087136901
AB,201302,0,
AB,201302,1,3.326727146
AB,201302,2,1.183939704
AB,201302,3,1.035100958
AB,201303,0,
AB,201303,1,3.409248651
AB,201303,2,1.121473399
AB,201303,3,1.04609874
AB,201304,0,
AB,201304,1,2.693384284
AB,201304,2,1.108952278
AB,201304,3,1.047917659
AB,201305,0,
AB,201305,1,2.703620687
AB,201305,2,1.113166932
AB,201305,3,1.046639199
AB,201306,0,
AB,201306,1,3.160691097
AB,201306,2,1.214960317
AB,201306,3,1.15111152
AB,201307,0,
AB,201307,1,3.145714314
AB,201307,2,1.238417344
AB,201307,3,1.037168178
AB,201308,0,
AB,201308,1,3.384292878
AB,201308,2,1.171754072
AB,201308,3,1.039098194
AB,201309,0,
AB,201309,1,2.226440673
AB,201309,2,1.106223481
AB,201309,3,1.068815825
AB,201310,0,
AB,201310,1,2.601570952
AB,201310,2,1.181983235
AB,201310,3,1.070080388
AB,201311,0,
AB,201311,1,2.680555039
AB,201311,2,1.879570614
AB,201311,3,1.64146912
;;;;
Proc sort data=sample_test;
by Group lag inforce Cum_Lag_Incrs;
run;
data test;
set sample_test;
Cum_Lag=INPUT(Cum_Lag_Incrs,12.6);
IF MISSING(Cum_Lag) then Cum_Lag=0;
run;
PROC TRANSPOSE DATA=test prefix=INFORCE_ out=test2(DROP=_NAME_);
BY Group lag;
ID inforce;
VAR Cum_Lag;
run;
PROC SQL;
SELECT NAME INTO: variables separated by " "
FROM DICTIONARY.COLUMNS
WHERE LIBNAME="WORK" AND MEMNAME="TEST2" AND NAME LIKE "INFORCE%";
QUIT;
OPTIONS SYMBOLGEN Mprint MLOGIC;
%MACRO TEST();
DATA max_Val (DROP=&variables);
SET test2;
%DO I=1 %to 6;
%LET Start_Value=%SCAN("&variables",&i," ");
%LET end_Value =%SCAN("&variables",%EVAL(&i+4)," ");
%LET New_Value =%SCAN("&variables",%EVAL(&i+5)," ");
Required_&New_Value=(SUM(OF &Start_Value.-&end_Value.)-MAX(OF &Start_Value.-&end_Value.)-Min(OF &Start_Value.-&end_Value.))/3;
%END;
RUN;
%MEND;
%TEST();
PROC TRANSPOSE DATA=max_Val out=required(rename=(_Name_=INFORCE_ COL1=Roll_Avg));
BY Group lag;
RUN;
DATA pre(Drop=Inforce_);
set required;
INFORCE=INPUT(SCAN(INFORCE_,3,"_"),8.);
run;
proc sort data=sample_test;
by Group Lag Inforce;
Proc sort data=pre;
by Group lag inforce ;
run;
data need;
Merge sample_test pre;
by Group Lag Inforce;
RUN;
PROC SORT DATA=need;
BY Group INFORCE;
run;
Thanks,
Suryakiran
@acemanhattan have you looked my solution. You can transpose the original data and then use a loop to calculate the rolling average from the point where you need using simple do loop.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.