- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Experts,
I would like to calculate a rolling average (e.g., 12-month window using daily obs at the end of every month) of multiple variables. So, the end data would have monthly observations that are 12-month averages of daily observations.
First of all, I have learned a lot from some early posts such as
https://communities.sas.com/t5/SAS-Programming/Standard-deviation-on-rolling-basis/m-p/446482).
Most of my work is based on these posts.
As you could see from my sample code below, I am using hash object with arrays and I am using a modified formula for calculating the average since I want to leave out the maximum daily value in each 12-month period. Also there are some conditions such as discarding the value when the average is calculated from less than 200 daily observations.
Now what I am trying to do is to simply expand this to multiple variables (Posts cited above use only a single variable).
I would also like to know if this could be done using macro variables as I have dozens of variables (note: the sample code below only contains two variables "aaaa" and "bbbb" for simplicity). I just want to find a way to simply repeating the code for each variables.
data want;
if _n_=1 then do;
if 0 then set have(rename=(aaaa=_aaaa bbbb=_bbbb));
declare hash h(multidata:'y'); h.definekey('date'); h.definedata('_aaaa', '_bbbb'); h.definedone();
end;
array x x1-x2 _temporary_;
do until(last.id_var);
set have;
by id_var;
_aaaa = aaaa; _bbbb = bbbb; h.add();
end;
do until(last.id_var);
set have;
by id_var date_m;
n=0; call missing(of x{*});
if last.date_m then do;
do i=intnx('month', date_m, -12, 'b') to intnx('month', date_m, -1, 'e');
rc=h.find(key:i);
do while(rc=0);
/* This part is mostly confusing. It works with single variable "aaaa". But I don't know how to change this for multiple variables. Is there a way without simply repeating this part? */
if not missing(_aaaa) then do; n+1; x1{n}=_aaaa; end;
if not missing(_bbbb) then do; n+1; x2{n}=_bbbb; end;
rc=h.find_next(key:i);
end;
end;
aaaa_12m = ( sum(of x1{*}) - max(of x1{*}) ) / ( n-1 ) ; * delete the highest daily value in each 12-month period;
bbbb_12m = ( sum(of x2{*}) - max(of x2{*}) ) / ( n-1 ) ; * delete the highest daily value in each 12-month period;
if n<200 then do; aaaa_12m =.; bbbb_12m =.; end;
output;
end;
end;
h.clear();
drop i _aaaa _bbbb rc date aaaa bbbb n;
run;
Thank you so much!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It'd be really useful t have some test data so we can confirm our code matches your expected results.
Also congrats on well formatted code! (except for the tabs of course, they break alignment in other editors; spaces guarantee alignment).
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It seems you want this:
data HAVE;
input ID DATE_M date9. VAR1 VAR2;
format DATE_M date9.;
cards;
1 01jan2002 1 11
1 01feb2002 2 22
1 01mar2002 3 33
1 01apr2002 3 33
1 01may2002 . 33
1 01jun2002 3 33
;
data WANT;
if _N_=1 then do;
declare hash H(dataset:'HAVE(rename=(DATE_M=DATE))', multidata:'y'); h.definekey('DATE'); h.definedata('VAR1', 'VAR2'); h.definedone();
end;
array _SUM[2] ;
array _MAX[2] ;
array _N[2] ;
do until(last.ID);
set HAVE(drop=VAR1 VAR2);
by ID DATE_M;
call missing(of _: );
if last.DATE_M then do;
do DATE=intnx('month', DATE_M, -12, 'b') to intnx('month', DATE_M, -1, 'e');
RC=H.find();
if RC then continue;
_SUM[1]+VAR1 ;
_SUM[2]+VAR2 ;
_MAX[1]=max(_MAX[1],VAR1) ;
_MAX[2]=max(_MAX[2],VAR2) ;
_N[1]+(^missing(VAR1));
_N[2]+(^missing(VAR2));
end;
VAR1_12M = divide( _SUM[1]-_MAX[1] , ifn(_N[1]<0.200,0,_N[1]-1) ) ; * delete the highest daily value in each 12-month period;
VAR2_12M = divide( _SUM[2]-_MAX[2] , ifn(_N[2]<0.200,0,_N[2]-1) ) ; * delete the highest daily value in each 12-month period;
output;
end;
end;
H.clear();
drop DATE _: RC VAR1 VAR2 ;
proc print;run;
ID | DATE_M | VAR1_12M | VAR2_12M |
---|---|---|---|
1 | 01JAN2002 | . | . |
1 | 01FEB2002 | . | . |
1 | 01MAR2002 | 1.0 | 11.00 |
1 | 01APR2002 | 1.5 | 16.50 |
1 | 01MAY2002 | 2.0 | 22.00 |
1 | 01JUN2002 | 2.0 | 24.75 |
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for the tab advice. I prefer tabs but I shall use spaces when I post codes.
For the code, I think this is it!
The biggest difference is the way the array is used. And I guess the reason you put 0.200 in ifn function is due to the observation number.
I should be able to expand this to many more variables.
Thanks for the help! Appreciate it.
A follow-up: How should I modify this if I want to get, say, standard deviation, using the STD function?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Unlike max, std requires all values to be known at once. so the easy way would be to fill an array with all the values.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You're right. So I could use my original code to do that for a single variable. The problem is I don't see how I could count "n" for each of the variables if I use more than one.