BookmarkSubscribeRSS Feed
MadQuant
Obsidian | Level 7

Hi Experts, 

I would like to calculate a rolling average (e.g., 12-month window using daily obs at the end of every month) of multiple variables. So, the end data would have monthly observations that are 12-month averages of daily observations. 

 

First of all, I have learned a lot from some early posts such as

https://communities.sas.com/t5/SAS-Programming/calculating-rolling-standard-deviation-of-stock-retur... and

https://communities.sas.com/t5/SAS-Programming/Standard-deviation-on-rolling-basis/m-p/446482).

Most of my work is based on these posts. 

 

As you could see from my sample code below, I am using hash object with arrays and I am using a modified formula for calculating the average since I want to leave out the maximum daily value in each 12-month period. Also there are some conditions such as discarding the value when the average is calculated from less than 200 daily observations. 

Now what I am trying to do is to simply expand this to multiple variables (Posts cited above use only a single variable).

 

I would also like to know if this could be done using macro variables as I have dozens of variables (note: the sample code below only contains two variables "aaaa" and "bbbb" for simplicity). I just want to find a way to simply repeating the code for each variables. 

 

 

 

data want;
	if _n_=1 then do;
		if 0 then set have(rename=(aaaa=_aaaa bbbb=_bbbb));
		declare hash h(multidata:'y'); h.definekey('date'); h.definedata('_aaaa', '_bbbb'); h.definedone();
	end;
	array x x1-x2 _temporary_;
	do until(last.id_var);
		set have;
		by id_var;
		_aaaa = aaaa; _bbbb = bbbb; h.add();
	end;
	do until(last.id_var);
		set have;
		by id_var date_m;
		n=0; call missing(of x{*});

		if last.date_m then do;
			do i=intnx('month', date_m, -12, 'b') to intnx('month', date_m, -1, 'e');
				rc=h.find(key:i);
				do while(rc=0);
				/* This part is mostly confusing. It works with single variable "aaaa". But I don't know how to change this for multiple variables. Is there a way without simply repeating this part? */
					if not missing(_aaaa) then do; n+1; x1{n}=_aaaa; end;
					if not missing(_bbbb) then do; n+1; x2{n}=_bbbb; end;
					rc=h.find_next(key:i);
				end;
			end;

			aaaa_12m = ( sum(of x1{*}) - max(of x1{*}) ) / ( n-1 ) ;   * delete the highest daily value in each 12-month period;
			bbbb_12m = ( sum(of x2{*}) - max(of x2{*}) ) / ( n-1 ) ;   * delete the highest daily value in each 12-month period;
			if n<200 then do; aaaa_12m =.; bbbb_12m =.; end;
			output;
		end;
	end;
	h.clear();
	drop i _aaaa _bbbb rc date aaaa bbbb n;
run; 

 

 

Thank you so much! 

5 REPLIES 5
ChrisNZ
Tourmaline | Level 20

It'd be really useful t have some test data so we can confirm our code matches your expected results.

Also congrats on well formatted code! (except for the tabs of course, they break alignment in other editors; spaces guarantee alignment).

ChrisNZ
Tourmaline | Level 20

It seems you want this:

data HAVE;
  input ID DATE_M date9. VAR1 VAR2;
  format DATE_M date9.;
cards;
1 01jan2002 1 11
1 01feb2002 2 22
1 01mar2002 3 33
1 01apr2002 3 33
1 01may2002 . 33
1 01jun2002 3 33
;
data WANT;
  if _N_=1 then do;
    declare hash H(dataset:'HAVE(rename=(DATE_M=DATE))', multidata:'y'); h.definekey('DATE'); h.definedata('VAR1', 'VAR2'); h.definedone();
  end;
  array _SUM[2] ;
  array _MAX[2] ;
  array _N[2]   ;
  do until(last.ID);
    set HAVE(drop=VAR1 VAR2);
    by ID DATE_M;
    call missing(of _: );
    if last.DATE_M then do;
      do DATE=intnx('month', DATE_M, -12, 'b') to intnx('month', DATE_M, -1, 'e');
        RC=H.find();
        if RC then continue;
        _SUM[1]+VAR1 ;
        _SUM[2]+VAR2 ;
        _MAX[1]=max(_MAX[1],VAR1) ;
        _MAX[2]=max(_MAX[2],VAR2) ;
        _N[1]+(^missing(VAR1)); 
        _N[2]+(^missing(VAR2)); 
      end;

      VAR1_12M = divide( _SUM[1]-_MAX[1] , ifn(_N[1]<0.200,0,_N[1]-1) ) ;   * delete the highest daily value in each 12-month period;
      VAR2_12M = divide( _SUM[2]-_MAX[2] , ifn(_N[2]<0.200,0,_N[2]-1) ) ;   * delete the highest daily value in each 12-month period;
      output;
    end;
  end;
  H.clear();
  drop DATE _: RC VAR1 VAR2 ;
proc print;run; 
 
ID DATE_M VAR1_12M VAR2_12M
1 01JAN2002 . .
1 01FEB2002 . .
1 01MAR2002 1.0 11.00
1 01APR2002 1.5 16.50
1 01MAY2002 2.0 22.00
1 01JUN2002 2.0 24.75
MadQuant
Obsidian | Level 7

Thanks for the tab advice. I prefer tabs but I shall use spaces when I post codes. 

 

For the code, I think this is it!

The biggest difference is the way the array is used. And I guess the reason you put 0.200 in ifn function is due to the observation number.

I should be able to expand this to many more variables. 

Thanks for the help! Appreciate it. 

 

A follow-up: How should I modify this if I want to get, say, standard deviation, using the STD function?

ChrisNZ
Tourmaline | Level 20

Unlike max, std requires all values to be known at once. so the easy way would be to fill an array with all the values.

MadQuant
Obsidian | Level 7

You're right. So I could use my original code to do that for a single variable. The problem is I don't see how I could count "n" for each of the variables if I use more than one. 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1130 views
  • 0 likes
  • 2 in conversation