BookmarkSubscribeRSS Feed
MadQuant
Obsidian | Level 7

Hi Experts, 

I would like to calculate a rolling average (e.g., 12-month window using daily obs at the end of every month) of multiple variables. So, the end data would have monthly observations that are 12-month averages of daily observations. 

 

First of all, I have learned a lot from some early posts such as

https://communities.sas.com/t5/SAS-Programming/calculating-rolling-standard-deviation-of-stock-retur... and

https://communities.sas.com/t5/SAS-Programming/Standard-deviation-on-rolling-basis/m-p/446482).

Most of my work is based on these posts. 

 

As you could see from my sample code below, I am using hash object with arrays and I am using a modified formula for calculating the average since I want to leave out the maximum daily value in each 12-month period. Also there are some conditions such as discarding the value when the average is calculated from less than 200 daily observations. 

Now what I am trying to do is to simply expand this to multiple variables (Posts cited above use only a single variable).

 

I would also like to know if this could be done using macro variables as I have dozens of variables (note: the sample code below only contains two variables "aaaa" and "bbbb" for simplicity). I just want to find a way to simply repeating the code for each variables. 

 

 

 

data want;
	if _n_=1 then do;
		if 0 then set have(rename=(aaaa=_aaaa bbbb=_bbbb));
		declare hash h(multidata:'y'); h.definekey('date'); h.definedata('_aaaa', '_bbbb'); h.definedone();
	end;
	array x x1-x2 _temporary_;
	do until(last.id_var);
		set have;
		by id_var;
		_aaaa = aaaa; _bbbb = bbbb; h.add();
	end;
	do until(last.id_var);
		set have;
		by id_var date_m;
		n=0; call missing(of x{*});

		if last.date_m then do;
			do i=intnx('month', date_m, -12, 'b') to intnx('month', date_m, -1, 'e');
				rc=h.find(key:i);
				do while(rc=0);
				/* This part is mostly confusing. It works with single variable "aaaa". But I don't know how to change this for multiple variables. Is there a way without simply repeating this part? */
					if not missing(_aaaa) then do; n+1; x1{n}=_aaaa; end;
					if not missing(_bbbb) then do; n+1; x2{n}=_bbbb; end;
					rc=h.find_next(key:i);
				end;
			end;

			aaaa_12m = ( sum(of x1{*}) - max(of x1{*}) ) / ( n-1 ) ;   * delete the highest daily value in each 12-month period;
			bbbb_12m = ( sum(of x2{*}) - max(of x2{*}) ) / ( n-1 ) ;   * delete the highest daily value in each 12-month period;
			if n<200 then do; aaaa_12m =.; bbbb_12m =.; end;
			output;
		end;
	end;
	h.clear();
	drop i _aaaa _bbbb rc date aaaa bbbb n;
run; 

 

 

Thank you so much! 

5 REPLIES 5
ChrisNZ
Tourmaline | Level 20

It'd be really useful t have some test data so we can confirm our code matches your expected results.

Also congrats on well formatted code! (except for the tabs of course, they break alignment in other editors; spaces guarantee alignment).

ChrisNZ
Tourmaline | Level 20

It seems you want this:

data HAVE;
  input ID DATE_M date9. VAR1 VAR2;
  format DATE_M date9.;
cards;
1 01jan2002 1 11
1 01feb2002 2 22
1 01mar2002 3 33
1 01apr2002 3 33
1 01may2002 . 33
1 01jun2002 3 33
;
data WANT;
  if _N_=1 then do;
    declare hash H(dataset:'HAVE(rename=(DATE_M=DATE))', multidata:'y'); h.definekey('DATE'); h.definedata('VAR1', 'VAR2'); h.definedone();
  end;
  array _SUM[2] ;
  array _MAX[2] ;
  array _N[2]   ;
  do until(last.ID);
    set HAVE(drop=VAR1 VAR2);
    by ID DATE_M;
    call missing(of _: );
    if last.DATE_M then do;
      do DATE=intnx('month', DATE_M, -12, 'b') to intnx('month', DATE_M, -1, 'e');
        RC=H.find();
        if RC then continue;
        _SUM[1]+VAR1 ;
        _SUM[2]+VAR2 ;
        _MAX[1]=max(_MAX[1],VAR1) ;
        _MAX[2]=max(_MAX[2],VAR2) ;
        _N[1]+(^missing(VAR1)); 
        _N[2]+(^missing(VAR2)); 
      end;

      VAR1_12M = divide( _SUM[1]-_MAX[1] , ifn(_N[1]<0.200,0,_N[1]-1) ) ;   * delete the highest daily value in each 12-month period;
      VAR2_12M = divide( _SUM[2]-_MAX[2] , ifn(_N[2]<0.200,0,_N[2]-1) ) ;   * delete the highest daily value in each 12-month period;
      output;
    end;
  end;
  H.clear();
  drop DATE _: RC VAR1 VAR2 ;
proc print;run; 
 
ID DATE_M VAR1_12M VAR2_12M
1 01JAN2002 . .
1 01FEB2002 . .
1 01MAR2002 1.0 11.00
1 01APR2002 1.5 16.50
1 01MAY2002 2.0 22.00
1 01JUN2002 2.0 24.75
MadQuant
Obsidian | Level 7

Thanks for the tab advice. I prefer tabs but I shall use spaces when I post codes. 

 

For the code, I think this is it!

The biggest difference is the way the array is used. And I guess the reason you put 0.200 in ifn function is due to the observation number.

I should be able to expand this to many more variables. 

Thanks for the help! Appreciate it. 

 

A follow-up: How should I modify this if I want to get, say, standard deviation, using the STD function?

ChrisNZ
Tourmaline | Level 20

Unlike max, std requires all values to be known at once. so the easy way would be to fill an array with all the values.

MadQuant
Obsidian | Level 7

You're right. So I could use my original code to do that for a single variable. The problem is I don't see how I could count "n" for each of the variables if I use more than one. 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1045 views
  • 0 likes
  • 2 in conversation