SAS Programming

MadQuant · Posted 04-05-2022 06:36 PM

Hi Experts,

I would like to calculate a rolling average (e.g., 12-month window using daily obs at the end of every month) of multiple variables. So, the end data would have monthly observations that are 12-month averages of daily observations.

First of all, I have learned a lot from some early posts such as

https://communities.sas.com/t5/SAS-Programming/calculating-rolling-standard-deviation-of-stock-retur... and

https://communities.sas.com/t5/SAS-Programming/Standard-deviation-on-rolling-basis/m-p/446482).

Most of my work is based on these posts.

As you could see from my sample code below, I am using hash object with arrays and I am using a modified formula for calculating the average since I want to leave out the maximum daily value in each 12-month period. Also there are some conditions such as discarding the value when the average is calculated from less than 200 daily observations.

Now what I am trying to do is to simply expand this to multiple variables (Posts cited above use only a single variable).

I would also like to know if this could be done using macro variables as I have dozens of variables (note: the sample code below only contains two variables "aaaa" and "bbbb" for simplicity). I just want to find a way to simply repeating the code for each variables.

data want;
	if _n_=1 then do;
		if 0 then set have(rename=(aaaa=_aaaa bbbb=_bbbb));
		declare hash h(multidata:'y'); h.definekey('date'); h.definedata('_aaaa', '_bbbb'); h.definedone();
	end;
	array x x1-x2 _temporary_;
	do until(last.id_var);
		set have;
		by id_var;
		_aaaa = aaaa; _bbbb = bbbb; h.add();
	end;
	do until(last.id_var);
		set have;
		by id_var date_m;
		n=0; call missing(of x{*});

		if last.date_m then do;
			do i=intnx('month', date_m, -12, 'b') to intnx('month', date_m, -1, 'e');
				rc=h.find(key:i);
				do while(rc=0);
				/* This part is mostly confusing. It works with single variable "aaaa". But I don't know how to change this for multiple variables. Is there a way without simply repeating this part? */
					if not missing(_aaaa) then do; n+1; x1{n}=_aaaa; end;
					if not missing(_bbbb) then do; n+1; x2{n}=_bbbb; end;
					rc=h.find_next(key:i);
				end;
			end;

			aaaa_12m = ( sum(of x1{*}) - max(of x1{*}) ) / ( n-1 ) ;   * delete the highest daily value in each 12-month period;
			bbbb_12m = ( sum(of x2{*}) - max(of x2{*}) ) / ( n-1 ) ;   * delete the highest daily value in each 12-month period;
			if n<200 then do; aaaa_12m =.; bbbb_12m =.; end;
			output;
		end;
	end;
	h.clear();
	drop i _aaaa _bbbb rc date aaaa bbbb n;
run;

Thank you so much!

ChrisNZ · Posted 04-06-2022 02:23 AM

It'd be really useful t have some test data so we can confirm our code matches your expected results.

Also congrats on well formatted code! (except for the tabs of course, they break alignment in other editors; spaces guarantee alignment).

High-Performance SAS Coding - Third Edition

ChrisNZ · Posted 04-06-2022 03:13 AM

It seems you want this:

data HAVE;
  input ID DATE_M date9. VAR1 VAR2;
  format DATE_M date9.;
cards;
1 01jan2002 1 11
1 01feb2002 2 22
1 01mar2002 3 33
1 01apr2002 3 33
1 01may2002 . 33
1 01jun2002 3 33
;
data WANT;
  if _N_=1 then do;
    declare hash H(dataset:'HAVE(rename=(DATE_M=DATE))', multidata:'y'); h.definekey('DATE'); h.definedata('VAR1', 'VAR2'); h.definedone();
  end;
  array _SUM[2] ;
  array _MAX[2] ;
  array _N[2]   ;
  do until(last.ID);
    set HAVE(drop=VAR1 VAR2);
    by ID DATE_M;
    call missing(of _: );
    if last.DATE_M then do;
      do DATE=intnx('month', DATE_M, -12, 'b') to intnx('month', DATE_M, -1, 'e');
        RC=H.find();
        if RC then continue;
        _SUM[1]+VAR1 ;
        _SUM[2]+VAR2 ;
        _MAX[1]=max(_MAX[1],VAR1) ;
        _MAX[2]=max(_MAX[2],VAR2) ;
        _N[1]+(^missing(VAR1)); 
        _N[2]+(^missing(VAR2)); 
      end;

      VAR1_12M = divide( _SUM[1]-_MAX[1] , ifn(_N[1]<0.200,0,_N[1]-1) ) ;   * delete the highest daily value in each 12-month period;
      VAR2_12M = divide( _SUM[2]-_MAX[2] , ifn(_N[2]<0.200,0,_N[2]-1) ) ;   * delete the highest daily value in each 12-month period;
      output;
    end;
  end;
  H.clear();
  drop DATE _: RC VAR1 VAR2 ;
proc print;run;


ID	DATE_M	VAR1_12M	VAR2_12M
1	01JAN2002	.	.
1	01FEB2002	.	.
1	01MAR2002	1.0	11.00
1	01APR2002	1.5	16.50
1	01MAY2002	2.0	22.00
1	01JUN2002	2.0	24.75

High-Performance SAS Coding - Third Edition

MadQuant · Posted 04-07-2022 11:22 PM

Thanks for the tab advice. I prefer tabs but I shall use spaces when I post codes.

For the code, I think this is it!

The biggest difference is the way the array is used. And I guess the reason you put 0.200 in ifn function is due to the observation number.

I should be able to expand this to many more variables.

Thanks for the help! Appreciate it.

A follow-up: How should I modify this if I want to get, say, standard deviation, using the STD function?

ChrisNZ · Posted 04-11-2022 08:03 PM

Unlike max, std requires all values to be known at once. so the easy way would be to fill an array with all the values.

High-Performance SAS Coding - Third Edition

MadQuant · Posted 04-14-2022 09:00 PM

You're right. So I could use my original code to do that for a single variable. The problem is I don't see how I could count "n" for each of the variables if I use more than one.

SAS Programming

Rolling average (using own formula) of multiple variables

Re: Rolling average (using own formula) of multiple variables

Re: Rolling average (using own formula) of multiple variables

Re: Rolling average (using own formula) of multiple variables

Re: Rolling average (using own formula) of multiple variables

Re: Rolling average (using own formula) of multiple variables

Rolling Average by firm and analyst

Rolling dates using intnx

Applying multiple formulas located in a column to calculate rates

How to get rolling average

Filter data by multiple variables

Follow Us

What is...

SAS Programming

Special offer for SAS Communities members

SAS Training: Just a Click Away

Follow Us

What is...