Solved: looking for efficient codes for calculating moving average.

JKCho · Posted 11-18-2020 07:05 AM

proc sql;
   create table capm3 as
   select *, 
          (select mean(rmrf) from capm2 
             where cusip=a.cusip
               and (intnx('year', a.Date, -30, 'b') le Date le a.Date)) 
                 as riskpremium format=dollar8.2
   from capm2 as a;
quit;

Hello all,

I am working on dealing with a big number of observations more than 40mililions.

What I now want to do is to calculate the past 30-year annual average of stock returns.

Conceptually, I have to calculate moving averages of the last 30-year period of each observation and then have to annualize each of them. Before doing the annualization part, I am facing the big challenge of calculating moving average part.

What I have now are monthly stock returns. To be as doing correctly as possible, I should use

(intnx('month', a.Date, -359, 'b') le Date le a.Date))

instead of using 'year' and -30.

However, what I rather would like to ask for your help is not this part. Since I am working on more than 40mils observations, the code I wrote above takes long hours to process, so far more than 5 hours but still processing....

I think this is not the right way to handle it and have been thinking of something else.

A very similar way and conceptually the same way I first did is like using PROC SQL and identifying the interval like

proc sql;
create table WANT as
select *,(select mean(rmfr) from HAVE where YYMM between a.YYMM-360 and a.fyear-1 and gvkey=a.gvkey) as WHATIWANT
from HAVE as a;
quit;

Yeah.. you know this took a long long long time as well.

It seems that it is common in finance and accounting studies to calculate the so-called "market risk premium" using the difference between the risk-free rate and whatever marker rate over the last 20 or 30 years before the time of observation, so I guess there must be a proper way to do so other than what I have tried so far.

Hope kind ones of you share your wisdom with a rookie student.

Thank you 🙂

PaigeMiller · Posted 11-18-2020 08:07 AM

For this example data, you could provide, at most, a moving average of 2, so the example is a poor fit to the problem. Nevertheless, here is some random data showing how to get the moving average you want, using the example in the PROC EXPAND documentation

data test;
   input year qtr x;
   date = yyq( year, qtr );
   format date yyqc.;
datalines;
1989 3 5238
1989 4 5289
1990 1 5375
1990 2 5443
1990 3 5514
1990 4 5527
1991 1 5557
1991 2 5615
;
proc expand data=test out=out method=none;
   id date;
   convert x = x_movave / transformout=(movave 5);
run;
proc sql;
	create table want as select a.*,b.x_movave
		from out(drop=x_movave) as a left join out(keep=date x_movave) as b
		on a.date=intnx('quarter',b.date,1,'b')
	order by a.date;
quit;

--
Paige Miller

View solution in original post

Kurt_Bremser · Posted 11-18-2020 07:16 AM

Sub-selects in SAS SQL are notorious for bad performance.

If you supply example data in usable form (data steps with datalines!) for your two datasets, so we can get a picture and have something to test against, we will be able to suggest alternatives.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

JKCho · Posted 11-18-2020 07:44 AM

Thank you for your kind of helping me.

I just made up an example of data just to help you understand what the dataset I am working on. What I want to do with this is to make average MONTHLY returns of each ID over the last 30 years.

For example, if the point of observation is 1988/03 and ID is 1, then I want to calculate avg. MONTHLY return of ID 1 over the period from 1958/03 to 1988/02.

Thank you for your kindness again 🙂

data person;
   input ID $ return $ YYMM;
   datalines;                      
1  0.0151  198801
2  0.0946  198801
3  0.0765  198801
1  0.0011  198802 
2  0.0021  198802  
3  0.0031  198802
1  0.0153  198803
2  0.0054  198803
3  0.0855  198803

PaigeMiller · Posted 11-18-2020 07:48 AM

For example, if the point of observation is 1988/03 and ID is 1, then I want to calculate avg. MONTHLY return of ID 1 over the period from 1958/03 to 1988/02.

I'm not sure what this has to do with calculating moving average. Or is it a separate problem?

--
Paige Miller

JKCho · Posted 11-18-2020 07:54 AM

That is just an example of a moving average that I want to calculate.
Like an average at the point of that time, I need to calculate other averages of different times for different IDs, so moving averages.
Wish this can clarify better.

PaigeMiller · Posted 11-18-2020 08:07 AM

For this example data, you could provide, at most, a moving average of 2, so the example is a poor fit to the problem. Nevertheless, here is some random data showing how to get the moving average you want, using the example in the PROC EXPAND documentation

data test;
   input year qtr x;
   date = yyq( year, qtr );
   format date yyqc.;
datalines;
1989 3 5238
1989 4 5289
1990 1 5375
1990 2 5443
1990 3 5514
1990 4 5527
1991 1 5557
1991 2 5615
;
proc expand data=test out=out method=none;
   id date;
   convert x = x_movave / transformout=(movave 5);
run;
proc sql;
	create table want as select a.*,b.x_movave
		from out(drop=x_movave) as a left join out(keep=date x_movave) as b
		on a.date=intnx('quarter',b.date,1,'b')
	order by a.date;
quit;

--
Paige Miller

Kurt_Bremser · Posted 11-18-2020 08:06 AM

In your question, you used another dataset (capm2) with completely different variables (rmrf, cusip). Please supply a sufficient example for this dataset, and the result you expect out of this data.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

PaigeMiller · Posted 11-18-2020 07:23 AM

PROC EXPAND produces moving averages. If your SAS license does not include PROC EXPAND which is in SAS/ETS, there are DATA step methods of obtaining the same result.

--
Paige Miller

JKCho · Posted 11-18-2020 07:26 AM

Thank you for your reply, Paige,

PROC EXPAND is a good alternative in most cases. The reason why I am not using it for this case is that PROC EXPAND cannot rule out current observations, t. I need a period from t-1 to t-n, which PROX EXPAND does not have while it has cmove, trimleft, or other good techniques...

PaigeMiller · Posted 11-18-2020 07:38 AM

@JKCho wrote:
Thank you for your reply, Paige,

PROC EXPAND is a good alternative in most cases. The reason why I am not using it for this case is that PROC EXPAND cannot rule out current observations, t. I need a period from t-1 to t-n, which PROX EXPAND does not have while it has cmove, trimleft, or other good techniques...

If you want the moving average of time t-1 to t-n, then you use PROC EXPAND and then re-arrange the data set so that the moving averages are moved one row downwards. Now, time t has the average of times t-1 to t-n.

--
Paige Miller

JKCho · Posted 11-18-2020 08:00 AM

Sorry, I do not think I understand what you want to deliver. Can you explain more on how lagged MAs can do with t-1 to t-n? To me, you look like saying something like a mean from t-1 to t-12 and another mean from t to t-11, but could not understand your message.

JKCho · Posted 11-18-2020 08:06 AM

Seems you meant, by its face value, to put a MA calculated from using data from t-1 to t-n at a row of time T. Oh... I did get it now. Thx. That is an easy trick. Thx!!

looking for efficient codes for calculating moving average.

Re: looking for efficient codes for calculating moving average.

Re: looking for efficient codes for calculating moving average.

Re: looking for efficient codes for calculating moving average.

Re: looking for efficient codes for calculating moving average.

Re: looking for efficient codes for calculating moving average.

Re: looking for efficient codes for calculating moving average.

Re: looking for efficient codes for calculating moving average.

Re: looking for efficient codes for calculating moving average.

Re: looking for efficient codes for calculating moving average.

Re: looking for efficient codes for calculating moving average.

Re: looking for efficient codes for calculating moving average.

Re: looking for efficient codes for calculating moving average.

Registration is open

SAS Training: Just a Click Away