proc sql;
create table capm3 as
select *,
(select mean(rmrf) from capm2
where cusip=a.cusip
and (intnx('year', a.Date, -30, 'b') le Date le a.Date))
as riskpremium format=dollar8.2
from capm2 as a;
quit;
Hello all,
I am working on dealing with a big number of observations more than 40mililions.
What I now want to do is to calculate the past 30-year annual average of stock returns.
Conceptually, I have to calculate moving averages of the last 30-year period of each observation and then have to annualize each of them. Before doing the annualization part, I am facing the big challenge of calculating moving average part.
What I have now are monthly stock returns. To be as doing correctly as possible, I should use
(intnx('month', a.Date, -359, 'b') le Date le a.Date))
instead of using 'year' and -30.
However, what I rather would like to ask for your help is not this part. Since I am working on more than 40mils observations, the code I wrote above takes long hours to process, so far more than 5 hours but still processing....
I think this is not the right way to handle it and have been thinking of something else.
A very similar way and conceptually the same way I first did is like using PROC SQL and identifying the interval like
proc sql;
create table WANT as
select *,(select mean(rmfr) from HAVE where YYMM between a.YYMM-360 and a.fyear-1 and gvkey=a.gvkey) as WHATIWANT
from HAVE as a;
quit;
Yeah.. you know this took a long long long time as well.
It seems that it is common in finance and accounting studies to calculate the so-called "market risk premium" using the difference between the risk-free rate and whatever marker rate over the last 20 or 30 years before the time of observation, so I guess there must be a proper way to do so other than what I have tried so far.
Hope kind ones of you share your wisdom with a rookie student.
Thank you 🙂
For this example data, you could provide, at most, a moving average of 2, so the example is a poor fit to the problem. Nevertheless, here is some random data showing how to get the moving average you want, using the example in the PROC EXPAND documentation
data test;
input year qtr x;
date = yyq( year, qtr );
format date yyqc.;
datalines;
1989 3 5238
1989 4 5289
1990 1 5375
1990 2 5443
1990 3 5514
1990 4 5527
1991 1 5557
1991 2 5615
;
proc expand data=test out=out method=none;
id date;
convert x = x_movave / transformout=(movave 5);
run;
proc sql;
create table want as select a.*,b.x_movave
from out(drop=x_movave) as a left join out(keep=date x_movave) as b
on a.date=intnx('quarter',b.date,1,'b')
order by a.date;
quit;
Sub-selects in SAS SQL are notorious for bad performance.
If you supply example data in usable form (data steps with datalines!) for your two datasets, so we can get a picture and have something to test against, we will be able to suggest alternatives.
Thank you for your kind of helping me.
I just made up an example of data just to help you understand what the dataset I am working on. What I want to do with this is to make average MONTHLY returns of each ID over the last 30 years.
For example, if the point of observation is 1988/03 and ID is 1, then I want to calculate avg. MONTHLY return of ID 1 over the period from 1958/03 to 1988/02.
Thank you for your kindness again 🙂
data person;
input ID $ return $ YYMM;
datalines;
1 0.0151 198801
2 0.0946 198801
3 0.0765 198801
1 0.0011 198802
2 0.0021 198802
3 0.0031 198802
1 0.0153 198803
2 0.0054 198803
3 0.0855 198803
For example, if the point of observation is 1988/03 and ID is 1, then I want to calculate avg. MONTHLY return of ID 1 over the period from 1958/03 to 1988/02.
I'm not sure what this has to do with calculating moving average. Or is it a separate problem?
For this example data, you could provide, at most, a moving average of 2, so the example is a poor fit to the problem. Nevertheless, here is some random data showing how to get the moving average you want, using the example in the PROC EXPAND documentation
data test;
input year qtr x;
date = yyq( year, qtr );
format date yyqc.;
datalines;
1989 3 5238
1989 4 5289
1990 1 5375
1990 2 5443
1990 3 5514
1990 4 5527
1991 1 5557
1991 2 5615
;
proc expand data=test out=out method=none;
id date;
convert x = x_movave / transformout=(movave 5);
run;
proc sql;
create table want as select a.*,b.x_movave
from out(drop=x_movave) as a left join out(keep=date x_movave) as b
on a.date=intnx('quarter',b.date,1,'b')
order by a.date;
quit;
In your question, you used another dataset (capm2) with completely different variables (rmrf, cusip). Please supply a sufficient example for this dataset, and the result you expect out of this data.
PROC EXPAND produces moving averages. If your SAS license does not include PROC EXPAND which is in SAS/ETS, there are DATA step methods of obtaining the same result.
@JKCho wrote:
Thank you for your reply, Paige,
PROC EXPAND is a good alternative in most cases. The reason why I am not using it for this case is that PROC EXPAND cannot rule out current observations, t. I need a period from t-1 to t-n, which PROX EXPAND does not have while it has cmove, trimleft, or other good techniques...
If you want the moving average of time t-1 to t-n, then you use PROC EXPAND and then re-arrange the data set so that the moving averages are moved one row downwards. Now, time t has the average of times t-1 to t-n.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.