BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
JKCho
Pyrite | Level 9
proc sql;
   create table capm3 as
   select *, 
          (select mean(rmrf) from capm2 
             where cusip=a.cusip
               and (intnx('year', a.Date, -30, 'b') le Date le a.Date)) 
                 as riskpremium format=dollar8.2
   from capm2 as a;
quit;

Hello all,

I am working on dealing with a big number of observations more than 40mililions. 

What I now want to do is to calculate the past 30-year annual average of stock returns. 

Conceptually, I have to calculate moving averages of the last 30-year period of each observation and then have to annualize each of them. Before doing the annualization part, I am facing the big challenge of calculating moving average part.

What I have now are monthly stock returns. To be as doing correctly as possible, I should use 

(intnx('month', a.Date, -359, 'b') le Date le a.Date))

 instead of using 'year' and -30. 

 

However, what I rather would like to ask for your help is not this part. Since I am working on more than 40mils observations, the code I wrote above takes long hours to process, so far more than 5 hours but still processing....

 

I think this is not the right way to handle it and have been thinking of something else.

A very similar way and conceptually the same way I first did is like using PROC SQL and identifying the interval like 

 

proc sql;
create table WANT as
select *,(select mean(rmfr) from HAVE where YYMM between a.YYMM-360 and a.fyear-1 and gvkey=a.gvkey) as WHATIWANT
from HAVE as a;
quit;

 

Yeah.. you know this took a long long long time as well.

It seems that it is common in finance and accounting studies to calculate the so-called "market risk premium" using the difference between the risk-free rate and whatever marker rate over the last 20 or 30 years before the time of observation, so I guess there must be a proper way to do so other than what I have tried so far.

 

Hope kind ones of you share your wisdom with a rookie student.

 

Thank you 🙂 

 

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

For this example data, you could provide, at most, a moving average of 2, so the example is a poor fit to the problem. Nevertheless, here is some random data showing how to get the moving average you want, using the example in the PROC EXPAND documentation

 

data test;
   input year qtr x;
   date = yyq( year, qtr );
   format date yyqc.;
datalines;
1989 3 5238
1989 4 5289
1990 1 5375
1990 2 5443
1990 3 5514
1990 4 5527
1991 1 5557
1991 2 5615
;
proc expand data=test out=out method=none;
   id date;
   convert x = x_movave / transformout=(movave 5);
run;
proc sql;
	create table want as select a.*,b.x_movave
		from out(drop=x_movave) as a left join out(keep=date x_movave) as b
		on a.date=intnx('quarter',b.date,1,'b')
	order by a.date;
quit;

 

 

--
Paige Miller

View solution in original post

11 REPLIES 11
Kurt_Bremser
Super User

Sub-selects in SAS SQL are notorious for bad performance.

If you supply example data in usable form (data steps with datalines!) for your two datasets, so we can get a picture and have something to test against, we will be able to suggest alternatives.

JKCho
Pyrite | Level 9

Thank you for your kind of helping me.

I just made up an example of data just to help you understand what the dataset I am working on. What I want to do with this is to make average MONTHLY returns of each ID over the last 30 years.

For example, if the point of observation is 1988/03 and ID is 1, then I want to calculate avg. MONTHLY return of ID 1 over the period from 1958/03 to 1988/02.

 

Thank you for your kindness again 🙂 

 

 

 

data person;
   input ID $ return $ YYMM;
   datalines;                      
1  0.0151  198801
2  0.0946  198801
3  0.0765  198801
1  0.0011  198802 
2  0.0021  198802  
3  0.0031  198802
1  0.0153  198803
2  0.0054  198803
3  0.0855  198803

 

PaigeMiller
Diamond | Level 26

For example, if the point of observation is 1988/03 and ID is 1, then I want to calculate avg. MONTHLY return of ID 1 over the period from 1958/03 to 1988/02.

I'm not sure what this has to do with calculating moving average. Or is it a separate problem?

--
Paige Miller
JKCho
Pyrite | Level 9
That is just an example of a moving average that I want to calculate.
Like an average at the point of that time, I need to calculate other averages of different times for different IDs, so moving averages.
Wish this can clarify better.
PaigeMiller
Diamond | Level 26

For this example data, you could provide, at most, a moving average of 2, so the example is a poor fit to the problem. Nevertheless, here is some random data showing how to get the moving average you want, using the example in the PROC EXPAND documentation

 

data test;
   input year qtr x;
   date = yyq( year, qtr );
   format date yyqc.;
datalines;
1989 3 5238
1989 4 5289
1990 1 5375
1990 2 5443
1990 3 5514
1990 4 5527
1991 1 5557
1991 2 5615
;
proc expand data=test out=out method=none;
   id date;
   convert x = x_movave / transformout=(movave 5);
run;
proc sql;
	create table want as select a.*,b.x_movave
		from out(drop=x_movave) as a left join out(keep=date x_movave) as b
		on a.date=intnx('quarter',b.date,1,'b')
	order by a.date;
quit;

 

 

--
Paige Miller
Kurt_Bremser
Super User

In your question, you used another dataset (capm2) with completely different variables (rmrf, cusip). Please supply a sufficient example for this dataset, and the result you expect out of this data.

PaigeMiller
Diamond | Level 26

PROC EXPAND produces moving averages. If your SAS license does not include PROC EXPAND which is in SAS/ETS, there are DATA step methods of obtaining the same result.

--
Paige Miller
JKCho
Pyrite | Level 9
Thank you for your reply, Paige,

PROC EXPAND is a good alternative in most cases. The reason why I am not using it for this case is that PROC EXPAND cannot rule out current observations, t. I need a period from t-1 to t-n, which PROX EXPAND does not have while it has cmove, trimleft, or other good techniques...
PaigeMiller
Diamond | Level 26

@JKCho wrote:
Thank you for your reply, Paige,

PROC EXPAND is a good alternative in most cases. The reason why I am not using it for this case is that PROC EXPAND cannot rule out current observations, t. I need a period from t-1 to t-n, which PROX EXPAND does not have while it has cmove, trimleft, or other good techniques...

If you want the moving average of time t-1 to t-n, then you use PROC EXPAND and then re-arrange the data set so that the moving averages are moved one row downwards. Now, time t has the average of times t-1 to t-n.

--
Paige Miller
JKCho
Pyrite | Level 9
Sorry, I do not think I understand what you want to deliver. Can you explain more on how lagged MAs can do with t-1 to t-n? To me, you look like saying something like a mean from t-1 to t-12 and another mean from t to t-11, but could not understand your message.
JKCho
Pyrite | Level 9
Seems you meant, by its face value, to put a MA calculated from using data from t-1 to t-n at a row of time T. Oh... I did get it now. Thx. That is an easy trick. Thx!!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 3493 views
  • 3 likes
  • 3 in conversation