BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
JKCho
Pyrite | Level 9
proc sql;
   create table capm3 as
   select *, 
          (select mean(rmrf) from capm2 
             where cusip=a.cusip
               and (intnx('year', a.Date, -30, 'b') le Date le a.Date)) 
                 as riskpremium format=dollar8.2
   from capm2 as a;
quit;

Hello all,

I am working on dealing with a big number of observations more than 40mililions. 

What I now want to do is to calculate the past 30-year annual average of stock returns. 

Conceptually, I have to calculate moving averages of the last 30-year period of each observation and then have to annualize each of them. Before doing the annualization part, I am facing the big challenge of calculating moving average part.

What I have now are monthly stock returns. To be as doing correctly as possible, I should use 

(intnx('month', a.Date, -359, 'b') le Date le a.Date))

 instead of using 'year' and -30. 

 

However, what I rather would like to ask for your help is not this part. Since I am working on more than 40mils observations, the code I wrote above takes long hours to process, so far more than 5 hours but still processing....

 

I think this is not the right way to handle it and have been thinking of something else.

A very similar way and conceptually the same way I first did is like using PROC SQL and identifying the interval like 

 

proc sql;
create table WANT as
select *,(select mean(rmfr) from HAVE where YYMM between a.YYMM-360 and a.fyear-1 and gvkey=a.gvkey) as WHATIWANT
from HAVE as a;
quit;

 

Yeah.. you know this took a long long long time as well.

It seems that it is common in finance and accounting studies to calculate the so-called "market risk premium" using the difference between the risk-free rate and whatever marker rate over the last 20 or 30 years before the time of observation, so I guess there must be a proper way to do so other than what I have tried so far.

 

Hope kind ones of you share your wisdom with a rookie student.

 

Thank you 🙂 

 

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

For this example data, you could provide, at most, a moving average of 2, so the example is a poor fit to the problem. Nevertheless, here is some random data showing how to get the moving average you want, using the example in the PROC EXPAND documentation

 

data test;
   input year qtr x;
   date = yyq( year, qtr );
   format date yyqc.;
datalines;
1989 3 5238
1989 4 5289
1990 1 5375
1990 2 5443
1990 3 5514
1990 4 5527
1991 1 5557
1991 2 5615
;
proc expand data=test out=out method=none;
   id date;
   convert x = x_movave / transformout=(movave 5);
run;
proc sql;
	create table want as select a.*,b.x_movave
		from out(drop=x_movave) as a left join out(keep=date x_movave) as b
		on a.date=intnx('quarter',b.date,1,'b')
	order by a.date;
quit;

 

 

--
Paige Miller

View solution in original post

11 REPLIES 11
Kurt_Bremser
Super User

Sub-selects in SAS SQL are notorious for bad performance.

If you supply example data in usable form (data steps with datalines!) for your two datasets, so we can get a picture and have something to test against, we will be able to suggest alternatives.

JKCho
Pyrite | Level 9

Thank you for your kind of helping me.

I just made up an example of data just to help you understand what the dataset I am working on. What I want to do with this is to make average MONTHLY returns of each ID over the last 30 years.

For example, if the point of observation is 1988/03 and ID is 1, then I want to calculate avg. MONTHLY return of ID 1 over the period from 1958/03 to 1988/02.

 

Thank you for your kindness again 🙂 

 

 

 

data person;
   input ID $ return $ YYMM;
   datalines;                      
1  0.0151  198801
2  0.0946  198801
3  0.0765  198801
1  0.0011  198802 
2  0.0021  198802  
3  0.0031  198802
1  0.0153  198803
2  0.0054  198803
3  0.0855  198803

 

PaigeMiller
Diamond | Level 26

For example, if the point of observation is 1988/03 and ID is 1, then I want to calculate avg. MONTHLY return of ID 1 over the period from 1958/03 to 1988/02.

I'm not sure what this has to do with calculating moving average. Or is it a separate problem?

--
Paige Miller
JKCho
Pyrite | Level 9
That is just an example of a moving average that I want to calculate.
Like an average at the point of that time, I need to calculate other averages of different times for different IDs, so moving averages.
Wish this can clarify better.
PaigeMiller
Diamond | Level 26

For this example data, you could provide, at most, a moving average of 2, so the example is a poor fit to the problem. Nevertheless, here is some random data showing how to get the moving average you want, using the example in the PROC EXPAND documentation

 

data test;
   input year qtr x;
   date = yyq( year, qtr );
   format date yyqc.;
datalines;
1989 3 5238
1989 4 5289
1990 1 5375
1990 2 5443
1990 3 5514
1990 4 5527
1991 1 5557
1991 2 5615
;
proc expand data=test out=out method=none;
   id date;
   convert x = x_movave / transformout=(movave 5);
run;
proc sql;
	create table want as select a.*,b.x_movave
		from out(drop=x_movave) as a left join out(keep=date x_movave) as b
		on a.date=intnx('quarter',b.date,1,'b')
	order by a.date;
quit;

 

 

--
Paige Miller
Kurt_Bremser
Super User

In your question, you used another dataset (capm2) with completely different variables (rmrf, cusip). Please supply a sufficient example for this dataset, and the result you expect out of this data.

PaigeMiller
Diamond | Level 26

PROC EXPAND produces moving averages. If your SAS license does not include PROC EXPAND which is in SAS/ETS, there are DATA step methods of obtaining the same result.

--
Paige Miller
JKCho
Pyrite | Level 9
Thank you for your reply, Paige,

PROC EXPAND is a good alternative in most cases. The reason why I am not using it for this case is that PROC EXPAND cannot rule out current observations, t. I need a period from t-1 to t-n, which PROX EXPAND does not have while it has cmove, trimleft, or other good techniques...
PaigeMiller
Diamond | Level 26

@JKCho wrote:
Thank you for your reply, Paige,

PROC EXPAND is a good alternative in most cases. The reason why I am not using it for this case is that PROC EXPAND cannot rule out current observations, t. I need a period from t-1 to t-n, which PROX EXPAND does not have while it has cmove, trimleft, or other good techniques...

If you want the moving average of time t-1 to t-n, then you use PROC EXPAND and then re-arrange the data set so that the moving averages are moved one row downwards. Now, time t has the average of times t-1 to t-n.

--
Paige Miller
JKCho
Pyrite | Level 9
Sorry, I do not think I understand what you want to deliver. Can you explain more on how lagged MAs can do with t-1 to t-n? To me, you look like saying something like a mean from t-1 to t-12 and another mean from t to t-11, but could not understand your message.
JKCho
Pyrite | Level 9
Seems you meant, by its face value, to put a MA calculated from using data from t-1 to t-n at a row of time T. Oh... I did get it now. Thx. That is an easy trick. Thx!!

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 3091 views
  • 3 likes
  • 3 in conversation