Solved: Re: Creating rolling 30-day returns variable

mahler_ji · Posted 09-23-2014 04:44 PM

Hello All,

I would like to create a variable that, for each day, creates a variable that represents the following 30 days of returns. The data takes the following form:

Firm Date Ret

A 1/1/11 .1

A 1/2/11 .01

A 1/3/11 .1

A 1/4/11 .2

.

B 1/1/11 .05

B 1/2/11 .1

.

So the idea is for the 1/1/11 observation for Firm A, the variable would return the cumulative return, starting that day, and going forward one day. Just to make it simple, we could use addition, rather than a compounding return: .1 + .01 + .1 + .2 and so on for thirty days. And for the 1/2/11 observation for Firm A, it would be: .01 + .1 + .2 and so on for thirty more days.

And this would continue with all the firms that I have.

Any help would be greatly appreciated. I am going to be downloading a small file for you guys to play with if you need data to work out the problem.

Thanks!

John

PGStats · Posted 09-27-2014 08:21 PM

Try this slight modification:

proc sql;

create table want as

select

a.firm,

a.date,

sum(b.ret) as sum_ret,

exp(sum(log(1+b.ret)))-1 as compo_ret ,

count(b.ret) as n

from have as a

left join have as b

on a.firm=b.firm and b.date-a.date between 0 and 30

group by a.firm, a.date;

quit;

You could also add the clause having n > 30 at the end of the query to keep conplete periods only.

PG

View solution in original post

dkb · Posted 09-23-2014 09:51 PM

Obvious question: What should happen to the later records for each firm, where there are less than 30 subsequent days of data?

mahler_ji · Posted 09-23-2014 09:55 PM

Hey , thanks for your question.

It doesn't really matter, because I am going to trim the data to get rid of those observations. So they can either return something wrong or return a "." Either one will do!

Any help would be amazing! I am having a hard time figuring this one out.

Thanks,

John

user24feb · Posted 09-24-2014 04:20 AM

Should work, but requires a merge:

Data A (Keep=Firm Date R);
Do Firm_Nr=1 To 3;
    Do Time=0 To 50;
      Firm=Byte(Firm_Nr+64);
   Date=IntNX('day',"01JAN2000"d,Time);
   R=Normal(1)*0.02+0.03;
      Output;
    End;
End;
Format Date Date9.;
Run;

%Let Periods=30;
Data B;
Retain Total;
Set A;
Date=IntNX('day',Date,-&Periods.);
By Firm;
If First.Firm Then Do;
Total=0;
Count=0;
End;
Count+1;
Drop_Obs=Lag&Periods.(R);
If Count gt &Periods Then Total=Sum(Total,R,-Drop_Obs);
Else Total=Sum(Total,R);
Run;

Data C;
Merge A (in=inA) B (Keep=Firm Date Total in=inB);
By Firm Date;
If inA;
Run;

Reeza · Posted 09-23-2014 10:37 PM

A SQL self join is a quick way, this will return an invalid number if you have less than 30 days of records. You could add a count to keep track of how many records and either eliminate them or set them to missing in the same query if you wanted.

(untested)

proc sql;

create table want as

select a.firm, a.date, sum(b.ret) as cum_ret

from have as a

left join have as b

on b.date-a.date between 0 and 30

order by a.firm, a.date;

quit;

mahler_ji · Posted 09-27-2014 08:04 PM

Hey ,

Thank you, as always, for your help. For some reason, the code that you wrote is "hanging," meaning that I submit it and then it just sits there, without giving me an error. I don't really know what is going on, but I really like the idea behind your code, as it seems very intuitive to me.

Do you have any idea why this may be?

Thanks,

John

PGStats · Posted 09-27-2014 08:21 PM

Try this slight modification:

proc sql;

create table want as

select

a.firm,

a.date,

sum(b.ret) as sum_ret,

exp(sum(log(1+b.ret)))-1 as compo_ret ,

count(b.ret) as n

from have as a

left join have as b

on a.firm=b.firm and b.date-a.date between 0 and 30

group by a.firm, a.date;

quit;

You could also add the clause having n > 30 at the end of the query to keep conplete periods only.

PG

mahler_ji · Posted 09-27-2014 08:37 PM

Hey ,

Thanks for your help, I am going to to run this code right now. I thought that the issue might have had something to do with the fact that I had to "firm" requirement in the on statement.

Just checking, should there be two "bys" in the group by clause? I have never seen that before!

Thanks,

John

PGStats · Posted 09-27-2014 09:40 PM

No! that was a typo. I corrected it.

PG

KachiM · Posted 09-25-2014 06:32 AM

John,

Let us have a small number of dates. There are 6 days for A and

4 for B. It can be made to work with 30 days or for any number of days.

Just change the macro variable, &days, to your context. Also, you need

to give the expected maximum number of dates. For instance, in the

following data set, Firm A has the maximum dates(6).

An Array is used to save the RET values for a Firm at a time to

build up the cumulative sum of RETs. To be efficient, the sums

beyond &days, are made in such a way that one subtraction and

one addition are made for every day thereafter.

As sums are cumulatively made and when the number of observations

forward are lesser than &days, partial sums are made, which you can take

or ignore. Once a Firm is finished, the next Firm is processed in

a DoW-loop.

data have;

informat Date mmddyy8.;

format Date date10.;

input Firm $ Date Ret;

datalines;

A 1/1/11 .1

A 1/2/11 .01

A 1/3/11 .1

A 1/4/11 .2

A 1/5/11 .3

A 1/6/11 .2

B 1/1/11 .05

B 1/2/11 .1

B 1/3/11 .2

B 1/4/11 .1

;

run;

proc sort data = have;

by Firm ;

run;

%let days = 3;

%let maxdates = 6;

data need;

array k[&maxdates] _temporary_;

do ind = 1 by 1 until(last.Firm);

set have;

by Firm;

k[ind] = Ret;

end;

sum_Ret = 0;

* Find first sum for &days ;

do ind = 1 to &days;

sum_Ret + k[ind];

end;

do ind = 1 by 1 until(last.Firm);

set have;

by Firm;

if ind = 1 then output;

* Find subsequent sums by adjusting SUM_RET (removing and adding RET);

else do;

sum_Ret +- k[ind - 1];

if ind <= (dim(k) - &days + 1) then sum_Ret ++ k[ind + &days - 1];

output;

end;

call missing(of k

);

drop ind;

run;

Muthia Kachirayan

mahler_ji · Posted 09-28-2014 08:50 AM

Hey and ,

I have been playing with the code that you each posted and have still not been able to make it work. I have gone back and attached an exerpt of the dataset that I am working with. The entire dataset is large, something like 700MB and has 11mm observations.

When I run the following code (obviously calling my own libraries, etc) the code just sort of sits there and does nothing.

proc sql;

create table mylib.twt2_2 as

select t1.CUSIP,

t1.DATE,

sum(t2.RET) as cum_ret,

exp(sum(log(1+t2.RET)))-1 as compo_ret,

count(t2.RET) as n

from mylib.twt2_1 as t1

left join mylib.twt2_1 as t2

on t2.date-t1.date between 0 and 30 & (t1.CUSIP = t2.CUSIP)

group by t1.CUSIP, t1.date;

quit;

Could you guys have a look at the dataset that I have attached and see what may be going wrong?

Thanks!!

KachiM · Posted 09-28-2014 10:11 AM

I do not see your data.

By the way did you try the ARRAY approach I posted. If you have problems to understand/use the ARRAY approach, I can help you as much as I can.

Reeza · Posted 09-28-2014 12:30 PM

What version of SAS are you using?

KachiM · Posted 09-28-2014 01:24 PM

I am not sure what is your BYVar in TEST1. I am assuming it is STOCK_SYMBOL. If it is not you may replace it by your choice. Here is the Array way. There are 2265 rows with 'A' which is more than the next symbol 'ABI'.

I have not checked for the gaps in the dates. It is better you pre-process the data set and fill the gaps with suitable value for RET before running this program.

%let days = 30;

%let maxdates = 2265;

data need;

array k[&maxdates] _temporary_;

do ind = 1 by 1 until(last.stock_symbol);

set test1;

by stock_symbol;

k[ind] = Ret;

end;

sum_Ret = 0;

* Find first sum for &days ;

do ind = 1 to &days;

sum_Ret + k[ind];

end;

do ind = 1 by 1 until(last.stock_symbol);

set test1;

by stock_symbol;

if ind = 1 then output;

* Find subsequent sums by adjusting SUM_RET (removing and adding RET);

else do;

sum_Ret +- k[ind - 1];

if ind <= (dim(k) - &days + 1) then sum_Ret ++ k[ind + &days - 1];

output;

end;

call missing(of k

);

drop ind;

run;

proc print data = need;

run;

Hope this works for you.

PGStats · Posted 09-28-2014 05:49 PM

I notice that your dataset only includes weekdays. So a 30 day interval will include less than 30 datapoints. It is not clear how you want to handle that.

Otherwise, the only thing being wrong is your dataset is its size!

If you have access to SAS/ETS, you should try proc expand to get what you want:

/* Simple sum of returns */

proc expand data=sasforum.test1 out=test1;

by permno notsorted;

id date;

convert ret=ret_30 / transformout=(reverse movsum 30 reverse);

run;

/* Compounded product of returns */

proc expand data=sasforum.test1 out=test2;

by permno notsorted;

id date;

convert ret=compo_ret_30 / transformout=(+1 log reverse movsum 30 reverse exp -1);

run;

it should be a lot faster than SQL.

(Tested with your example dataset)

PG

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!