Hello All,
I would like to create a variable that, for each day, creates a variable that represents the following 30 days of returns. The data takes the following form:
Firm Date Ret
A 1/1/11 .1
A 1/2/11 .01
A 1/3/11 .1
A 1/4/11 .2
.
.
.
.
B 1/1/11 .05
B 1/2/11 .1
.
.
.
So the idea is for the 1/1/11 observation for Firm A, the variable would return the cumulative return, starting that day, and going forward one day. Just to make it simple, we could use addition, rather than a compounding return: .1 + .01 + .1 + .2 and so on for thirty days. And for the 1/2/11 observation for Firm A, it would be: .01 + .1 + .2 and so on for thirty more days.
And this would continue with all the firms that I have.
Any help would be greatly appreciated. I am going to be downloading a small file for you guys to play with if you need data to work out the problem.
Thanks!
John
Try this slight modification:
proc sql;
create table want as
select
a.firm,
a.date,
sum(b.ret) as sum_ret,
exp(sum(log(1+b.ret)))-1 as compo_ret ,
count(b.ret) as n
from have as a
left join have as b
on a.firm=b.firm and b.date-a.date between 0 and 30
group by a.firm, a.date;
quit;
You could also add the clause having n > 30 at the end of the query to keep conplete periods only.
PG
Obvious question: What should happen to the later records for each firm, where there are less than 30 subsequent days of data?
Should work, but requires a merge:
Data A (Keep=Firm Date R);
Do Firm_Nr=1 To 3;
Do Time=0 To 50;
Firm=Byte(Firm_Nr+64);
Date=IntNX('day',"01JAN2000"d,Time);
R=Normal(1)*0.02+0.03;
Output;
End;
End;
Format Date Date9.;
Run;
%Let Periods=30;
Data B;
Retain Total;
Set A;
Date=IntNX('day',Date,-&Periods.);
By Firm;
If First.Firm Then Do;
Total=0;
Count=0;
End;
Count+1;
Drop_Obs=Lag&Periods.(R);
If Count gt &Periods Then Total=Sum(Total,R,-Drop_Obs);
Else Total=Sum(Total,R);
Run;
Data C;
Merge A (in=inA) B (Keep=Firm Date Total in=inB);
By Firm Date;
If inA;
Run;
A SQL self join is a quick way, this will return an invalid number if you have less than 30 days of records. You could add a count to keep track of how many records and either eliminate them or set them to missing in the same query if you wanted.
(untested)
proc sql;
create table want as
select a.firm, a.date, sum(b.ret) as cum_ret
from have as a
left join have as b
on b.date-a.date between 0 and 30
order by a.firm, a.date;
quit;
Thank you, as always, for your help. For some reason, the code that you wrote is "hanging," meaning that I submit it and then it just sits there, without giving me an error. I don't really know what is going on, but I really like the idea behind your code, as it seems very intuitive to me.
Do you have any idea why this may be?
Thanks,
John
Try this slight modification:
proc sql;
create table want as
select
a.firm,
a.date,
sum(b.ret) as sum_ret,
exp(sum(log(1+b.ret)))-1 as compo_ret ,
count(b.ret) as n
from have as a
left join have as b
on a.firm=b.firm and b.date-a.date between 0 and 30
group by a.firm, a.date;
quit;
You could also add the clause having n > 30 at the end of the query to keep conplete periods only.
PG
No! that was a typo. I corrected it.
PG
John,
Let us have a small number of dates. There are 6 days for A and
4 for B. It can be made to work with 30 days or for any number of days.
Just change the macro variable, &days, to your context. Also, you need
to give the expected maximum number of dates. For instance, in the
following data set, Firm A has the maximum dates(6).
An Array is used to save the RET values for a Firm at a time to
build up the cumulative sum of RETs. To be efficient, the sums
beyond &days, are made in such a way that one subtraction and
one addition are made for every day thereafter.
As sums are cumulatively made and when the number of observations
forward are lesser than &days, partial sums are made, which you can take
or ignore. Once a Firm is finished, the next Firm is processed in
a DoW-loop.
data have;
informat Date mmddyy8.;
format Date date10.;
input Firm $ Date Ret;
datalines;
A 1/1/11 .1
A 1/2/11 .01
A 1/3/11 .1
A 1/4/11 .2
A 1/5/11 .3
A 1/6/11 .2
B 1/1/11 .05
B 1/2/11 .1
B 1/3/11 .2
B 1/4/11 .1
;
run;
proc sort data = have;
by Firm ;
run;
%let days = 3;
%let maxdates = 6;
data need;
array k[&maxdates] _temporary_;
do ind = 1 by 1 until(last.Firm);
set have;
by Firm;
k[ind] = Ret;
end;
sum_Ret = 0;
* Find first sum for &days ;
do ind = 1 to &days;
sum_Ret + k[ind];
end;
do ind = 1 by 1 until(last.Firm);
set have;
by Firm;
if ind = 1 then output;
* Find subsequent sums by adjusting SUM_RET (removing and adding RET);
else do;
sum_Ret +- k[ind - 1];
if ind <= (dim(k) - &days + 1) then sum_Ret ++ k[ind + &days - 1];
output;
end;
end;
call missing(of k
drop ind;
run;
Muthia Kachirayan
I have been playing with the code that you each posted and have still not been able to make it work. I have gone back and attached an exerpt of the dataset that I am working with. The entire dataset is large, something like 700MB and has 11mm observations.
When I run the following code (obviously calling my own libraries, etc) the code just sort of sits there and does nothing.
proc sql;
create table mylib.twt2_2 as
select t1.CUSIP,
t1.DATE,
sum(t2.RET) as cum_ret,
exp(sum(log(1+t2.RET)))-1 as compo_ret,
count(t2.RET) as n
from mylib.twt2_1 as t1
left join mylib.twt2_1 as t2
on t2.date-t1.date between 0 and 30 & (t1.CUSIP = t2.CUSIP)
group by t1.CUSIP, t1.date;
quit;
Could you guys have a look at the dataset that I have attached and see what may be going wrong?
Thanks!!
I do not see your data.
By the way did you try the ARRAY approach I posted. If you have problems to understand/use the ARRAY approach, I can help you as much as I can.
What version of SAS are you using?
I am not sure what is your BYVar in TEST1. I am assuming it is STOCK_SYMBOL. If it is not you may replace it by your choice. Here is the Array way. There are 2265 rows with 'A' which is more than the next symbol 'ABI'.
I have not checked for the gaps in the dates. It is better you pre-process the data set and fill the gaps with suitable value for RET before running this program.
%let days = 30;
%let maxdates = 2265;
data need;
array k[&maxdates] _temporary_;
do ind = 1 by 1 until(last.stock_symbol);
set test1;
by stock_symbol;
k[ind] = Ret;
end;
sum_Ret = 0;
* Find first sum for &days ;
do ind = 1 to &days;
sum_Ret + k[ind];
end;
do ind = 1 by 1 until(last.stock_symbol);
set test1;
by stock_symbol;
if ind = 1 then output;
* Find subsequent sums by adjusting SUM_RET (removing and adding RET);
else do;
sum_Ret +- k[ind - 1];
if ind <= (dim(k) - &days + 1) then sum_Ret ++ k[ind + &days - 1];
output;
end;
end;
call missing(of k
drop ind;
run;
proc print data = need;
run;
Hope this works for you.
I notice that your dataset only includes weekdays. So a 30 day interval will include less than 30 datapoints. It is not clear how you want to handle that.
Otherwise, the only thing being wrong is your dataset is its size!
If you have access to SAS/ETS, you should try proc expand to get what you want:
/* Simple sum of returns */
proc expand data=sasforum.test1 out=test1;
by permno notsorted;
id date;
convert ret=ret_30 / transformout=(reverse movsum 30 reverse);
run;
/* Compounded product of returns */
proc expand data=sasforum.test1 out=test2;
by permno notsorted;
id date;
convert ret=compo_ret_30 / transformout=(+1 log reverse movsum 30 reverse exp -1);
run;
it should be a lot faster than SQL.
(Tested with your example dataset)
PG
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.