Hi Experts,
I have daily returns data for firms. I would like to calculate the standard deviation of the returns (say, sigma) from the daily observations for the last 3 months on a rolling basis. For example, sigma for for firm i in december 2017 will be the standard deviations of the daily returns from september to november 2017. Similarly, sigma for firm i in november 2017 will be the standard deviations of the daily returns from august to october 2017.
Thanks,
Abu
OK. If you want retain those multiple returns in the same PERMNO and the same Names_Date.
proc import datafile='c:\temp\daily_ret.txt' out=have dbms=tab replace;
guessingrows=32767;
run;
data have ;
set have;
ret=input(Returns,?? best32.);
drop Returns;
monyy=intnx('month',Names_Date,0);
format monyy yymmn6.;
run;
proc sort data=have;by PERMNO monyy Names_Date;run;
data want;
if _n_=1 then do;
if 0 then set have(rename=(ret=_ret));
declare hash h(multidata:'y');
h.definekey('Names_Date');
h.definedata('_ret');
h.definedone();
end;
array x{100} _temporary_;
do until(last.PERMNO);
set have;
by PERMNO;
_ret=ret;h.add();
end;
do until(last.PERMNO);
set have;
by PERMNO;
n=0;call missing(of x{*});
do i=intnx('month',monyy,-3) to intnx('month',monyy,-1,'e');
rc=h.find(key:i);
do while(rc=0);
n+1;x{n}=_ret;
rc=h.find_next(key:i);
end;
end;
std=std(of x{*});
output;
end;
h.clear();
drop i _ret n rc;
run;
Can you post some sample data? Have you tried anything so far as well? PROC EXPAND is probably where I'd start, but I'd also consider changing your definition. If you use the previous 90 days for example or X trading days. 3 months is not a standard measure and you'll see blips because of February issue alone.
proc sql; create table want as select *,(select std(return) from have where id=a.id and date between intnx('month',a.date,-3) and a.date ) as std from have as a; quit;
Thank you Xia. But proc sql takes very long time if there are millions of observations. Is there any other way such as using array or using macro?
Yes, of course calculations of this nature are going to take a very long time if you have millions of observations. If speed doing the calculation on millions of observations is a concern, you should have stated this at the beginning.
Please read what Reeza said and show us some sample data, and show us the desired results.
Ksharp's (Xia) code is ok but I would like to know if it's possible to do using macro or using array. I attach the sample dataset. Test file is attached since sas data file cannot be attached.
Yes, of course it would be possible to write this as a macro. I don't know if it would be faster on millions of observations because I never tried. As @Reeza has stated, PROC EXPAND might also be a good solution.
Then, there is this solution, which works for regression coefficients with PROC REG, but something similar using PROC SUMMARY ought to work for standard deviations https://communities.sas.com/t5/SAS-Procedures/How-to-capture-the-regression-coeffecient-for-certain-...
(and this could be customized to handle the different number of days in each month)
If it was a fixed interval a different approach would be easier but since the number of changes each month depending on the month that's the biggest problem.
You could probably expand the temporary array approach used here, but PROC EXPAND is still your best option. Have you looked into that at all? It would likely also be the fastest option as SAS would have optimized it already.
http://support.sas.com/kb/41/380.html
You would need to set the array size to the largest possible number of records, probably 93 to be safe.
I think Hash Table could do that. Post some data ,maybe I could give a try.
Hi Xia,
Sample data is attached in text format. Please try with hash object.
OK. Try this one. Notice: each PERMNO and NAMES_DATE have only one Return.
proc import datafile='c:\temp\daily_ret.txt' out=have dbms=tab replace;
guessingrows=32767;
run;
data have ;
set have;
ret=input(Returns,?? best32.);
drop Returns;
monyy=intnx('month',Names_Date,0);
format monyy yymmn6.;
run;
proc sort data=have;by PERMNO monyy Names_Date;run;
data want;
if _n_=1 then do;
if 0 then set have(rename=(ret=_ret));
declare hash h();
h.definekey('Names_Date');
h.definedata('_ret');
h.definedone();
end;
array x{100} _temporary_;
do until(last.PERMNO);
set have;
by PERMNO;
_ret=ret;h.add();
end;
do until(last.PERMNO);
set have;
by PERMNO;
n=0;call missing(of x{*});
do i=intnx('month',monyy,-3) to intnx('month',monyy,-1,'e');
if h.find(key:i)=0 then do;n+1;x{n}=_ret; end;
end;
std=std(of x{*});
output;
end;
h.clear();
drop i _ret n;
run;
It's lot faster now. It also shows the result i.e. the standard deviation. But it shows the following error many times:
ERROR: Duplicate key.
That means there are multiply return for the same PERMNO and NAMES_DATE .
Make sure a PERMNO and a NAMES_DATE have only one return.
To sum(return) in each PERMNO and NAMES_DATE combination.
After that running code would not generate that error information.
proc summary data=have;
by PERMNO NAMES_DATE ;
var return;
output out=want sum=;
run;
OR you want the last return for each group of PERMNO and NAMES_DATE.
data want;
set have;
by PERMNO NAMES_DATE ;
if last.NAMES_DATE ;
run;
OK. If you want retain those multiple returns in the same PERMNO and the same Names_Date.
proc import datafile='c:\temp\daily_ret.txt' out=have dbms=tab replace;
guessingrows=32767;
run;
data have ;
set have;
ret=input(Returns,?? best32.);
drop Returns;
monyy=intnx('month',Names_Date,0);
format monyy yymmn6.;
run;
proc sort data=have;by PERMNO monyy Names_Date;run;
data want;
if _n_=1 then do;
if 0 then set have(rename=(ret=_ret));
declare hash h(multidata:'y');
h.definekey('Names_Date');
h.definedata('_ret');
h.definedone();
end;
array x{100} _temporary_;
do until(last.PERMNO);
set have;
by PERMNO;
_ret=ret;h.add();
end;
do until(last.PERMNO);
set have;
by PERMNO;
n=0;call missing(of x{*});
do i=intnx('month',monyy,-3) to intnx('month',monyy,-1,'e');
rc=h.find(key:i);
do while(rc=0);
n+1;x{n}=_ret;
rc=h.find_next(key:i);
end;
end;
std=std(of x{*});
output;
end;
h.clear();
drop i _ret n rc;
run;
Thanks a lot Xia.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.