01-14-2018 09:50 PM
I have a large dataset that contains daily returns for a many stocks. The dataset has 3 columns: date, stocks, and returns.
I would like to calculate the rolling correlation and covariance between stocks for the last 1 week, 4 weeks (month), and 50 weeks (year).
FYI, I am implementing a Corsi HAR model as I try to predict correlation next period using the correlation between stocks during the last 1 week, 4 weeks and 50 weeks. What would be the best approach?
My attempt so far:
*creating pairs of stocks and their returns; data stocks; set stocks; year = floor(date/10000); month= floor(date/100) - year*100; day = date - year*10000 - month *100; format date2 yymmddn8.; date2= MDY(month,day,year); *create a date variable in datetime format so we can use it in proc expand; run; *creating a list of pairs; proc sql; create table pairs as select a.date, a.cusip as cusip1, a.r as r1, b.cusip as cusip2, b.r as r2 from stocks as a inner join stocks as b on (a.date=b.date); quit; *remove duplicated pairs, we only want one pair between two stocks e.g. AB only, not BA; data pairs; set pairs; length new1 new2 $ 40; new1=cusip1; new2=cusip2; call sortc(new1,new2); if cusip1 ^= cusip2; run; proc sort data=pairs out=pairs(drop=new1 new2) nodupkey; by date new1 new2; run; *now use PROC EXPAND to calculate moving average means. But what we really want is moving average correlations. Also, for the first few observations of each pairs, we dont have enough obverservations for yearly correlation so we should remove them. That is only take obs that have at least 1 year of previous returns.; proc expand data=process.pairs out=out method=none; id date; convert y = MA / transout=(movave 5); run; *calculating correlation; proc sort data=stocks;by date cusip1 cusip2;run; data stocks; set stocks; OBS = 1; if missing(r1)=. OR missing(r2)=. then do; r1 = . ; r2 = . ; obs = . ; end; PRODr1r2 = r1 * r2; run; proc expand DATA = stocks OUT = stocksout; id date; by cusip1 cusip2; convert r2 = r2SUM / METHOD = none TRANSFORMOUT = (cmovsum 55); convert r1 = r1SUM / METHOD = none TRANSFORMOUT = (cmovsum 55); convert prodr1r2 = PRODr1r2SUM / METHOD = none TRANSFORMOUT = (cmovsum 55); convert obs = N / METHOD = none TRANSFORMOUT = (cmovsum 55); convert r1 = r1CSS / METHOD = none TRANSFORMOUT = (cmovcss 55); convert r2 = r2CSS / METHOD = none TRANSFORMOUT = (cmovcss 55); run;
data results; set stockout; R = (prodr1r2sum - (r2sum*r1sum)/n) / ( sqrt(r1css)*sqrt(r2css)) ; run;
The option cmovsum computes centered moving sum, and hence centered moving correlations. but what I what is the correlation of previous 7 days, or 250 days. I tried cusum but it gives strange numbers
Sample data can be downloaded here : https://www.dropbox.com/s/wfngi8ayjj0drh2/stocks.csv?dl=0
Thank you very much for your help!
01-16-2018 05:48 PM
Thanks. I have worked it out. CMOVSUM computes centered moving average which is not I want. I use MOVSUM instead as it use previous data instead of half previous and half looking forward as in CMOVSUM. Just leaving this comment for future readers