turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- Programming
- /
- Calculating correlation/covariance for stocks

Topic Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

01-14-2018 09:50 PM

I have a large dataset that contains daily returns for a many stocks. The dataset has 3 columns: date, stocks, and returns.

I would like to calculate the rolling correlation and covariance between stocks for the last 1 week, 4 weeks (month), and 50 weeks (year).

FYI, I am implementing a Corsi HAR model as I try to predict correlation next period using the correlation between stocks during the last 1 week, 4 weeks and 50 weeks. What would be the best approach?

My attempt so far:

```
*creating pairs of stocks and their returns;
data stocks;
set stocks;
year = floor(date/10000);
month= floor(date/100) - year*100;
day = date - year*10000 - month *100;
format date2 yymmddn8.;
date2= MDY(month,day,year); *create a date variable in datetime format so we can use it in proc expand;
run;
*creating a list of pairs;
proc sql;
create table pairs as
select
a.date,
a.cusip as cusip1,
a.r as r1,
b.cusip as cusip2,
b.r as r2
from
stocks as a inner join
stocks as b on (a.date=b.date);
quit;
*remove duplicated pairs, we only want one pair between two stocks e.g. AB only, not BA;
data pairs;
set pairs;
length new1 new2 $ 40;
new1=cusip1;
new2=cusip2;
call sortc(new1,new2);
if cusip1 ^= cusip2;
run;
proc sort data=pairs out=pairs(drop=new1 new2) nodupkey;
by date new1 new2;
run;
*now use PROC EXPAND to calculate moving average means. But what we really want is moving average correlations. Also, for the first few observations of each pairs, we dont have enough obverservations for yearly correlation so we should remove them. That is only take obs that have at least 1 year of previous returns.;
proc expand data=process.pairs out=out method=none;
id date;
convert y = MA / transout=(movave 5);
run;
*calculating correlation;
proc sort data=stocks;by date cusip1 cusip2;run;
data stocks; set stocks;
OBS = 1;
if missing(r1)=. OR missing(r2)=. then do;
r1 = . ;
r2 = . ;
obs = . ;
end;
PRODr1r2 = r1 * r2;
run;
proc expand DATA = stocks OUT = stocksout;
id date;
by cusip1 cusip2;
convert r2 = r2SUM / METHOD = none TRANSFORMOUT = (cmovsum 55);
convert r1 = r1SUM / METHOD = none TRANSFORMOUT = (cmovsum 55);
convert prodr1r2 = PRODr1r2SUM / METHOD = none TRANSFORMOUT = (cmovsum 55);
convert obs = N / METHOD = none TRANSFORMOUT = (cmovsum 55);
convert r1 = r1CSS / METHOD = none TRANSFORMOUT = (cmovcss 55);
convert r2 = r2CSS / METHOD = none TRANSFORMOUT = (cmovcss 55);
run;
```

data results; set stockout; R = (prodr1r2sum - (r2sum*r1sum)/n) / ( sqrt(r1css)*sqrt(r2css)) ; run;

The option cmovsum computes centered moving sum, and hence centered moving correlations. but what I what is the correlation of previous 7 days, or 250 days. I tried cusum but it gives strange numbers

Sample data can be downloaded here : https://www.dropbox.com/s/wfngi8ayjj0drh2/stocks.csv?dl=0

Thank you very much for your help!

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ducman1611

01-15-2018 11:06 AM

You change the 55 to get the different periods you're interested in...so for 7 days change it to CMOVSUM 7

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Reeza

01-16-2018 05:48 PM

Thanks. I have worked it out. CMOVSUM computes centered moving average which is not I want. I use MOVSUM instead as it use previous data instead of half previous and half looking forward as in CMOVSUM. Just leaving this comment for future readers