BookmarkSubscribeRSS Feed
somebody
Lapis Lazuli | Level 10

I have a large dataset that contains daily returns for a many stocks. The dataset has 3 columns: date, stocks, and returns.
I would like to calculate the rolling correlation and covariance between stocks for the last 1 week, 4 weeks (month), and 50 weeks (year).

FYI, I am implementing a Corsi HAR model as I try to predict correlation next period using the correlation between stocks during the last 1 week, 4 weeks and 50 weeks. What would be the best approach?

My attempt so far:

*creating pairs of stocks and their returns;
        data stocks; 
            set stocks;
            year = floor(date/10000);
            month= floor(date/100) - year*100;
            day = date - year*10000 - month *100;
            format date2 yymmddn8.; 
            date2= MDY(month,day,year); *create a date variable in datetime format so we can use it in proc expand;
        run; 

        *creating a list of pairs;
            proc sql;
                create table pairs as
                select 
                    a.date,
                    a.cusip as cusip1,
                    a.r as r1,
                    b.cusip as cusip2,
                    b.r as r2
                from 
                    stocks as a inner join 
                    stocks as b on (a.date=b.date);
            quit;
            *remove duplicated pairs, we only want one pair between two stocks e.g. AB only, not BA;
            data pairs;
                set pairs;
                length new1 new2 $ 40;
                new1=cusip1;
                new2=cusip2;
                call sortc(new1,new2); 
                if cusip1 ^= cusip2;
            run;
            proc sort data=pairs out=pairs(drop=new1 new2) nodupkey;
                by date new1 new2;
            run;
            *now use PROC EXPAND to calculate moving average means. But what we really want is moving average correlations. Also, for the first few observations of each pairs, we dont have enough obverservations for yearly correlation so we should remove them. That is only take obs that have at least 1 year of previous returns.;
            proc expand data=process.pairs out=out method=none;
               id date;
               convert y = MA   / transout=(movave 5);
            run;
            *calculating correlation;
            proc sort data=stocks;by date cusip1 cusip2;run;
            data stocks; set stocks;
                OBS = 1;
                if missing(r1)=. OR missing(r2)=. then do;
                    r1 = . ;
                    r2 = . ;
                    obs = . ;
                end;
                PRODr1r2 = r1 * r2;
            run;

            proc expand DATA = stocks OUT = stocksout;
                id date;
                by cusip1 cusip2;
                convert r2 = r2SUM / METHOD = none TRANSFORMOUT = (cmovsum 55);
                convert r1 = r1SUM / METHOD = none TRANSFORMOUT = (cmovsum 55);
                convert prodr1r2 = PRODr1r2SUM / METHOD = none TRANSFORMOUT = (cmovsum 55);
                convert obs = N / METHOD = none TRANSFORMOUT = (cmovsum 55);
                convert r1 = r1CSS / METHOD = none TRANSFORMOUT = (cmovcss 55);
                convert r2 = r2CSS / METHOD = none TRANSFORMOUT = (cmovcss 55);
            run;
data results; set stockout; R = (prodr1r2sum - (r2sum*r1sum)/n) / ( sqrt(r1css)*sqrt(r2css)) ; run;

The option cmovsum computes centered moving sum, and hence centered moving correlations. but what I what is the correlation of previous 7 days, or 250 days. I tried cusum but it gives strange numbers

 

Sample data can be downloaded here : https://www.dropbox.com/s/wfngi8ayjj0drh2/stocks.csv?dl=0


Thank you very much for your help!

2 REPLIES 2
Reeza
Super User

You change the 55 to get the different periods you're interested in...so for 7 days change it to CMOVSUM 7

somebody
Lapis Lazuli | Level 10

Thanks. I have worked it out. CMOVSUM computes centered moving average which is not I want. I use MOVSUM instead as it use previous data instead of half previous and half looking forward as in CMOVSUM. Just leaving this comment for future readers 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 1377 views
  • 2 likes
  • 2 in conversation