Solved: how to compute the average correlation across a very large correlation...

coug914 · Posted 12-27-2011 02:55 PM

I’m trying to compute the average cross-sectional correlation across a large dataset. I may also want to compute a weighted average. Inaddition, my raw data is in stacked form. Assume, for example, I have a stacked list of investors and their portfolio weights (that sum to one for each investor) across all stocks (assume only four stocks).

Investor Stock Portfolio_weight

A IBM 0.50

A MSFT 0.40

A GOOG 0.10

A GRPN 0.00

B IBM 0.40

B MFST 0.60

B GOOG 0.00

B GRPM 0.00

C IBM 0.50

C MSFT 0

C GOGG 0

C GRPN 0.50

I wrote alittle test code (see below) to compute the average correlation between weights across investors. First, I compute a weights matrix where each row is a company and each column is an investor (data outb):

dataoutb

company

Obs ID _NAME_ A B C

1 GOOG wt 0.1 0.0 0.0

2 GRPN wt 0.0 0.0 0.5

3 IBM wt 0.5 0.4 0.5

4 MSFT wt 0.4 0.6 0.0

I then run a proc corr and keep the correlations (data d):

Obs _TYPE_ _NAME_ A B C

1 CORR A 1.00000 0.88684 0.00000

2 CORR B 0.88684 1.00000 -0.19245

3 CORR C 0.00000 -0.19245 1.00000

Because I only want the average of the off diagonal elements,I take the average times N, subtract 1 (the diagonal element) and divide by N-1 to get the average of the off diagonal elements: 0.23 = (0.886+0.00+-0.192)/3 (=ave_corr in data g)

I have four issues:

1. The number of investors in my simple example is three. In my real data, the number is approximately 3,000, which means the correlation matrix will have 9M elements. Is that an issue? (Moreover the number of stocks is approximately 5,000.)

2. My code seems a little clumsy. I welcome any suggestion for alternative approaches.

3. In addition to average correlation, I’d like to compute the sum of the products of the weights for each pair of investors and compute the average of that (e.g., for A and B, 0.1*0 + 0*0 + 0.5*0.4 + 0.4*0.6). Again, there would be approximately 9M of these terms. If I define outb as a 4X3 matrix (in my simple example), I can compute this as outb(transposed)*outb.

After reading about this a bit, this seems like something I should be able to do with proc IML. However, I’m not familiar with proc IML. Any suggestions on how to define data outb as a matrix so I can use proc IML? And if I can do that, how would I’d go about getting the average of the elements in the resulting matrix?

4. If I standardize the weights for each investor (i.e., subtract the mean and divided by the standard deviation), I could use the same method to compute the correlations (i.e., standardized weights matrix transposed *standardized weights matrix). So that might be another approach. Any thoughtson using IML versus my ‘traditional’ sas code?

Thanks inadvance.

Rick

My code

data a; input investor $ companyID $ wt;

cards;

A IBM 0.50

A MSFT 0.40

A GOOG 0.10

A GRPN 0.00

B IBM 0.40

B MSFT 0.60

B GOOG 0.00

B GRPN 0.00

C IBM 0.50

C MSFT 0.00

C GOOG 0.00

C GRPN 0.50

;

run;

data b; set a;

proc sort; by companyID;

proc transpose out=outb;by companyID;id investor; var wt; *outb gives the weightsmatrix, columns are investors and rows are companies;

title 'data outb';

proc print data=outb;run;

data c; set outb;

proc corr out=outc;*computes correlation matrix;

data d; set outc;

if _TYPE_='CORR';

proc means noprint;

output out=outd; *averagecorrelation for each investor;

data e; set outd;

if _STAT_='MEAN';

proc transpose out=oute;

data f; set oute;

if _NAME_='_TYPE_' then delete;

if _NAME_='_FREQ_' then delete;

ave_corr=col1;

proc means noprint; var ave_corr;

output out=outf mean=ave_corr_cross_investors;*average correlation across investors;

data g; set outf;

ave_corr=(ave_corr_cross_investors*_FREQ_-1)/(_FREQ_-1);*average correlation dropping the diagonal;

proc print; run;

Ksharp · Posted 12-28-2011 12:44 AM

Hi. Rick. I am not familar with IML.

If you need IML code, you can post it at IML forum. There is a specialist about IML who also is a SAS employee -- Rich M.

But I think you also can do it under Data Step.

data a; input investor $ companyID $ wt;
cards;
A IBM 0.50
A MSFT 0.40
A GOOG 0.10
A GRPN 0.00
B IBM 0.40
B MSFT 0.60
B GOOG 0.00
B GRPN 0.00
C IBM 0.50
C MSFT 0.00
C GOOG 0.00
C GRPN 0.50
;
run;


proc sort data=a; by companyID;run;
proc transpose data=a out=temp(drop=_name_);
 by companyid;
 id investor;
 var wt;
run;
proc corr data=temp outp=corr(drop=_name_ where=(_type_='CORR')) noprint;
 var _numeric_;
run;
data _null_;
 set corr end=last;
 retain sum .;
 array _a{*} _numeric_;
 i=1;
 do while(_a{i} ne 1);
  sum=sum(sum,_a{i});i+1;n+1;
 end;
 if last then do;
               mean=sum/n; 
               put 'ave_corr= ' mean;
              end;
run;

options nomprint nomlogic nosymbolgen;
/*compute the sum of the products of the weights for 
each pair of investors */
%macro across;
proc sql noprint;
 select distinct investor into : list separated by ' ' from a;

create table want as
 select 
        %do i=1 %to %sysfunc(countw(&list));
          %do j=%eval(&i+1) %to %sysfunc(countw(&list));
           sum(%scan(&list,&i)*%scan(&list,&j)) as %scan(&list,&i)%scan(&list,&j)
           %if &i ne %eval(%sysfunc(countw(&list))-1) or  &j ne %sysfunc(countw(&list)) %then %do;,%end;
          %end;
        %end;
  from temp;
quit;
%mend across;

%across

Ksharp

View solution in original post

Ksharp · Posted 12-28-2011 12:44 AM

Hi. Rick. I am not familar with IML.

If you need IML code, you can post it at IML forum. There is a specialist about IML who also is a SAS employee -- Rich M.

But I think you also can do it under Data Step.

data a; input investor $ companyID $ wt;
cards;
A IBM 0.50
A MSFT 0.40
A GOOG 0.10
A GRPN 0.00
B IBM 0.40
B MSFT 0.60
B GOOG 0.00
B GRPN 0.00
C IBM 0.50
C MSFT 0.00
C GOOG 0.00
C GRPN 0.50
;
run;


proc sort data=a; by companyID;run;
proc transpose data=a out=temp(drop=_name_);
 by companyid;
 id investor;
 var wt;
run;
proc corr data=temp outp=corr(drop=_name_ where=(_type_='CORR')) noprint;
 var _numeric_;
run;
data _null_;
 set corr end=last;
 retain sum .;
 array _a{*} _numeric_;
 i=1;
 do while(_a{i} ne 1);
  sum=sum(sum,_a{i});i+1;n+1;
 end;
 if last then do;
               mean=sum/n; 
               put 'ave_corr= ' mean;
              end;
run;

options nomprint nomlogic nosymbolgen;
/*compute the sum of the products of the weights for 
each pair of investors */
%macro across;
proc sql noprint;
 select distinct investor into : list separated by ' ' from a;

create table want as
 select 
        %do i=1 %to %sysfunc(countw(&list));
          %do j=%eval(&i+1) %to %sysfunc(countw(&list));
           sum(%scan(&list,&i)*%scan(&list,&j)) as %scan(&list,&i)%scan(&list,&j)
           %if &i ne %eval(%sysfunc(countw(&list))-1) or  &j ne %sysfunc(countw(&list)) %then %do;,%end;
          %end;
        %end;
  from temp;
quit;
%mend across;

%across

Ksharp

coug914 · Posted 12-29-2011 11:12 AM

Thanks once more Ksharp...you've saved me a lot of time...

how to compute the average correlation across a very large correlation matrix

how to compute the average correlation across a very large correlation matrix

how to compute the average correlation across a very large correlation matrix

how to compute the average correlation across a very large correlation matrix

Catch up on SAS Innovate 2026

how to compute the average correlation across a very large correlation matrix

how to compute the average correlation across a very large correlation matrix

how to compute the average correlation across a very large correlation matrix

how to compute the average correlation across a very large correlation matrix

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away