BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
coug914
Calcite | Level 5

I’m trying to compute the average cross-sectional correlation across a large dataset. I may also want to compute a weighted average. Inaddition, my raw data is in stacked form. Assume, for example, I have a stacked list of investors and their portfolio weights (that sum to one for each investor) across all stocks (assume only four stocks).

Investor               Stock     Portfolio_weight             

A                             IBM        0.50                                       

A                             MSFT     0.40

A                             GOOG   0.10

A                             GRPN    0.00

B                             IBM        0.40

B                             MFST     0.60

B                             GOOG   0.00

B                             GRPM   0.00

C                             IBM        0.50

C                             MSFT     0

C                             GOGG   0

C                             GRPN    0.50

I wrote alittle test code (see below) to compute the average correlation between weights across investors. First, I compute a weights matrix where each row is a company and each column is an investor (data outb):

dataoutb                                                                                                                                                                                                                 

       company

Obs      ID      _NAME_     A      B     C

1     GOOG        wt      0.1   0.0    0.0

2     GRPN        wt      0.0   0.0    0.5

3     IBM         wt      0.5   0.4    0.5

4      MSFT        wt     0.4    0.6    0.0

I then run a proc corr and keep the correlations (data d):

Obs    _TYPE_   _NAME_       A           B           C

1     CORR       A       1.00000     0.88684    0.00000

2     CORR       B       0.88684     1.00000   -0.19245

3      CORR      C       0.00000    -0.19245    1.00000

Because I only want the average of the off diagonal elements,I take the average times N, subtract 1 (the diagonal element) and divide by N-1 to get the average of the off diagonal elements: 0.23 = (0.886+0.00+-0.192)/3 (=ave_corr in data g)

I have four issues:

1.  The number of investors in my simple example is three. In my real data, the number is approximately 3,000, which means the correlation matrix will have 9M elements. Is that an issue? (Moreover the number of stocks is approximately 5,000.)

2.  My code seems a little clumsy. I welcome any suggestion for alternative approaches.

3.  In addition to average correlation, I’d like to compute the sum of the products of the weights for each pair of investors and compute the average of that (e.g., for A and B, 0.1*0 + 0*0 + 0.5*0.4 + 0.4*0.6). Again, there would be approximately 9M of these terms. If I define outb as a 4X3 matrix (in my simple example), I can compute this as outb(transposed)*outb.

After reading about this a bit, this seems like something I should be able to do with proc IML. However, I’m not familiar with proc IML. Any suggestions on how to define data outb as a matrix so I can use proc IML? And if I can do that, how would I’d go about getting the average of the elements in the resulting matrix?

4.  If I standardize the weights for each investor (i.e., subtract the mean and divided by the standard deviation), I could use the same method to compute the correlations (i.e., standardized weights matrix transposed *standardized weights matrix). So that might be another approach. Any thoughtson using IML versus my ‘traditional’ sas code?

Thanks inadvance.

Rick

My code

data a; input investor $ companyID $ wt;

cards;

A IBM 0.50

A MSFT 0.40

A GOOG 0.10

A GRPN 0.00

B IBM 0.40

B MSFT 0.60

B GOOG 0.00

B GRPN 0.00

C IBM 0.50

C MSFT 0.00

C GOOG 0.00

C GRPN 0.50

;

run;

data b; set a;

proc sort; by companyID;

proc transpose out=outb;by companyID;id investor; var wt;   *outb gives the weightsmatrix, columns are investors and rows are companies;

title 'data outb';                                              

proc print data=outb;run;

data c; set outb;

proc corr out=outc;*computes correlation matrix;

data d; set outc;    

if _TYPE_='CORR';

proc means noprint;

output out=outd;   *averagecorrelation for each investor;

data e; set outd;

if _STAT_='MEAN';

proc transpose out=oute;

data f; set oute;

if _NAME_='_TYPE_' then delete;

if _NAME_='_FREQ_' then delete;

ave_corr=col1;

proc means noprint; var ave_corr;

output out=outf mean=ave_corr_cross_investors;*average correlation across investors;

data g; set outf;

ave_corr=(ave_corr_cross_investors*_FREQ_-1)/(_FREQ_-1);*average correlation dropping the diagonal;

proc print; run;

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

Hi. Rick. I am not familar with IML.

If you need IML code, you can post it at IML forum. There is a specialist about IML who also is a SAS employee -- Rich M.

But I think you also can do it under Data Step.

data a; input investor $ companyID $ wt;
cards;
A IBM 0.50
A MSFT 0.40
A GOOG 0.10
A GRPN 0.00
B IBM 0.40
B MSFT 0.60
B GOOG 0.00
B GRPN 0.00
C IBM 0.50
C MSFT 0.00
C GOOG 0.00
C GRPN 0.50
;
run;


proc sort data=a; by companyID;run;
proc transpose data=a out=temp(drop=_name_);
 by companyid;
 id investor;
 var wt;
run;
proc corr data=temp outp=corr(drop=_name_ where=(_type_='CORR')) noprint;
 var _numeric_;
run;
data _null_;
 set corr end=last;
 retain sum .;
 array _a{*} _numeric_;
 i=1;
 do while(_a{i} ne 1);
  sum=sum(sum,_a{i});i+1;n+1;
 end;
 if last then do;
               mean=sum/n; 
               put 'ave_corr= ' mean;
              end;
run;

options nomprint nomlogic nosymbolgen;
/*compute the sum of the products of the weights for 
each pair of investors */
%macro across;
proc sql noprint;
 select distinct investor into : list separated by ' ' from a;

create table want as
 select 
        %do i=1 %to %sysfunc(countw(&list));
          %do j=%eval(&i+1) %to %sysfunc(countw(&list));
           sum(%scan(&list,&i)*%scan(&list,&j)) as %scan(&list,&i)%scan(&list,&j)
           %if &i ne %eval(%sysfunc(countw(&list))-1) or  &j ne %sysfunc(countw(&list)) %then %do;,%end;
          %end;
        %end;
  from temp;
quit;
%mend across;

%across




Ksharp

View solution in original post

2 REPLIES 2
Ksharp
Super User

Hi. Rick. I am not familar with IML.

If you need IML code, you can post it at IML forum. There is a specialist about IML who also is a SAS employee -- Rich M.

But I think you also can do it under Data Step.

data a; input investor $ companyID $ wt;
cards;
A IBM 0.50
A MSFT 0.40
A GOOG 0.10
A GRPN 0.00
B IBM 0.40
B MSFT 0.60
B GOOG 0.00
B GRPN 0.00
C IBM 0.50
C MSFT 0.00
C GOOG 0.00
C GRPN 0.50
;
run;


proc sort data=a; by companyID;run;
proc transpose data=a out=temp(drop=_name_);
 by companyid;
 id investor;
 var wt;
run;
proc corr data=temp outp=corr(drop=_name_ where=(_type_='CORR')) noprint;
 var _numeric_;
run;
data _null_;
 set corr end=last;
 retain sum .;
 array _a{*} _numeric_;
 i=1;
 do while(_a{i} ne 1);
  sum=sum(sum,_a{i});i+1;n+1;
 end;
 if last then do;
               mean=sum/n; 
               put 'ave_corr= ' mean;
              end;
run;

options nomprint nomlogic nosymbolgen;
/*compute the sum of the products of the weights for 
each pair of investors */
%macro across;
proc sql noprint;
 select distinct investor into : list separated by ' ' from a;

create table want as
 select 
        %do i=1 %to %sysfunc(countw(&list));
          %do j=%eval(&i+1) %to %sysfunc(countw(&list));
           sum(%scan(&list,&i)*%scan(&list,&j)) as %scan(&list,&i)%scan(&list,&j)
           %if &i ne %eval(%sysfunc(countw(&list))-1) or  &j ne %sysfunc(countw(&list)) %then %do;,%end;
          %end;
        %end;
  from temp;
quit;
%mend across;

%across




Ksharp

coug914
Calcite | Level 5

Thanks once more Ksharp...you've saved me a lot of time...

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!

Submit your idea!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 2666 views
  • 0 likes
  • 2 in conversation