Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Home
- /
- Programming
- /
- SAS Procedures
- /
- how to compute the average correlation across a very large correlation...

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

🔒 This topic is **solved** and **locked**.
Need further help from the community? Please
sign in and ask a **new** question.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 12-27-2011 02:55 PM
(2639 views)

I’m trying to compute the average cross-sectional correlation across a large dataset. I may also want to compute a weighted average. Inaddition, my raw data is in stacked form. Assume, for example, I have a stacked list of investors and their portfolio weights (that sum to one for each investor) across all stocks (assume only four stocks).

Investor Stock Portfolio_weight

A IBM 0.50

A MSFT 0.40

A GOOG 0.10

A GRPN 0.00

B IBM 0.40

B MFST 0.60

B GOOG 0.00

B GRPM 0.00

C IBM 0.50

C MSFT 0

C GOGG 0

C GRPN 0.50

I wrote alittle test code (see below) to compute the average correlation between weights across investors. First, I compute a weights matrix where each row is a company and each column is an investor (data outb):

dataoutb

company

Obs ID _NAME_ A B C

1 GOOG wt 0.1 0.0 0.0

2 GRPN wt 0.0 0.0 0.5

3 IBM wt 0.5 0.4 0.5

4 MSFT wt 0.4 0.6 0.0

I then run a proc corr and keep the correlations (data d):

Obs _TYPE_ _NAME_ A B C

1 CORR A 1.00000 0.88684 0.00000

2 CORR B 0.88684 1.00000 -0.19245

3 CORR C 0.00000 -0.19245 1.00000

Because I only want the average of the off diagonal elements,I take the average times N, subtract 1 (the diagonal element) and divide by N-1 to get the average of the off diagonal elements: 0.23 = (0.886+0.00+-0.192)/3 (=ave_corr in data g)

I have four issues:

1. The number of investors in my simple example is three. In my real data, the number is approximately 3,000, which means the correlation matrix will have 9M elements. Is that an issue? (Moreover the number of stocks is approximately 5,000.)

2. My code seems a little clumsy. I welcome any suggestion for alternative approaches.

3. In addition to average correlation, I’d like to compute the sum of the products of the weights for each pair of investors and compute the average of that (e.g., for A and B, 0.1*0 + 0*0 + 0.5*0.4 + 0.4*0.6). Again, there would be approximately 9M of these terms. If I define outb as a 4X3 matrix (in my simple example), I can compute this as outb(transposed)*outb.

After reading about this a bit, this seems like something I should be able to do with proc IML. However, I’m not familiar with proc IML. Any suggestions on how to define data outb as a matrix so I can use proc IML? And if I can do that, how would I’d go about getting the average of the elements in the resulting matrix?

4. If I standardize the weights for each investor (i.e., subtract the mean and divided by the standard deviation), I could use the same method to compute the correlations (i.e., standardized weights matrix transposed *standardized weights matrix). So that might be another approach. Any thoughtson using IML versus my ‘traditional’ sas code?

Thanks inadvance.

Rick

My code

**data **a; input investor $ companyID $ wt;

cards;

A IBM 0.50

A MSFT 0.40

A GOOG 0.10

A GRPN 0.00

B IBM 0.40

B MSFT 0.60

B GOOG 0.00

B GRPN 0.00

C IBM 0.50

C MSFT 0.00

C GOOG 0.00

C GRPN 0.50

;

**run**;

**data **b; set a;

**proc sort**; by companyID;

**proc transpose** out=outb;by companyID;id investor; var wt; *outb gives the weightsmatrix, columns are investors and rows are companies;

title 'data outb';

**proc print** data=outb;**run**;

**data **c; set outb;

**proc corr** out=outc;*computes correlation matrix;

**data **d; set outc;

if _TYPE_='CORR';

**proc means** noprint;

output out=outd; *averagecorrelation for each investor;

**data **e; set outd;

if _STAT_='MEAN';

**proc transpose** out=oute;

**data **f; set oute;

if _NAME_='_TYPE_' then delete;

if _NAME_='_FREQ_' then delete;

ave_corr=col1;

**proc means ** noprint; var ave_corr;

output out=outf mean=ave_corr_cross_investors;*average correlation across investors;

**data **g; set outf;

ave_corr=(ave_corr_cross_investors*_FREQ_-**1**)/(_FREQ_-**1**);*average correlation dropping the diagonal;

**proc** **print**; **run**;

1 ACCEPTED SOLUTION

Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Hi. Rick. I am not familar with IML.

If you need IML code, you can post it at IML forum. There is a specialist about IML who also is a SAS employee -- Rich M.

But I think you also can do it under Data Step.

data a; input investor $ companyID $ wt; cards; A IBM 0.50 A MSFT 0.40 A GOOG 0.10 A GRPN 0.00 B IBM 0.40 B MSFT 0.60 B GOOG 0.00 B GRPN 0.00 C IBM 0.50 C MSFT 0.00 C GOOG 0.00 C GRPN 0.50 ; run; proc sort data=a; by companyID;run; proc transpose data=a out=temp(drop=_name_); by companyid; id investor; var wt; run; proc corr data=temp outp=corr(drop=_name_ where=(_type_='CORR')) noprint; var _numeric_; run; data _null_; set corr end=last; retain sum .; array _a{*} _numeric_; i=1; do while(_a{i} ne 1); sum=sum(sum,_a{i});i+1;n+1; end; if last then do; mean=sum/n; put 'ave_corr= ' mean; end; run; options nomprint nomlogic nosymbolgen; /*compute the sum of the products of the weights for each pair of investors */ %macro across; proc sql noprint; select distinct investor into : list separated by ' ' from a; create table want as select %do i=1 %to %sysfunc(countw(&list)); %do j=%eval(&i+1) %to %sysfunc(countw(&list)); sum(%scan(&list,&i)*%scan(&list,&j)) as %scan(&list,&i)%scan(&list,&j) %if &i ne %eval(%sysfunc(countw(&list))-1) or &j ne %sysfunc(countw(&list)) %then %do;,%end; %end; %end; from temp; quit; %mend across; %across

Ksharp

2 REPLIES 2

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Hi. Rick. I am not familar with IML.

If you need IML code, you can post it at IML forum. There is a specialist about IML who also is a SAS employee -- Rich M.

But I think you also can do it under Data Step.

data a; input investor $ companyID $ wt; cards; A IBM 0.50 A MSFT 0.40 A GOOG 0.10 A GRPN 0.00 B IBM 0.40 B MSFT 0.60 B GOOG 0.00 B GRPN 0.00 C IBM 0.50 C MSFT 0.00 C GOOG 0.00 C GRPN 0.50 ; run; proc sort data=a; by companyID;run; proc transpose data=a out=temp(drop=_name_); by companyid; id investor; var wt; run; proc corr data=temp outp=corr(drop=_name_ where=(_type_='CORR')) noprint; var _numeric_; run; data _null_; set corr end=last; retain sum .; array _a{*} _numeric_; i=1; do while(_a{i} ne 1); sum=sum(sum,_a{i});i+1;n+1; end; if last then do; mean=sum/n; put 'ave_corr= ' mean; end; run; options nomprint nomlogic nosymbolgen; /*compute the sum of the products of the weights for each pair of investors */ %macro across; proc sql noprint; select distinct investor into : list separated by ' ' from a; create table want as select %do i=1 %to %sysfunc(countw(&list)); %do j=%eval(&i+1) %to %sysfunc(countw(&list)); sum(%scan(&list,&i)*%scan(&list,&j)) as %scan(&list,&i)%scan(&list,&j) %if &i ne %eval(%sysfunc(countw(&list))-1) or &j ne %sysfunc(countw(&list)) %then %do;,%end; %end; %end; from temp; quit; %mend across; %across

Ksharp

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Thanks once more Ksharp...you've saved me a lot of time...

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. **Registration is now open through August 30th**. Visit the SAS Hackathon homepage.

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.