Obsidian | Level 7

## Pairwise comparisons in loops

Hi everyone,

first of all, thanks a lot for helping me out so far with my questions, awesome to have such a community!

(SAS University Edition)

Coming to my next problem:

My dataset looks like this:

Shareholder      Company       Quarter        SharesHold       SharesOutstanding        SectorClassifier

A                AA              31MAR2006       15000                1.000.000.000                      2

B                AA             31MAR2006       1000                  1.000.000.000                       2

C                AA             31MAR2006       35000                1.000.000.000                       2

C                BB             31MAR2006       1000                  250.000.000                          2

B                BB             31MAR2006       25000                250.000.000                          2

Z                BB             31MAR2006       12000                250.000.000                          2

Of course, there are more holders, more companies, more sectors and more quarters actually. What I need, is the average number of common holders (in this case there would be 2 common holders, "B" and "C"), the amount of shares they hold together (in this case 1000+35000 in AA and 1000+25000 in BB, thus 62000 overall), the amount of shares outstanding overall (in this case 1.000.000.000+250.000.000=1.250.000.000) and the percentage (62.000/1.250.000.000). Since there are more firms per sector and I need pairwise comparisons for every pair, I think that has to be done via a loop, starting with AA and BB then exchanging BB for CC and so on until every company has been compared to AA. Afterwards, the same loop would have to run again starting with BB and CC, then excahnging CC for DD and so on.. until every pair has been compared.

To make it more complicated, like said above, the comparison has to be done every quarter, so I need pairwise comparisons between two firms for every point in time. Furthermore, I want to compare only firms within a sector, so the whole thing should run individually for sector 1, then sector 2 and so on.

Optimally, the output would look like this:

Quarter         Sector   CompanyPair  NbrCommonHolders  ShrsCommHolders     ShrsOutst     PercCommHold

31MAR2006     2                1                           2                           62000                1250000000        0,005%

31MAR2006     2                2                           X                          XXXXX               YYYYYYYY         X,XX%

31MAR2006     2                3                           X                          XXXXX               YYYYYYYY         X,XX%

31MAR2006     2                4                           X                          XXXXX               YYYYYYYY         X,XX%

I've got no idea how to solve this and would be really thankful if anyone can help me out!!!

Best regards!

1 ACCEPTED SOLUTION

Accepted Solutions
Rhodochrosite | Level 12

## Re: Pairwise comparisons in loops

Generalizing the program to identify the shareholders by Sector/quarter proved a little bit more difficult for me than i supposed but i think the following program should work, though i haven't checked the resulting dataset thoroughly.

``````data have;
format Quarter date9.;
input Shareholder \$ Company \$ Quarter date9. Sector;
cards;
A AA 31MAR2006 2
B AA 31MAR2006 2
C AA 31MAR2006 2
C BB 31MAR2006 2
B BB 31MAR2006 2
Z BB 31MAR2006 2
A CC 31MAR2006 2
B CC 31MAR2006 2
C CC 31MAR2006 2
Z CC 31MAR2006 2
A DD 31MAR2006 3
B DD 31MAR2006 3
C DD 31MAR2006 3
C EE 31MAR2006 3
B EE 31MAR2006 3
Z EE 31MAR2006 3
A FF 31MAR2006 3
B FF 31MAR2006 3
C FF 31MAR2006 3
Z FF 31MAR2006 3
A AA 30JUN2006 2
B AA 30JUN2006 2
C AA 30JUN2006 2
C BB 30JUN2006 2
Y BB 30JUN2006 2
Z BB 30JUN2006 2
A CC 30JUN2006 2
B CC 30JUN2006 2
C CC 30JUN2006 2
Z CC 30JUN2006 2
A DD 30JUN2006 3
B DD 30JUN2006 3
C DD 30JUN2006 3
A EE 30JUN2006 3
B EE 30JUN2006 3
Y EE 30JUN2006 3
A FF 30JUN2006 3
Y FF 30JUN2006 3
C FF 30JUN2006 3
Z FF 30JUN2006 3
;
run;

proc sql noprint;
/* We generate all possible pairs as a Cartesian product */
CREATE TABLE pairs AS
SELECT DISTINCT a.Sector, a.Quarter, a.Company AS Comp1, b.Company AS Comp2
FROM have a, have b
WHERE a.Company<b.Company /* Avoid to include both (AA,BB) and (BB,AA) */
AND b.Quarter=a.Quarter AND b.Sector=a.Sector
;
quit;

/* Create a reference dataset of shareholders with numeric id */
/* that can be used later as an array index                   */
proc sort data=have(keep=Shareholder) out=Shareholders nodupkey;
by Shareholder;
run;

data Shareholders;
Id=_N_;
set Shareholders;
call symput("nbsh",strip(_N_));
run;

/* We append the shareholders ids to the have dataset */
proc sql noprint;
CREATE TABLE have AS
SELECT a.*, b.ID
FROM have a
LEFT JOIN Shareholders b
ON b.Shareholder=a.Shareholder;
quit;

proc sort data=have out=have_S;
by Sector Quarter Shareholder Company;
run;

data have_S;
set have_S;
by Sector Quarter;
fstq=first.Quarter;
lstq=last.Quarter;
run;

/* For each pairs, we identify Sharheloders that have shares in both companies */
data Common;
set pairs;

/* Array index is the Id of the shareholder */
array SH (&nbsh.) SH1-SH&nbsh.;

do i=1 to nobs;
/* We rename Sector and Quarter to avoid overriding columns in pairs dataset */
set have_S (rename=(Sector=Sect Quarter=Quart)) point=i nobs=nobs;

if fstq then call missing(of SH(*));

if Sector=Sect and Quarter=Quart and (Company=Comp1 or Company=Comp2) then SH(Id)+1;

if lstq then do idx=1 to dim(SH);
/* If the shareholder with Id=idx have shares in both companies from the current pair */
/* we save the Id and output the result.                                              */
if SH(idx)=2 then do;
ShareholderId=idx;
output;
end;
end;
end;

keep Quarter Sector Comp1 Comp2 ShareholderId;
run;

/* We retrieve the shareholer name */
proc sql noprint;
CREATE TABLE Common AS
SELECT a.*, B.Shareholder
FROM Common a
LEFT JOIN Shareholders b
ON b.Id=a.ShareholderId
ORDER BY Comp1, Comp2, Shareholder;
quit;
``````
9 REPLIES 9
Rhodochrosite | Level 12

## Re: Pairwise comparisons in loops

Hello,

``````data have;
input Shareholder \$ Company  \$ sector;
cards;
A AA 2
B AA 2
C AA 2
C BB 2
B BB 2
Z BB 2
;
run;

proc sort data=have;
by sector shareholder company;
run;

proc transpose data=have out=thave;
by sector shareholder;
var company;
run;

proc sort data=thave (keep=Shareholder COL:) nouniquekey out=common;
by col:;
run;

proc sql noprint;
SELECT count(*) INTO :ncols TRIMMED
FROM dictionary.columns
WHERE LIBNAME="WORK" AND MEMNAME="COMMON" AND upcase(NAME) LIKE "COL%";
quit;

data common;
set common;
by COL&ncols.;

if first.COL&ncols. then group+1;

keep Shareholder group;
run;
``````
Obsidian | Level 7

## Re: Pairwise comparisons in loops

Hi, thanks for the first reply!

As far as I understand your code, it does not take the quarterly aspect into account, right? The shareholders of every company might be different between every point in time, so that the number of common holders is not the same for every quarter.

Just to add: I am really new to SAS, so a very quick explanation of the individual steps would be awesome!

Kind regards

Rhodochrosite | Level 12

## Re: Pairwise comparisons in loops

As @mkeintz said, taking quarterly aspect into account shouldn't be much of a problem once

the main logic is in place. Reading his answer makes me think i completely missed the point.

Could you explain your problem in more details (maybe with a more complete example with

more shareholder and companies) ?

PROC Star

## Re: Pairwise comparisons in loops

Doing this within sector/quarter is not the challenge.  My question is to nail down the term "average number of common holders".

I presume you mean, for say 20 companies in a given qtr/sector, you want 20 averages.  I.e. for company 1, you want to add up the number of common shareholders with company 2, 3, ... 20, and divide by 19, is that correct?  And you also want to add up the number of shares of company 1  (and each of the other 19?) held by those shareholders.  Is that also correct?  Those two new variables would be added to the company 1 record.  Then the same for each other company.  Do I have it right?

A research question:  why would you add shares of the other companies to the total common shares held by common shareholders.  Those other companies have different share value from the company-in-hand, and different share values from each other.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Obsidian | Level 7

## Re: Pairwise comparisons in loops

I'm interested in the aggregated figures for the whole sector for every quarter at the end, so I don't actually need any company-specific average of every pair with other companies. To aggregate the average number of common holders per qtr/sector, it would be enough to have the individual figures for every pair. This way I could say that the average pair of firms within a given sector in quarter Z has X common holders, which hold Y percent overall and on average in this pair of companies.

That means that your were right in the point that it would not be economically correct to add up the shares of both companies. I think the right methodology here is to identify common holders for a pair of firms and calculate an average of their holdings. So in a first step, I would have to calculate the percentage of shares hold for every investor/company/quarter in order to calculate the sum (e.g. 5% in AA, 10% in BB divided by 2 = 7,5% on average in this pair for this investor for this quarter). Let me try to give you a more complex example.

Shareholder      Company       Quarter        SharesHold       SharesOutstanding        SectorClassifier

A                AA              31MAR2006      15000                1.000.000.000                        2

B                AA             31MAR2006       1000                  1.000.000.000                        2

C                AA             31MAR2006       35000                1.000.000.000                        2

C                BB             31MAR2006       1000                     250.000.000                        2

B                BB             31MAR2006       25000                   250.000.000                        2

Z                BB             31MAR2006       12000                   250.000.000                        2

A                CC             31MAR2006       25000                  500.000.000                         2

B                CC             31MAR2006       35000                  500.000.000                         2

C               CC             31MAR2006        45000                  500.000.000                         2

Z               CC             31MAR2006        5000                    500.000.000                          2

E               CC             31MAR2006        15000                   500.000.000                         2

B               DD             31MAR2006        25000                   300.000.000                         2

F               DD             31MAR2006        35000                   300.000.000                         2

E               EE             31MAR2006        300000               1.000.000.000                        2

F               EE             31MAR2006        350000               1.000.000.000                        2

A               EE             31MAR2006        25000                 1.000.000.000                        2

Step 1) Calculate the percentage of shares owned by every investor for every company. (e.g. A holds 15000/1.000.000.000=0,0015% in AA)

Step 2) Start the pairwise comparison of company AA and BB for the first quarter. They have 2 common shareholders, "B" and "C". This is the first value I need.

Step 3) The second value refers to the overlap of the common holdings in order to show the economical impact. E.g. B holds 0,0001% in AA and 0,01% in BB, that means he holds (0,0001%+0,01%)/2= 0,00505% on average in both firms. The second common holder, C, has an average of (0,035%+0,0004%)/2=0,00195% in both companies. As a result, the aggregated amount of ownership stakes held by common owners, for this pair would be 0,00505%+0,00195%= 0,007%. It is, so to say, the sum of all the common holder's average stakes in both companies.

Another way to aggregate this on the firm-pair level would be to use a MIN function instead of a sum function. That means, you calculate the stakes of each common shareholder for the companies, and sum up the smaller value for each shareholder instead of the average. In this case, this would mean taking the sum of min(0,0001%;0,01%)=0,0001% for A and min(0,035%;0,0004%)=0,0004% for B, so 0,0005%. These two measures are proposed by Gilje, Gormley and Levit in "The rise of common ownership", if anyone is interested.

Of course it would be optimal if I had both possilities at the end to compare the results.

After having compared firms AA and BB, the next comparison would be AA and CC, starting with Step 2 again. After comparing AA and CC, the next pair would be AA and DD.. until every company has been compared to AA. Then starting with BB and CC. Like I said, I do not need any average for company AA at the end. I just need the values of Step 2 and Step 3 for every pair. This way, I can, for a given quarter calculate the next step.

Let's say we have 20 companies in the sector, this would mean I would get n*(n-1)/2=20*19/2=190 distinct pair values. Probably it would be helpful to create a new sheet which gives me the pair values in a format like this:

Quarter         Sector   CompanyPair  NbrCommonHolders  Sum_Avg_Holdings  Sum_MIN_Holdings

31MAR2006     2                1                           2                           0,007%                      0,0004%

31MAR2006     2                2                           X                          X,XX%                        X,XX%

31MAR2006     2                3                           X                          X,XX%                        X,XX%

31MAR2006     2                4                           X                          X,XX%                        X,XX%

Step 4): Aggregate the results of every firm-pair for a given quarter. This is the main task, so aggregating the variables NbrCommonHolders, Sum_Avg_Holdings and Sum_MIN_Holdings for every quarter and every sector. A poosible result would be:

"From MAR2003 to MAR2010, the number of common holders for an average pair of firms in the Airline sector has increased from 1,7 to 4,2. Their economical impact of these holdings has also risen. In MAR2010, common holders held stakes equal to 15% for an average pair of firms. In MAR2003, this figure was significantly lower with 4,3%."

Of course, Step 4 is by far the easiest task. In order to measure some differences, like I said before, it would be awesome to have the pairwise results in a new sheet.

Thanks a lot!

Rhodochrosite | Level 12

## Re: Pairwise comparisons in loops

You have a complex problem and we won't solve the whole thing for you.

You made a first step by identifying the different steps involved.

I propose to limit this thread to the identification of common shareholders.

Here is a program that adress this specific subject :

``````data have;
input Shareholder \$ Company \$;
cards;
A AA
B AA
C AA
C BB
B BB
Z BB
A CC
B CC
C CC
Z CC
E CC
B DD
F DD
E EE
F EE
A EE
;
run;

proc sql noprint;
/* We generate all possible pairs as a Cartesian product */
CREATE TABLE pairs AS
SELECT DISTINCT a.Company AS Comp1, b.Company AS Comp2
FROM have a, have b
WHERE a.Company<b.Company /* Avoid to include both (AA,BB) and (BB,AA) */
;
quit;

/* Create a reference dataset of shareholders with numeric id */
/* that can be used later as an array index                   */
proc sort data=have(keep=Shareholder) out=Shareholders nodupkey;
by Shareholder;
run;

data Shareholders;
Id=_N_;
set Shareholders;
call symput("nbsh",strip(_N_));
run;

/* We append the shareholders ids to the have dataset */
proc sql noprint;
CREATE TABLE have AS
SELECT a.*, b.ID
FROM have a
LEFT JOIN Shareholders b
ON b.Shareholder=a.Shareholder;
quit;

/* For each pairs, we identify Sharheloders that have shares in both companies */
data Common;
set pairs;

/* Array index is the Id of the shareholder */
array SH (&nbsh.) SH1-SH&nbsh.;

call missing(of SH(*));

do i=1 to nobs;
set have point=i nobs=nobs;

SH(Id)+sum(Company=Comp1,Company=Comp2);
end;

do i=1 to dim(SH);
if SH(i)=2 then do;
ShareholderId=i;
output;
end;
end;

keep Comp1 Comp2 ShareholderId;
run;

/* We retrieve the shareholer name */
proc sql noprint;
CREATE TABLE Common AS
SELECT a.*, B.Shareholder
FROM Common a
LEFT JOIN Shareholders b
ON b.Id=a.ShareholderId
ORDER BY Comp1, Comp2, Shareholder;
quit;``````

From that you will be able to move to the other aspects of your task and you

can open new discussions i you get stuck.

Obsidian | Level 7

## Re: Pairwise comparisons in loops

Yeah, probably a good idea to focus on one aspect. Actually, the output of your example code looks really good, if I can transfer this to my dataset this would be nice. There are two (hopefully small) problems, which I (as a beginner) am not able to solve.

The pair-matching is only expected to be done for every sector, so maybe one should include a loop right here:

``````proc sql noprint;
/* We generate all possible pairs as a Cartesian product */
CREATE TABLE pairs AS
SELECT DISTINCT a.Company AS Comp1, b.Company AS Comp2
FROM have a, have b
WHERE a.Company<b.Company /* Avoid to include both (AA,BB) and (BB,AA) */
;
quit;``````

which says "start with industry 1, do the Cartesian product and after finishing do it for sector 2, until sector 48" because I have 48 sectors.

And: I still do not get the point where this is done with regard to the quarters. From my point of view, there might be two possibilities:

1) Do the pairwise comparison at the level of every pair directly for every quarter

2) Do the whole thing for quarter 1, then for quarter 2 etc. again with a loop.

I think possibility 1 would be much better in terms of computing power. In both cases, the "Quarter" variable has to be included in the output data, because I have to be able to run a time-series analysis atferwards. Maybe we can just expand the existing example data set by a second quarter and sector in order to illustrate this?

Shareholder      Company       Quarter               SectorClassifier

A                AA             31MAR2006                      2

B                AA             31MAR2006                      2

C                AA             31MAR2006                      2

C                BB             31MAR2006                      2

B                BB             31MAR2006                      2

Z                BB             31MAR2006                      2

A                CC             31MAR2006                     2

B                CC             31MAR2006                     2

C               CC              31MAR2006                     2

Z               CC              31MAR2006                     2

A                DD             31MAR2006                     3

B                DD             31MAR2006                     3

C                DD             31MAR2006                     3

C                EE             31MAR2006                     3

B                EE             31MAR2006                     3

Z                EE             31MAR2006                     3

A                FF             31MAR2006                     3

B                FF             31MAR2006                     3

C               FF             31MAR2006                      3

Z               FF              31MAR200                       3

A                AA             30JUN2006                      2

B                AA             30JUN2006                      2

C                AA             30JUN2006                      2

C                BB             30JUN2006                      2

Y                BB             30JUN2006                      2

Z                BB             30JUN2006                      2

A                CC             30JUN2006                     2

B                CC            30JUN2006                      2

C               CC             30JUN2006                      2

Z                CC             30JUN2006                      2

A                DD             30JUN2006                     3

B                DD             30JUN2006                     3

C                DD             30JUN2006                     3

A                EE             30JUN2006                      3

B                EE             30JUN2006                      3

Y                EE             30JUN2006                      3

A                FF             30JUN2006                      3

Y                FF            30JUN2006                       3

C               FF             30JUN2006                       3

Z               FF             30JUN2006                        3

Of course, thanks a lot so far for every contribution !!

Rhodochrosite | Level 12

## Re: Pairwise comparisons in loops

Generalizing the program to identify the shareholders by Sector/quarter proved a little bit more difficult for me than i supposed but i think the following program should work, though i haven't checked the resulting dataset thoroughly.

``````data have;
format Quarter date9.;
input Shareholder \$ Company \$ Quarter date9. Sector;
cards;
A AA 31MAR2006 2
B AA 31MAR2006 2
C AA 31MAR2006 2
C BB 31MAR2006 2
B BB 31MAR2006 2
Z BB 31MAR2006 2
A CC 31MAR2006 2
B CC 31MAR2006 2
C CC 31MAR2006 2
Z CC 31MAR2006 2
A DD 31MAR2006 3
B DD 31MAR2006 3
C DD 31MAR2006 3
C EE 31MAR2006 3
B EE 31MAR2006 3
Z EE 31MAR2006 3
A FF 31MAR2006 3
B FF 31MAR2006 3
C FF 31MAR2006 3
Z FF 31MAR2006 3
A AA 30JUN2006 2
B AA 30JUN2006 2
C AA 30JUN2006 2
C BB 30JUN2006 2
Y BB 30JUN2006 2
Z BB 30JUN2006 2
A CC 30JUN2006 2
B CC 30JUN2006 2
C CC 30JUN2006 2
Z CC 30JUN2006 2
A DD 30JUN2006 3
B DD 30JUN2006 3
C DD 30JUN2006 3
A EE 30JUN2006 3
B EE 30JUN2006 3
Y EE 30JUN2006 3
A FF 30JUN2006 3
Y FF 30JUN2006 3
C FF 30JUN2006 3
Z FF 30JUN2006 3
;
run;

proc sql noprint;
/* We generate all possible pairs as a Cartesian product */
CREATE TABLE pairs AS
SELECT DISTINCT a.Sector, a.Quarter, a.Company AS Comp1, b.Company AS Comp2
FROM have a, have b
WHERE a.Company<b.Company /* Avoid to include both (AA,BB) and (BB,AA) */
AND b.Quarter=a.Quarter AND b.Sector=a.Sector
;
quit;

/* Create a reference dataset of shareholders with numeric id */
/* that can be used later as an array index                   */
proc sort data=have(keep=Shareholder) out=Shareholders nodupkey;
by Shareholder;
run;

data Shareholders;
Id=_N_;
set Shareholders;
call symput("nbsh",strip(_N_));
run;

/* We append the shareholders ids to the have dataset */
proc sql noprint;
CREATE TABLE have AS
SELECT a.*, b.ID
FROM have a
LEFT JOIN Shareholders b
ON b.Shareholder=a.Shareholder;
quit;

proc sort data=have out=have_S;
by Sector Quarter Shareholder Company;
run;

data have_S;
set have_S;
by Sector Quarter;
fstq=first.Quarter;
lstq=last.Quarter;
run;

/* For each pairs, we identify Sharheloders that have shares in both companies */
data Common;
set pairs;

/* Array index is the Id of the shareholder */
array SH (&nbsh.) SH1-SH&nbsh.;

do i=1 to nobs;
/* We rename Sector and Quarter to avoid overriding columns in pairs dataset */
set have_S (rename=(Sector=Sect Quarter=Quart)) point=i nobs=nobs;

if fstq then call missing(of SH(*));

if Sector=Sect and Quarter=Quart and (Company=Comp1 or Company=Comp2) then SH(Id)+1;

if lstq then do idx=1 to dim(SH);
/* If the shareholder with Id=idx have shares in both companies from the current pair */
/* we save the Id and output the result.                                              */
if SH(idx)=2 then do;
ShareholderId=idx;
output;
end;
end;
end;

keep Quarter Sector Comp1 Comp2 ShareholderId;
run;

/* We retrieve the shareholer name */
proc sql noprint;
CREATE TABLE Common AS
SELECT a.*, B.Shareholder
FROM Common a
LEFT JOIN Shareholders b
ON b.Id=a.ShareholderId
ORDER BY Comp1, Comp2, Shareholder;
quit;
``````
Obsidian | Level 7

## Re: Pairwise comparisons in loops

You are a true hero!

I was able to transfer this to my dataset, works! Probably I'm returning with the next question in a new topic in the course of the analysis, but this step was really crucial, thank you so much!

Discussion stats
• 9 replies
• 1964 views
• 2 likes
• 3 in conversation