Solved: Pairwise comparisons in loops

svw1900 · Posted 09-18-2018 06:45 AM

Hi everyone,

first of all, thanks a lot for helping me out so far with my questions, awesome to have such a community!

(SAS University Edition)

Coming to my next problem:

My dataset looks like this:

Shareholder Company Quarter SharesHold SharesOutstanding SectorClassifier

A AA 31MAR2006 15000 1.000.000.000 2

B AA 31MAR2006 1000 1.000.000.000 2

C AA 31MAR2006 35000 1.000.000.000 2

C BB 31MAR2006 1000 250.000.000 2

B BB 31MAR2006 25000 250.000.000 2

Z BB 31MAR2006 12000 250.000.000 2

Of course, there are more holders, more companies, more sectors and more quarters actually. What I need, is the average number of common holders (in this case there would be 2 common holders, "B" and "C"), the amount of shares they hold together (in this case 1000+35000 in AA and 1000+25000 in BB, thus 62000 overall), the amount of shares outstanding overall (in this case 1.000.000.000+250.000.000=1.250.000.000) and the percentage (62.000/1.250.000.000). Since there are more firms per sector and I need pairwise comparisons for every pair, I think that has to be done via a loop, starting with AA and BB then exchanging BB for CC and so on until every company has been compared to AA. Afterwards, the same loop would have to run again starting with BB and CC, then excahnging CC for DD and so on.. until every pair has been compared.

To make it more complicated, like said above, the comparison has to be done every quarter, so I need pairwise comparisons between two firms for every point in time. Furthermore, I want to compare only firms within a sector, so the whole thing should run individually for sector 1, then sector 2 and so on.

Optimally, the output would look like this:

Quarter Sector CompanyPair NbrCommonHolders ShrsCommHolders ShrsOutst PercCommHold

31MAR2006 2 1 2 62000 1250000000 0,005%

31MAR2006 2 2 X XXXXX YYYYYYYY X,XX%

31MAR2006 2 3 X XXXXX YYYYYYYY X,XX%

31MAR2006 2 4 X XXXXX YYYYYYYY X,XX%

I've got no idea how to solve this and would be really thankful if anyone can help me out!!!

Best regards!

gamotte · Posted 09-21-2018 09:31 AM

Generalizing the program to identify the shareholders by Sector/quarter proved a little bit more difficult for me than i supposed but i think the following program should work, though i haven't checked the resulting dataset thoroughly.

data have;
format Quarter date9.;
input Shareholder $ Company $ Quarter date9. Sector;
cards; 
A AA 31MAR2006 2
B AA 31MAR2006 2
C AA 31MAR2006 2
C BB 31MAR2006 2
B BB 31MAR2006 2
Z BB 31MAR2006 2
A CC 31MAR2006 2
B CC 31MAR2006 2
C CC 31MAR2006 2
Z CC 31MAR2006 2
A DD 31MAR2006 3
B DD 31MAR2006 3
C DD 31MAR2006 3
C EE 31MAR2006 3
B EE 31MAR2006 3
Z EE 31MAR2006 3
A FF 31MAR2006 3
B FF 31MAR2006 3
C FF 31MAR2006 3
Z FF 31MAR2006 3
A AA 30JUN2006 2
B AA 30JUN2006 2
C AA 30JUN2006 2
C BB 30JUN2006 2
Y BB 30JUN2006 2
Z BB 30JUN2006 2
A CC 30JUN2006 2
B CC 30JUN2006 2
C CC 30JUN2006 2
Z CC 30JUN2006 2
A DD 30JUN2006 3
B DD 30JUN2006 3
C DD 30JUN2006 3
A EE 30JUN2006 3
B EE 30JUN2006 3
Y EE 30JUN2006 3
A FF 30JUN2006 3
Y FF 30JUN2006 3
C FF 30JUN2006 3
Z FF 30JUN2006 3
;
run;

proc sql noprint;
    /* We generate all possible pairs as a Cartesian product */
    CREATE TABLE pairs AS
    SELECT DISTINCT a.Sector, a.Quarter, a.Company AS Comp1, b.Company AS Comp2
    FROM have a, have b
    WHERE a.Company<b.Company /* Avoid to include both (AA,BB) and (BB,AA) */
    AND b.Quarter=a.Quarter AND b.Sector=a.Sector
    ;
quit;

/* Create a reference dataset of shareholders with numeric id */
/* that can be used later as an array index                   */
proc sort data=have(keep=Shareholder) out=Shareholders nodupkey;
    by Shareholder;
run;

data Shareholders;
    Id=_N_;
    set Shareholders;
    call symput("nbsh",strip(_N_));
run;

/* We append the shareholders ids to the have dataset */
proc sql noprint;
    CREATE TABLE have AS
    SELECT a.*, b.ID
    FROM have a
    LEFT JOIN Shareholders b
    ON b.Shareholder=a.Shareholder;
quit;


proc sort data=have out=have_S;
    by Sector Quarter Shareholder Company;
run;

data have_S;
    set have_S;
    by Sector Quarter;
    fstq=first.Quarter;
    lstq=last.Quarter;
run;

/* For each pairs, we identify Sharheloders that have shares in both companies */
data Common;
    set pairs;

    /* Array index is the Id of the shareholder */
    array SH (&nbsh.) SH1-SH&nbsh.;

    do i=1 to nobs;
        /* We rename Sector and Quarter to avoid overriding columns in pairs dataset */
        set have_S (rename=(Sector=Sect Quarter=Quart)) point=i nobs=nobs;

        if fstq then call missing(of SH(*));

        if Sector=Sect and Quarter=Quart and (Company=Comp1 or Company=Comp2) then SH(Id)+1;

        if lstq then do idx=1 to dim(SH);
            /* If the shareholder with Id=idx have shares in both companies from the current pair */
            /* we save the Id and output the result.                                              */
            if SH(idx)=2 then do;
                ShareholderId=idx;
                output;
            end;
        end;
    end;

    keep Quarter Sector Comp1 Comp2 ShareholderId;
run;

/* We retrieve the shareholer name */
proc sql noprint;
    CREATE TABLE Common AS
    SELECT a.*, B.Shareholder
    FROM Common a
    LEFT JOIN Shareholders b
    ON b.Id=a.ShareholderId
    ORDER BY Comp1, Comp2, Shareholder;
quit;

View solution in original post

gamotte · Posted 09-18-2018 09:49 AM

Hello,

Here is a start to help you identifying common holders.

data have;
    input Shareholder $ Company  $ sector;
    cards;
A AA 2
B AA 2
C AA 2
C BB 2
B BB 2
Z BB 2
;
run;

proc sort data=have;
    by sector shareholder company;
run;

proc transpose data=have out=thave;
    by sector shareholder;
    var company;
run;

proc sort data=thave (keep=Shareholder COL:) nouniquekey out=common;
    by col:;
run;

proc sql noprint;
    SELECT count(*) INTO :ncols TRIMMED
    FROM dictionary.columns
    WHERE LIBNAME="WORK" AND MEMNAME="COMMON" AND upcase(NAME) LIKE "COL%";
quit;

data common;
    set common;
    by COL&ncols.;

    if first.COL&ncols. then group+1;

    keep Shareholder group;
run;

svw1900 · Posted 09-18-2018 06:21 PM

Hi, thanks for the first reply!

As far as I understand your code, it does not take the quarterly aspect into account, right? The shareholders of every company might be different between every point in time, so that the number of common holders is not the same for every quarter.

Just to add: I am really new to SAS, so a very quick explanation of the individual steps would be awesome!

Kind regards

gamotte · Posted 09-19-2018 03:55 AM

As @mkeintz said, taking quarterly aspect into account shouldn't be much of a problem once

the main logic is in place. Reading his answer makes me think i completely missed the point.

Could you explain your problem in more details (maybe with a more complete example with

more shareholder and companies) ?

mkeintz · Posted 09-18-2018 08:58 PM

Doing this within sector/quarter is not the challenge. My question is to nail down the term "average number of common holders".

I presume you mean, for say 20 companies in a given qtr/sector, you want 20 averages. I.e. for company 1, you want to add up the number of common shareholders with company 2, 3, ... 20, and divide by 19, is that correct? And you also want to add up the number of shares of company 1 (and each of the other 19?) held by those shareholders. Is that also correct? Those two new variables would be added to the company 1 record. Then the same for each other company. Do I have it right?

A research question: why would you add shares of the other companies to the total common shares held by common shareholders. Those other companies have different share value from the company-in-hand, and different share values from each other.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

svw1900 · Posted 09-19-2018 11:36 AM

I'm interested in the aggregated figures for the whole sector for every quarter at the end, so I don't actually need any company-specific average of every pair with other companies. To aggregate the average number of common holders per qtr/sector, it would be enough to have the individual figures for every pair. This way I could say that the average pair of firms within a given sector in quarter Z has X common holders, which hold Y percent overall and on average in this pair of companies.

That means that your were right in the point that it would not be economically correct to add up the shares of both companies. I think the right methodology here is to identify common holders for a pair of firms and calculate an average of their holdings. So in a first step, I would have to calculate the percentage of shares hold for every investor/company/quarter in order to calculate the sum (e.g. 5% in AA, 10% in BB divided by 2 = 7,5% on average in this pair for this investor for this quarter). Let me try to give you a more complex example.

Shareholder Company Quarter SharesHold SharesOutstanding SectorClassifier

A AA 31MAR2006 15000 1.000.000.000 2

B AA 31MAR2006 1000 1.000.000.000 2

C AA 31MAR2006 35000 1.000.000.000 2

C BB 31MAR2006 1000 250.000.000 2

B BB 31MAR2006 25000 250.000.000 2

Z BB 31MAR2006 12000 250.000.000 2

A CC 31MAR2006 25000 500.000.000 2

B CC 31MAR2006 35000 500.000.000 2

C CC 31MAR2006 45000 500.000.000 2

Z CC 31MAR2006 5000 500.000.000 2

E CC 31MAR2006 15000 500.000.000 2

B DD 31MAR2006 25000 300.000.000 2

F DD 31MAR2006 35000 300.000.000 2

E EE 31MAR2006 300000 1.000.000.000 2

F EE 31MAR2006 350000 1.000.000.000 2

A EE 31MAR2006 25000 1.000.000.000 2

Step 1) Calculate the percentage of shares owned by every investor for every company. (e.g. A holds 15000/1.000.000.000=0,0015% in AA)

Step 2) Start the pairwise comparison of company AA and BB for the first quarter. They have 2 common shareholders, "B" and "C". This is the first value I need.

Step 3) The second value refers to the overlap of the common holdings in order to show the economical impact. E.g. B holds 0,0001% in AA and 0,01% in BB, that means he holds (0,0001%+0,01%)/2= 0,00505% on average in both firms. The second common holder, C, has an average of (0,035%+0,0004%)/2=0,00195% in both companies. As a result, the aggregated amount of ownership stakes held by common owners, for this pair would be 0,00505%+0,00195%= 0,007%. It is, so to say, the sum of all the common holder's average stakes in both companies.

Another way to aggregate this on the firm-pair level would be to use a MIN function instead of a sum function. That means, you calculate the stakes of each common shareholder for the companies, and sum up the smaller value for each shareholder instead of the average. In this case, this would mean taking the sum of min(0,0001%;0,01%)=0,0001% for A and min(0,035%;0,0004%)=0,0004% for B, so 0,0005%. These two measures are proposed by Gilje, Gormley and Levit in "The rise of common ownership", if anyone is interested.

Of course it would be optimal if I had both possilities at the end to compare the results.

After having compared firms AA and BB, the next comparison would be AA and CC, starting with Step 2 again. After comparing AA and CC, the next pair would be AA and DD.. until every company has been compared to AA. Then starting with BB and CC. Like I said, I do not need any average for company AA at the end. I just need the values of Step 2 and Step 3 for every pair. This way, I can, for a given quarter calculate the next step.

Let's say we have 20 companies in the sector, this would mean I would get n*(n-1)/2=20*19/2=190 distinct pair values. Probably it would be helpful to create a new sheet which gives me the pair values in a format like this:

Quarter Sector CompanyPair NbrCommonHolders Sum_Avg_Holdings Sum_MIN_Holdings

31MAR2006 2 1 2 0,007% 0,0004%

31MAR2006 2 2 X X,XX% X,XX%

31MAR2006 2 3 X X,XX% X,XX%

31MAR2006 2 4 X X,XX% X,XX%

Step 4): Aggregate the results of every firm-pair for a given quarter. This is the main task, so aggregating the variables NbrCommonHolders, Sum_Avg_Holdings and Sum_MIN_Holdings for every quarter and every sector. A poosible result would be:

"From MAR2003 to MAR2010, the number of common holders for an average pair of firms in the Airline sector has increased from 1,7 to 4,2. Their economical impact of these holdings has also risen. In MAR2010, common holders held stakes equal to 15% for an average pair of firms. In MAR2003, this figure was significantly lower with 4,3%."

Of course, Step 4 is by far the easiest task. In order to measure some differences, like I said before, it would be awesome to have the pairwise results in a new sheet.

Thanks a lot!

gamotte · Posted 09-20-2018 07:47 AM

You have a complex problem and we won't solve the whole thing for you.

You made a first step by identifying the different steps involved.

I propose to limit this thread to the identification of common shareholders.

Here is a program that adress this specific subject :

data have;
    input Shareholder $ Company $;
    cards;
A AA
B AA
C AA
C BB
B BB
Z BB
A CC
B CC
C CC 
Z CC 
E CC 
B DD 
F DD 
E EE 
F EE 
A EE 
;
run;


proc sql noprint;
    /* We generate all possible pairs as a Cartesian product */
    CREATE TABLE pairs AS
    SELECT DISTINCT a.Company AS Comp1, b.Company AS Comp2
    FROM have a, have b
    WHERE a.Company<b.Company /* Avoid to include both (AA,BB) and (BB,AA) */
    ;
quit;

/* Create a reference dataset of shareholders with numeric id */
/* that can be used later as an array index                   */
proc sort data=have(keep=Shareholder) out=Shareholders nodupkey;
    by Shareholder;
run;

data Shareholders;
    Id=_N_;
    set Shareholders;
    call symput("nbsh",strip(_N_));
run;

/* We append the shareholders ids to the have dataset */
proc sql noprint;
    CREATE TABLE have AS
    SELECT a.*, b.ID
    FROM have a
    LEFT JOIN Shareholders b
    ON b.Shareholder=a.Shareholder;
quit;

/* For each pairs, we identify Sharheloders that have shares in both companies */
data Common;
    set pairs;

    /* Array index is the Id of the shareholder */
    array SH (&nbsh.) SH1-SH&nbsh.;

    call missing(of SH(*));

    do i=1 to nobs;
        set have point=i nobs=nobs;

        SH(Id)+sum(Company=Comp1,Company=Comp2);
    end;

    do i=1 to dim(SH);
        if SH(i)=2 then do;
            ShareholderId=i;
            output;
        end;
    end;

    keep Comp1 Comp2 ShareholderId;
run;

/* We retrieve the shareholer name */
proc sql noprint;
    CREATE TABLE Common AS
    SELECT a.*, B.Shareholder
    FROM Common a
    LEFT JOIN Shareholders b
    ON b.Id=a.ShareholderId
    ORDER BY Comp1, Comp2, Shareholder;
quit;

From that you will be able to move to the other aspects of your task and you

can open new discussions i you get stuck.

svw1900 · Posted 09-20-2018 01:10 PM

Yeah, probably a good idea to focus on one aspect. Actually, the output of your example code looks really good, if I can transfer this to my dataset this would be nice. There are two (hopefully small) problems, which I (as a beginner) am not able to solve.

The pair-matching is only expected to be done for every sector, so maybe one should include a loop right here:

proc sql noprint;
    /* We generate all possible pairs as a Cartesian product */
    CREATE TABLE pairs AS
    SELECT DISTINCT a.Company AS Comp1, b.Company AS Comp2
    FROM have a, have b
    WHERE a.Company<b.Company /* Avoid to include both (AA,BB) and (BB,AA) */
    ;
quit;

which says "start with industry 1, do the Cartesian product and after finishing do it for sector 2, until sector 48" because I have 48 sectors.

And: I still do not get the point where this is done with regard to the quarters. From my point of view, there might be two possibilities:

1) Do the pairwise comparison at the level of every pair directly for every quarter

2) Do the whole thing for quarter 1, then for quarter 2 etc. again with a loop.

I think possibility 1 would be much better in terms of computing power. In both cases, the "Quarter" variable has to be included in the output data, because I have to be able to run a time-series analysis atferwards. Maybe we can just expand the existing example data set by a second quarter and sector in order to illustrate this?

Shareholder Company Quarter SectorClassifier

A AA 31MAR2006 2

B AA 31MAR2006 2

C AA 31MAR2006 2

C BB 31MAR2006 2

B BB 31MAR2006 2

Z BB 31MAR2006 2

A CC 31MAR2006 2

B CC 31MAR2006 2

C CC 31MAR2006 2

Z CC 31MAR2006 2

A DD 31MAR2006 3

B DD 31MAR2006 3

C DD 31MAR2006 3

C EE 31MAR2006 3

B EE 31MAR2006 3

Z EE 31MAR2006 3

A FF 31MAR2006 3

B FF 31MAR2006 3

C FF 31MAR2006 3

Z FF 31MAR200 3

A AA 30JUN2006 2

B AA 30JUN2006 2

C AA 30JUN2006 2

C BB 30JUN2006 2

Y BB 30JUN2006 2

Z BB 30JUN2006 2

A CC 30JUN2006 2

B CC 30JUN2006 2

C CC 30JUN2006 2

Z CC 30JUN2006 2

A DD 30JUN2006 3

B DD 30JUN2006 3

C DD 30JUN2006 3

A EE 30JUN2006 3

B EE 30JUN2006 3

Y EE 30JUN2006 3

A FF 30JUN2006 3

Y FF 30JUN2006 3

C FF 30JUN2006 3

Z FF 30JUN2006 3

Of course, thanks a lot so far for every contribution !!

gamotte · Posted 09-21-2018 09:31 AM

Generalizing the program to identify the shareholders by Sector/quarter proved a little bit more difficult for me than i supposed but i think the following program should work, though i haven't checked the resulting dataset thoroughly.

data have;
format Quarter date9.;
input Shareholder $ Company $ Quarter date9. Sector;
cards; 
A AA 31MAR2006 2
B AA 31MAR2006 2
C AA 31MAR2006 2
C BB 31MAR2006 2
B BB 31MAR2006 2
Z BB 31MAR2006 2
A CC 31MAR2006 2
B CC 31MAR2006 2
C CC 31MAR2006 2
Z CC 31MAR2006 2
A DD 31MAR2006 3
B DD 31MAR2006 3
C DD 31MAR2006 3
C EE 31MAR2006 3
B EE 31MAR2006 3
Z EE 31MAR2006 3
A FF 31MAR2006 3
B FF 31MAR2006 3
C FF 31MAR2006 3
Z FF 31MAR2006 3
A AA 30JUN2006 2
B AA 30JUN2006 2
C AA 30JUN2006 2
C BB 30JUN2006 2
Y BB 30JUN2006 2
Z BB 30JUN2006 2
A CC 30JUN2006 2
B CC 30JUN2006 2
C CC 30JUN2006 2
Z CC 30JUN2006 2
A DD 30JUN2006 3
B DD 30JUN2006 3
C DD 30JUN2006 3
A EE 30JUN2006 3
B EE 30JUN2006 3
Y EE 30JUN2006 3
A FF 30JUN2006 3
Y FF 30JUN2006 3
C FF 30JUN2006 3
Z FF 30JUN2006 3
;
run;

proc sql noprint;
    /* We generate all possible pairs as a Cartesian product */
    CREATE TABLE pairs AS
    SELECT DISTINCT a.Sector, a.Quarter, a.Company AS Comp1, b.Company AS Comp2
    FROM have a, have b
    WHERE a.Company<b.Company /* Avoid to include both (AA,BB) and (BB,AA) */
    AND b.Quarter=a.Quarter AND b.Sector=a.Sector
    ;
quit;

/* Create a reference dataset of shareholders with numeric id */
/* that can be used later as an array index                   */
proc sort data=have(keep=Shareholder) out=Shareholders nodupkey;
    by Shareholder;
run;

data Shareholders;
    Id=_N_;
    set Shareholders;
    call symput("nbsh",strip(_N_));
run;

/* We append the shareholders ids to the have dataset */
proc sql noprint;
    CREATE TABLE have AS
    SELECT a.*, b.ID
    FROM have a
    LEFT JOIN Shareholders b
    ON b.Shareholder=a.Shareholder;
quit;


proc sort data=have out=have_S;
    by Sector Quarter Shareholder Company;
run;

data have_S;
    set have_S;
    by Sector Quarter;
    fstq=first.Quarter;
    lstq=last.Quarter;
run;

/* For each pairs, we identify Sharheloders that have shares in both companies */
data Common;
    set pairs;

    /* Array index is the Id of the shareholder */
    array SH (&nbsh.) SH1-SH&nbsh.;

    do i=1 to nobs;
        /* We rename Sector and Quarter to avoid overriding columns in pairs dataset */
        set have_S (rename=(Sector=Sect Quarter=Quart)) point=i nobs=nobs;

        if fstq then call missing(of SH(*));

        if Sector=Sect and Quarter=Quart and (Company=Comp1 or Company=Comp2) then SH(Id)+1;

        if lstq then do idx=1 to dim(SH);
            /* If the shareholder with Id=idx have shares in both companies from the current pair */
            /* we save the Id and output the result.                                              */
            if SH(idx)=2 then do;
                ShareholderId=idx;
                output;
            end;
        end;
    end;

    keep Quarter Sector Comp1 Comp2 ShareholderId;
run;

/* We retrieve the shareholer name */
proc sql noprint;
    CREATE TABLE Common AS
    SELECT a.*, B.Shareholder
    FROM Common a
    LEFT JOIN Shareholders b
    ON b.Id=a.ShareholderId
    ORDER BY Comp1, Comp2, Shareholder;
quit;

svw1900 · Posted 09-21-2018 09:49 PM

You are a true hero!

I was able to transfer this to my dataset, works! Probably I'm returning with the next question in a new topic in the course of the analysis, but this step was really crucial, thank you so much!

Pairwise comparisons in loops

Re: Pairwise comparisons in loops

Re: Pairwise comparisons in loops

Re: Pairwise comparisons in loops

Re: Pairwise comparisons in loops

Re: Pairwise comparisons in loops

Re: Pairwise comparisons in loops

Re: Pairwise comparisons in loops

Re: Pairwise comparisons in loops

Re: Pairwise comparisons in loops

Re: Pairwise comparisons in loops

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away