the percentage of 'Firm_ID''s 'share_outstanding' held by all 'fundno'...

Alexxxxxxx · Posted 03-01-2022 11:13 PM

Hello all,

I am trying to calculate the percentage of each firm (Firm_ID)’s outstanding shares (share_outstanding) held by funds (‘fundno’ for fund id; and ‘shares’ is the number of firm’s shares held by a fund in a year) in each year (year).

My problem is, although in most cases, each group (i.e., each ‘Firm_ID’ in each ‘year’) only has one ‘share_outstanding’ value, in some cases a group includes more than one ‘share_outstanding’ value.

By following Table1 as an example,

data table1;
infile cards dsd dlm=",";
input
firm_id :$15.
year :8.
fundno :$15.
shares :8.
share_outstanding :8.
;
cards;
36110,2002, 14086, 18000, 32000000
36110,2002, 15056, 92000, 32000000
36110,2002, 15058, 4740, 32000000
36110,2002, 16500, 35300, 27000000

36111,2002, 1903, 26700, 32000000
36111,2002, 1909, 543350, 32000000
36111,2002, 19658, 60000, 30000000
36111,2002, 2021, 20000, 30000000
36111,2002, 2034, 59700, 32000000

36110,2003, 14086, 18000, 32000000
36110,2003, 15056, 92000, 32000000
36110,2003, 15058, 4740, 37000000
36110,2003, 16500, 35300, 27000000

36111,2003, 1903, 26700, 32000000
36111,2003, 1909, 543350, 31000000
36111,2003, 19658, 60000, 33000000
36111,2003, 2021, 20000, 30000000
;;;;
run;

There are four groups (same Firm_iD, and same year) in Table 1. By using the first group as an example,

There are three ‘32000000’ and one ‘27000000’ in this group (i.e., Firm_ID=36110 and year=2002), and then

I assume the outstanding share of Firm_ID ‘36110’ in ‘year’ 2002 is ‘32000000’.
I sum the number of shares held by funds in this group, which is 150040 (=18000+92000+4740+35300)
I divide the number of shares held by funds in this group (150040) by the firm’s outstanding shares (32000000) to get a new variable ‘fund_ratio‘, which equals 0.00468875.

In the end, I expect to get

Firm_ID	year	fund_ratio
36110	2002	0.00468875

Following the same rule, in the second group (i.e., Firm_ID=36111 and year=2002), I have three ‘32000000’ and two ‘30000000’. So, the fund_ratio=0.0221796875, which is (26700+543350+60000+20000+59700)/ 32000000. I expect to get

Firm_ID	year	fund_ratio
36111	2002	0.0221796875

In the third (Firm_ID=36110,year=2003), there are two ‘32000000’ and two ‘37000000’. In this case, I keep them both. I expect to get

Firm_ID	year	fund_ratio
36110	2003	0.00468875
36110	2003	0.00405514

As

0.00468875= (18000+92000+4740+35300)/ 32000000

0.00405514= (18000+92000+4740+35300)/ 37000000

fourth group (Firm_ID=36111, year=2003), I expect to get

Firm_ID	year	fund_ratio
36111	2003	0.020314063
36111	2003	0.020969355
36111	2003	0.019698485
36111	2003	0.021668333

As

0.020314063 = (26700+543350+60000+ 20000)/32000000

0.020969355= (26700+543350+60000+ 20000)/ 31000000

0.019698485= (26700+543350+60000+ 20000)/ 33000000

0.021668333= (26700+543350+60000+ 20000)/ 30000000

I hope it is clear.

Many thanks in advance for any advice.

ballardw · Posted 03-01-2022 11:40 PM

You need to provide a Rule, or rules, for determining what the value of the denominator is for each group. Examples don't quite cover things when we get some other group that has 5 different "shares outstanding" values with 5 records for a given year.

The information you have provided would allow exactly 4 groups to be calculated for two companies and two years. Do you have more years? More companies? If so, we do not have sufficient information to handle those other cases.

@Alexxxxxxx wrote:

Hello all,

I am trying to calculate the percentage of each firm (Firm_ID)’s outstanding shares (share_outstanding) held by funds (‘fundno’ for fund id; and ‘shares’ is the number of firm’s shares held by a fund in a year) in each year (year).

My problem is, although in most cases, each group (i.e., each ‘Firm_ID’ in each ‘year’) only has one ‘share_outstanding’ value, in some cases a group includes more than one ‘share_outstanding’ value.

By following Table1 as an example,
data table1;
infile cards dsd dlm=",";
input
firm_id :$15.
year :8.
fundno :$15.
shares :8.
share_outstanding :8.
;
cards;
36110,2002, 14086, 18000, 32000000
36110,2002, 15056, 92000, 32000000
36110,2002, 15058, 4740, 32000000
36110,2002, 16500, 35300, 27000000

36111,2002, 1903, 26700, 32000000
36111,2002, 1909, 543350, 32000000
36111,2002, 19658, 60000, 30000000
36111,2002, 2021, 20000, 30000000
36111,2002, 2034, 59700, 32000000

36110,2003, 14086, 18000, 32000000
36110,2003, 15056, 92000, 32000000
36110,2003, 15058, 4740, 37000000
36110,2003, 16500, 35300, 27000000

36111,2003, 1903, 26700, 32000000
36111,2003, 1909, 543350, 31000000
36111,2003, 19658, 60000, 33000000
36111,2003, 2021, 20000, 30000000
;;;;
run;
There are four groups (same Firm_iD, and same year) in Table 1. By using the first group as an example,

There are three ‘32000000’ and one ‘27000000’ in this group (i.e., Firm_ID=36110 and year=2002), and then

I assume the outstanding share of Firm_ID ‘36110’ in ‘year’ 2002 is ‘32000000’.

I sum the number of shares held by funds in this group, which is 150040 (=18000+92000+4740+35300)

I divide the number of shares held by funds in this group (150040) by the firm’s outstanding shares (32000000) to get a new variable ‘fund_ratio‘, which equals 0.00468875.

In the end, I expect to get

Firm_ID

year

fund_ratio

36110

2002

0.00468875

Following the same rule, in the second group (i.e., Firm_ID=36111 and year=2002), I have three ‘32000000’ and two ‘30000000’. So, the fund_ratio=0.0221796875, which is (26700+543350+60000+20000+59700)/ 32000000. I expect to get

Firm_ID

year

fund_ratio

36111

2002

0.0221796875

In the third (Firm_ID=36110,year=2003), there are two ‘32000000’ and two ‘37000000’. In this case, I keep them both. I expect to get

Firm_ID

year

fund_ratio

36110

2003

0.00468875

36110

2003

0.00405514

As

0.00468875= (18000+92000+4740+35300)/ 32000000

0.00405514= (18000+92000+4740+35300)/ 37000000

fourth group (Firm_ID=36111, year=2003), I expect to get

Firm_ID

year

fund_ratio

36111

2003

0.020314063

36111

2003

0.020969355

36111

2003

0.019698485

36111

2003

0.021668333

As

0.020314063 = (26700+543350+60000+ 20000)/32000000

0.020969355= (26700+543350+60000+ 20000)/ 31000000

0.019698485= (26700+543350+60000+ 20000)/ 33000000

0.021668333= (26700+543350+60000+ 20000)/ 30000000

I hope it is clear.

Many thanks in advance for any advice.

Alexxxxxxx · Posted 03-02-2022 12:53 AM

Hi @ballardw , thanks for your reply.

it is a large sample that includes a large number of firms during a 15-year period, and the funds that hold their shares. I am cleaning the data to construct the fund_ratio variable for each 'Firm_ID' in each 'year'. actually, in most cases, each group only has one 'share_outstanding' value. but, in some cases, there is more than one 'share_holding' value in a group, I am trying to keep the most frequent 'share_holding' value in each group and delete the rest. If there is more than one most frequent 'share_holding' value in each group, then keep them together.

the rule is: in each group (same 'Firm_ID and same 'year'), remain the most frequent 'share_holding' value, and if there is more than one most frequent 'share_holding' value in each group, then keep them together.

I hope it is more clear. could you please give me some advice about how to get the result I need.

mkeintz · Posted 03-02-2022 12:13 PM

@Alexxxxxxx wrote:

Hi @ballardw , thanks for your reply.

it is a large sample that includes a large number of firms during a 15-year period, and the funds that hold their shares. I am cleaning the data to construct the fund_ratio variable for each 'Firm_ID' in each 'year'. actually, in most cases, each group only has one 'share_outstanding' value. but, in some cases, there is more than one 'share_holding' value in a group, I am trying to keep the most frequent 'share_holding' value in each group and delete the rest. If there is more than one most frequent 'share_holding' value in each group, then keep them together.

the rule is: in each group (same 'Firm_ID and same 'year'), remain the most frequent 'share_holding' value, and if there is more than one most frequent 'share_holding' value in each group, then keep them together.

I hope it is more clear. could you please give me some advice about how to get the result I need.

This rule is still incomplete. It addresses the following cases:

A single shares outstanding (SHROUT) value - output 1 record using that value.
Multiple SHROUTs, but with a single most frequent SHROUT. - output 1 record using that valuje
Multiple SHROUT values, all with the same frequency - output 1 record per unique SHROUT

But it doesn't address the case of multiple SHROUT values, with a tie for most frequent (case 3 above), but also with other less frequent SHROUTs. For instance, say you have 2 SHROUTs at 30m, 2 at 28m, and 1 each at 29m and 31m. What do you want in that case?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

mkeintz · Posted 03-02-2022 03:03 PM

First, you have an apparent data error in the 3rd group (Firm_ID=36110,year=2003). I think the fourth row should have share_outstanding=37000000 (you have 27000000). With that correction, this program should do the following: output 1 record for each share_oustanding value that has or matches the highest frequency:

data table1;
  infile cards dsd dlm=",";
  input  firm_id :$15.  year :8.  fundno :$15.  shares :8.  share_outstanding :8.  ;
cards;
36110,2002, 14086, 18000, 32000000
36110,2002, 15056, 92000, 32000000
36110,2002, 15058, 4740, 32000000
36110,2002, 16500, 35300, 27000000
36111,2002, 1903, 26700, 32000000
36111,2002, 1909, 543350, 32000000
36111,2002, 19658, 60000, 30000000
36111,2002, 2021, 20000, 30000000
36111,2002, 2034, 59700, 32000000
36110,2003, 14086, 18000, 32000000
36110,2003, 15056, 92000, 32000000
36110,2003, 15058, 4740, 37000000
36110,2003, 16500, 35300, 37000000
36111,2003, 1903, 26700, 32000000
36111,2003, 1909, 543350, 31000000
36111,2003, 19658, 60000, 33000000
36111,2003, 2021, 20000, 30000000
run;


data want (keep=firm_id year total_shares_held share_outstanding ratio);
  array shr{20};
  array frq{20};
  do until (last.year);
    set table1;
    by firm_id year notsorted;
    total_shares_held=sum(total_shares_held,shares);
    s=whichn(share_outstanding,of shr{*});
    if s=0 then do; 
      n_freqs=sum(n_freqs,1);
      shr{n_freqs}=share_outstanding; 
      s=n_freqs;
    end;
    frq{s}=sum(1,frq{s});
  end;
  max_frq=max(of frq{*});  /*Maximum frequency */

  do until (max(of frq{*})^=max_frq);
    s=whichn(max_frq,of frq{*});    /* The Sth frequency equals the MAX_FRQ */
    share_outstanding=shr{s};       /* Get the corresponding denominator */
    ratio=total_shares_held/share_outstanding;
    output;
    frq{s}=.;                      /* Blank out the Sth frq, so that ties can be found */
    if n(of frq{*})=0 then leave;
  end;
run;

This allows for up to 20 records (actually 20 unique share_outstanding values) for each FIRM_ID/YEAR. If you expect more, then increase the size of the SHR and FRQ arrays.

How it works.
For each FIRM_ID/YEAR:

Read a record
See if the share_outstanding is new (in which case whichn(share_outstanding,of shr{*}) returns a zero, otherwise it returns the position in the array SHR in whicih share_outstanding already appears).
1. If new add the new value to the SHR array, and increment N_FREQS by 1. And set the position index s to N_FREQS
Add 1 to frq{s}, keeping track of frequencies of the corresponding shr{s}.
After all the records have been read, establish max_frq
For each instance in which frg{s}=max_frq:
1. Find the leftmost s, via whichn(max_frq,of frq{*})
2. get the corresponding shr{s}, calculate the ratio, and output.
3. Set frqs{s} to missing, so that the next leftmost qualifying frq can be located.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

the percentage of 'Firm_ID''s 'share_outstanding' held by all 'fundno' in a 'year'

Re: the percentage of 'Firm_ID''s 'share_outstanding' held by all 'fundno' in a 'year'

Re: the percentage of 'Firm_ID''s 'share_outstanding' held by all 'fundno' in a 'year'

Re: the percentage of 'Firm_ID''s 'share_outstanding' held by all 'fundno' in a 'year'

Re: the percentage of 'Firm_ID''s 'share_outstanding' held by all 'fundno' in a 'year'

Catch up on SAS Innovate 2026

the percentage of 'Firm_ID''s 'share_outstanding' held by all 'fundno' in a 'year'

Re: the percentage of 'Firm_ID''s 'share_outstanding' held by all 'fundno' in a 'year'

Re: the percentage of 'Firm_ID''s 'share_outstanding' held by all 'fundno' in a 'year'

Re: the percentage of 'Firm_ID''s 'share_outstanding' held by all 'fundno' in a 'year'

Re: the percentage of 'Firm_ID''s 'share_outstanding' held by all 'fundno' in a 'year'

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away