BookmarkSubscribeRSS Feed
Alexxxxxxx
Pyrite | Level 9

Hello all,

 

I am trying to calculate the percentage of each firm (Firm_ID)’s outstanding shares (share_outstanding) held by funds (‘fundno’ for fund id; and ‘shares’ is the number of firm’s shares held by a fund in a year) in each year (year).

 

My problem is, although in most cases, each group (i.e., each ‘Firm_ID’ in each ‘year’) only has one ‘share_outstanding’ value, in some cases a group includes more than one ‘share_outstanding’ value.

 

By following Table1 as an example,

data table1;
infile cards dsd dlm=",";
input
firm_id :$15.
year :8.
fundno :$15.
shares :8.
share_outstanding :8.
;
cards;
36110,2002, 14086, 18000, 32000000
36110,2002, 15056, 92000, 32000000
36110,2002, 15058, 4740, 32000000
36110,2002, 16500, 35300, 27000000

36111,2002, 1903, 26700, 32000000
36111,2002, 1909, 543350, 32000000
36111,2002, 19658, 60000, 30000000
36111,2002, 2021, 20000, 30000000
36111,2002, 2034, 59700, 32000000

36110,2003, 14086, 18000, 32000000
36110,2003, 15056, 92000, 32000000
36110,2003, 15058, 4740, 37000000
36110,2003, 16500, 35300, 27000000

36111,2003, 1903, 26700, 32000000
36111,2003, 1909, 543350, 31000000
36111,2003, 19658, 60000, 33000000
36111,2003, 2021, 20000, 30000000
;;;;
run;

There are four groups (same Firm_iD, and same year) in Table 1. By using the first group as an example,

There are three ‘32000000’ and one ‘27000000’ in this group (i.e., Firm_ID=36110 and year=2002), and then

  1. I assume the outstanding share of Firm_ID ‘36110’ in ‘year’ 2002 is ‘32000000’.
  2. I sum the number of shares held by funds in this group, which is 150040 (=18000+92000+4740+35300)
  3. I divide the number of shares held by funds in this group (150040) by the firm’s outstanding shares (32000000) to get a new variable ‘fund_ratio‘, which equals 0.00468875.

In the end, I expect to get 

Firm_ID

year

fund_ratio

36110

2002

0.00468875

 

Following the same rule, in the second group (i.e., Firm_ID=36111 and year=2002), I have three ‘32000000’ and two ‘30000000’. So, the fund_ratio=0.0221796875, which is (26700+543350+60000+20000+59700)/ 32000000. I expect to get 

Firm_ID

year

fund_ratio

36111

2002

0.0221796875

 

In the third (Firm_ID=36110,year=2003), there are two ‘32000000’ and two ‘37000000’. In this case, I keep them both. I expect to get 

Firm_ID

year

fund_ratio

36110

2003

0.00468875

36110

2003

0.00405514

As

0.00468875= (18000+92000+4740+35300)/ 32000000

0.00405514= (18000+92000+4740+35300)/ 37000000

 

fourth group (Firm_ID=36111, year=2003), I expect to get 

Firm_ID

year

fund_ratio

36111

2003

0.020314063

36111

2003

0.020969355

36111

2003

0.019698485

36111

2003

0.021668333

As

0.020314063 = (26700+543350+60000+ 20000)/32000000

0.020969355= (26700+543350+60000+ 20000)/ 31000000

0.019698485= (26700+543350+60000+ 20000)/ 33000000

0.021668333= (26700+543350+60000+ 20000)/ 30000000

 

I hope it is clear.

Many thanks in advance for any advice.

 

4 REPLIES 4
ballardw
Super User

You need to provide a Rule, or rules, for determining what the value of the denominator is for each group. Examples don't quite cover things when we get some other group that has 5 different "shares outstanding" values with 5 records for a given year.

 

The information you have provided would allow exactly 4 groups to be calculated for two companies and two years. Do you have more years? More companies? If so, we do not have sufficient information to handle those other cases.

 


@Alexxxxxxx wrote:

Hello all,

 

I am trying to calculate the percentage of each firm (Firm_ID)’s outstanding shares (share_outstanding) held by funds (‘fundno’ for fund id; and ‘shares’ is the number of firm’s shares held by a fund in a year) in each year (year).

 

My problem is, although in most cases, each group (i.e., each ‘Firm_ID’ in each ‘year’) only has one ‘share_outstanding’ value, in some cases a group includes more than one ‘share_outstanding’ value.

 

By following Table1 as an example,

data table1;
infile cards dsd dlm=",";
input
firm_id :$15.
year :8.
fundno :$15.
shares :8.
share_outstanding :8.
;
cards;
36110,2002, 14086, 18000, 32000000
36110,2002, 15056, 92000, 32000000
36110,2002, 15058, 4740, 32000000
36110,2002, 16500, 35300, 27000000

36111,2002, 1903, 26700, 32000000
36111,2002, 1909, 543350, 32000000
36111,2002, 19658, 60000, 30000000
36111,2002, 2021, 20000, 30000000
36111,2002, 2034, 59700, 32000000

36110,2003, 14086, 18000, 32000000
36110,2003, 15056, 92000, 32000000
36110,2003, 15058, 4740, 37000000
36110,2003, 16500, 35300, 27000000

36111,2003, 1903, 26700, 32000000
36111,2003, 1909, 543350, 31000000
36111,2003, 19658, 60000, 33000000
36111,2003, 2021, 20000, 30000000
;;;;
run;

There are four groups (same Firm_iD, and same year) in Table 1. By using the first group as an example,

There are three ‘32000000’ and one ‘27000000’ in this group (i.e., Firm_ID=36110 and year=2002), and then

  1. I assume the outstanding share of Firm_ID ‘36110’ in ‘year’ 2002 is ‘32000000’.
  2. I sum the number of shares held by funds in this group, which is 150040 (=18000+92000+4740+35300)
  3. I divide the number of shares held by funds in this group (150040) by the firm’s outstanding shares (32000000) to get a new variable ‘fund_ratio‘, which equals 0.00468875.

In the end, I expect to get 

Firm_ID

year

fund_ratio

36110

2002

0.00468875

 

Following the same rule, in the second group (i.e., Firm_ID=36111 and year=2002), I have three ‘32000000’ and two ‘30000000’. So, the fund_ratio=0.0221796875, which is (26700+543350+60000+20000+59700)/ 32000000. I expect to get 

Firm_ID

year

fund_ratio

36111

2002

0.0221796875

 

In the third (Firm_ID=36110,year=2003), there are two ‘32000000’ and two ‘37000000’. In this case, I keep them both. I expect to get 

Firm_ID

year

fund_ratio

36110

2003

0.00468875

36110

2003

0.00405514

As

0.00468875= (18000+92000+4740+35300)/ 32000000

0.00405514= (18000+92000+4740+35300)/ 37000000

 

fourth group (Firm_ID=36111, year=2003), I expect to get 

Firm_ID

year

fund_ratio

36111

2003

0.020314063

36111

2003

0.020969355

36111

2003

0.019698485

36111

2003

0.021668333

As

0.020314063 = (26700+543350+60000+ 20000)/32000000

0.020969355= (26700+543350+60000+ 20000)/ 31000000

0.019698485= (26700+543350+60000+ 20000)/ 33000000

0.021668333= (26700+543350+60000+ 20000)/ 30000000

 

I hope it is clear.

Many thanks in advance for any advice.

 


 

 

Alexxxxxxx
Pyrite | Level 9

Hi @ballardw , thanks for your reply.

 

it is a large sample that includes a large number of firms during a 15-year period, and the funds that hold their shares. I am cleaning the data to construct the fund_ratio variable for each 'Firm_ID' in each 'year'. actually, in most cases, each group only has one 'share_outstanding' value. but, in some cases, there is more than one 'share_holding' value in a group, I am trying to keep the most frequent 'share_holding' value in each group and delete the rest. If there is more than one most frequent 'share_holding' value in each group, then keep them together.  

 

the rule is: in each group (same 'Firm_ID and same 'year'), remain the most frequent 'share_holding' value, and if there is more than one most frequent 'share_holding' value in each group, then keep them together.   

 

I hope it is more clear. could you please give me some advice about how to get the result I need.

mkeintz
PROC Star

@Alexxxxxxx wrote:

Hi @ballardw , thanks for your reply.

 

it is a large sample that includes a large number of firms during a 15-year period, and the funds that hold their shares. I am cleaning the data to construct the fund_ratio variable for each 'Firm_ID' in each 'year'. actually, in most cases, each group only has one 'share_outstanding' value. but, in some cases, there is more than one 'share_holding' value in a group, I am trying to keep the most frequent 'share_holding' value in each group and delete the rest. If there is more than one most frequent 'share_holding' value in each group, then keep them together.  

 

the rule is: in each group (same 'Firm_ID and same 'year'), remain the most frequent 'share_holding' value, and if there is more than one most frequent 'share_holding' value in each group, then keep them together.   

 

I hope it is more clear. could you please give me some advice about how to get the result I need.


This rule is still incomplete.  It addresses the following cases:

  1. A single shares outstanding (SHROUT) value - output 1 record using that value.
  2. Multiple SHROUTs, but with a single most frequent SHROUT. - output 1 record using that valuje
  3. Multiple SHROUT values, all with the same frequency - output 1 record per unique SHROUT

But it doesn't address the case of multiple SHROUT values, with a tie for most frequent (case 3 above), but also with other less frequent SHROUTs.  For instance, say you have 2 SHROUTs at 30m, 2 at 28m, and 1 each at 29m and 31m.  What do you want in that case?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
mkeintz
PROC Star

First, you have an apparent data error in the 3rd group (Firm_ID=36110,year=2003).  I think the fourth row should have share_outstanding=37000000  (you have 27000000).  With that correction, this program should do the following:  output 1 record for each share_oustanding value that has or matches the highest frequency:

 

data table1;
  infile cards dsd dlm=",";
  input  firm_id :$15.  year :8.  fundno :$15.  shares :8.  share_outstanding :8.  ;
cards;
36110,2002, 14086, 18000, 32000000
36110,2002, 15056, 92000, 32000000
36110,2002, 15058, 4740, 32000000
36110,2002, 16500, 35300, 27000000
36111,2002, 1903, 26700, 32000000
36111,2002, 1909, 543350, 32000000
36111,2002, 19658, 60000, 30000000
36111,2002, 2021, 20000, 30000000
36111,2002, 2034, 59700, 32000000
36110,2003, 14086, 18000, 32000000
36110,2003, 15056, 92000, 32000000
36110,2003, 15058, 4740, 37000000
36110,2003, 16500, 35300, 37000000
36111,2003, 1903, 26700, 32000000
36111,2003, 1909, 543350, 31000000
36111,2003, 19658, 60000, 33000000
36111,2003, 2021, 20000, 30000000
run;


data want (keep=firm_id year total_shares_held share_outstanding ratio);
  array shr{20};
  array frq{20};
  do until (last.year);
    set table1;
    by firm_id year notsorted;
    total_shares_held=sum(total_shares_held,shares);
    s=whichn(share_outstanding,of shr{*});
    if s=0 then do; 
      n_freqs=sum(n_freqs,1);
      shr{n_freqs}=share_outstanding; 
      s=n_freqs;
    end;
    frq{s}=sum(1,frq{s});
  end;
  max_frq=max(of frq{*});  /*Maximum frequency */

  do until (max(of frq{*})^=max_frq);
    s=whichn(max_frq,of frq{*});    /* The Sth frequency equals the MAX_FRQ */
    share_outstanding=shr{s};       /* Get the corresponding denominator */
    ratio=total_shares_held/share_outstanding;
    output;
    frq{s}=.;                      /* Blank out the Sth frq, so that ties can be found */
    if n(of frq{*})=0 then leave;
  end;
run;

 

This allows for up to 20 records (actually 20 unique share_outstanding values) for each FIRM_ID/YEAR.  If you expect more, then increase the size of the SHR and FRQ arrays.

 

How it works.
For each FIRM_ID/YEAR:

  1. Read a record
  2. See if the share_outstanding is new (in which case  whichn(share_outstanding,of shr{*}) returns a zero, otherwise it returns the position in the array SHR in whicih share_outstanding already appears).
    1. If new add the new value to the SHR array, and increment N_FREQS by 1.  And set the position index s to N_FREQS
  3. Add 1 to frq{s}, keeping track of frequencies of the corresponding shr{s}.
  4. After all the records have been read, establish max_frq
  5. For each instance in which frg{s}=max_frq:
    1. Find the leftmost s, via whichn(max_frq,of frq{*})
    2. get the corresponding shr{s}, calculate the ratio, and output.
    3. Set frqs{s} to missing, so that the next leftmost qualifying frq can be located.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 287 views
  • 1 like
  • 3 in conversation