Hello all,
I am trying to calculate the percentage of each firm (Firm_ID)’s outstanding shares (share_outstanding) held by funds (‘fundno’ for fund id; and ‘shares’ is the number of firm’s shares held by a fund in a year) in each year (year).
My problem is, although in most cases, each group (i.e., each ‘Firm_ID’ in each ‘year’) only has one ‘share_outstanding’ value, in some cases a group includes more than one ‘share_outstanding’ value.
By following Table1 as an example,
data table1; infile cards dsd dlm=","; input firm_id :$15. year :8. fundno :$15. shares :8. share_outstanding :8. ; cards; 36110,2002, 14086, 18000, 32000000 36110,2002, 15056, 92000, 32000000 36110,2002, 15058, 4740, 32000000 36110,2002, 16500, 35300, 27000000 36111,2002, 1903, 26700, 32000000 36111,2002, 1909, 543350, 32000000 36111,2002, 19658, 60000, 30000000 36111,2002, 2021, 20000, 30000000 36111,2002, 2034, 59700, 32000000 36110,2003, 14086, 18000, 32000000 36110,2003, 15056, 92000, 32000000 36110,2003, 15058, 4740, 37000000 36110,2003, 16500, 35300, 27000000 36111,2003, 1903, 26700, 32000000 36111,2003, 1909, 543350, 31000000 36111,2003, 19658, 60000, 33000000 36111,2003, 2021, 20000, 30000000 ;;;; run;
There are four groups (same Firm_iD, and same year) in Table 1. By using the first group as an example,
There are three ‘32000000’ and one ‘27000000’ in this group (i.e., Firm_ID=36110 and year=2002), and then
In the end, I expect to get
Firm_ID | year | fund_ratio |
36110 | 2002 | 0.00468875 |
Following the same rule, in the second group (i.e., Firm_ID=36111 and year=2002), I have three ‘32000000’ and two ‘30000000’. So, the fund_ratio=0.0221796875, which is (26700+543350+60000+20000+59700)/ 32000000. I expect to get
Firm_ID | year | fund_ratio |
36111 | 2002 | 0.0221796875 |
In the third (Firm_ID=36110,year=2003), there are two ‘32000000’ and two ‘37000000’. In this case, I keep them both. I expect to get
Firm_ID | year | fund_ratio |
36110 | 2003 | 0.00468875 |
36110 | 2003 | 0.00405514 |
As
0.00468875= (18000+92000+4740+35300)/ 32000000
0.00405514= (18000+92000+4740+35300)/ 37000000
fourth group (Firm_ID=36111, year=2003), I expect to get
Firm_ID | year | fund_ratio |
36111 | 2003 | 0.020314063 |
36111 | 2003 | 0.020969355 |
36111 | 2003 | 0.019698485 |
36111 | 2003 | 0.021668333 |
As
0.020314063 = (26700+543350+60000+ 20000)/32000000
0.020969355= (26700+543350+60000+ 20000)/ 31000000
0.019698485= (26700+543350+60000+ 20000)/ 33000000
0.021668333= (26700+543350+60000+ 20000)/ 30000000
I hope it is clear.
Many thanks in advance for any advice.
You need to provide a Rule, or rules, for determining what the value of the denominator is for each group. Examples don't quite cover things when we get some other group that has 5 different "shares outstanding" values with 5 records for a given year.
The information you have provided would allow exactly 4 groups to be calculated for two companies and two years. Do you have more years? More companies? If so, we do not have sufficient information to handle those other cases.
@Alexxxxxxx wrote:
Hello all,
I am trying to calculate the percentage of each firm (Firm_ID)’s outstanding shares (share_outstanding) held by funds (‘fundno’ for fund id; and ‘shares’ is the number of firm’s shares held by a fund in a year) in each year (year).
My problem is, although in most cases, each group (i.e., each ‘Firm_ID’ in each ‘year’) only has one ‘share_outstanding’ value, in some cases a group includes more than one ‘share_outstanding’ value.
By following Table1 as an example,
data table1; infile cards dsd dlm=","; input firm_id :$15. year :8. fundno :$15. shares :8. share_outstanding :8. ; cards; 36110,2002, 14086, 18000, 32000000 36110,2002, 15056, 92000, 32000000 36110,2002, 15058, 4740, 32000000 36110,2002, 16500, 35300, 27000000 36111,2002, 1903, 26700, 32000000 36111,2002, 1909, 543350, 32000000 36111,2002, 19658, 60000, 30000000 36111,2002, 2021, 20000, 30000000 36111,2002, 2034, 59700, 32000000 36110,2003, 14086, 18000, 32000000 36110,2003, 15056, 92000, 32000000 36110,2003, 15058, 4740, 37000000 36110,2003, 16500, 35300, 27000000 36111,2003, 1903, 26700, 32000000 36111,2003, 1909, 543350, 31000000 36111,2003, 19658, 60000, 33000000 36111,2003, 2021, 20000, 30000000 ;;;; run;There are four groups (same Firm_iD, and same year) in Table 1. By using the first group as an example,
There are three ‘32000000’ and one ‘27000000’ in this group (i.e., Firm_ID=36110 and year=2002), and then
- I assume the outstanding share of Firm_ID ‘36110’ in ‘year’ 2002 is ‘32000000’.
- I sum the number of shares held by funds in this group, which is 150040 (=18000+92000+4740+35300)
- I divide the number of shares held by funds in this group (150040) by the firm’s outstanding shares (32000000) to get a new variable ‘fund_ratio‘, which equals 0.00468875.
In the end, I expect to get
Firm_ID
year
fund_ratio
36110
2002
0.00468875
Following the same rule, in the second group (i.e., Firm_ID=36111 and year=2002), I have three ‘32000000’ and two ‘30000000’. So, the fund_ratio=0.0221796875, which is (26700+543350+60000+20000+59700)/ 32000000. I expect to get
Firm_ID
year
fund_ratio
36111
2002
0.0221796875
In the third (Firm_ID=36110,year=2003), there are two ‘32000000’ and two ‘37000000’. In this case, I keep them both. I expect to get
Firm_ID
year
fund_ratio
36110
2003
0.00468875
36110
2003
0.00405514
As
0.00468875= (18000+92000+4740+35300)/ 32000000
0.00405514= (18000+92000+4740+35300)/ 37000000
fourth group (Firm_ID=36111, year=2003), I expect to get
Firm_ID
year
fund_ratio
36111
2003
0.020314063
36111
2003
0.020969355
36111
2003
0.019698485
36111
2003
0.021668333
As
0.020314063 = (26700+543350+60000+ 20000)/32000000
0.020969355= (26700+543350+60000+ 20000)/ 31000000
0.019698485= (26700+543350+60000+ 20000)/ 33000000
0.021668333= (26700+543350+60000+ 20000)/ 30000000
I hope it is clear.
Many thanks in advance for any advice.
Hi @ballardw , thanks for your reply.
it is a large sample that includes a large number of firms during a 15-year period, and the funds that hold their shares. I am cleaning the data to construct the fund_ratio variable for each 'Firm_ID' in each 'year'. actually, in most cases, each group only has one 'share_outstanding' value. but, in some cases, there is more than one 'share_holding' value in a group, I am trying to keep the most frequent 'share_holding' value in each group and delete the rest. If there is more than one most frequent 'share_holding' value in each group, then keep them together.
the rule is: in each group (same 'Firm_ID and same 'year'), remain the most frequent 'share_holding' value, and if there is more than one most frequent 'share_holding' value in each group, then keep them together.
I hope it is more clear. could you please give me some advice about how to get the result I need.
@Alexxxxxxx wrote:
Hi @ballardw , thanks for your reply.
it is a large sample that includes a large number of firms during a 15-year period, and the funds that hold their shares. I am cleaning the data to construct the fund_ratio variable for each 'Firm_ID' in each 'year'. actually, in most cases, each group only has one 'share_outstanding' value. but, in some cases, there is more than one 'share_holding' value in a group, I am trying to keep the most frequent 'share_holding' value in each group and delete the rest. If there is more than one most frequent 'share_holding' value in each group, then keep them together.
the rule is: in each group (same 'Firm_ID and same 'year'), remain the most frequent 'share_holding' value, and if there is more than one most frequent 'share_holding' value in each group, then keep them together.
I hope it is more clear. could you please give me some advice about how to get the result I need.
This rule is still incomplete. It addresses the following cases:
But it doesn't address the case of multiple SHROUT values, with a tie for most frequent (case 3 above), but also with other less frequent SHROUTs. For instance, say you have 2 SHROUTs at 30m, 2 at 28m, and 1 each at 29m and 31m. What do you want in that case?
First, you have an apparent data error in the 3rd group (Firm_ID=36110,year=2003). I think the fourth row should have share_outstanding=37000000 (you have 27000000). With that correction, this program should do the following: output 1 record for each share_oustanding value that has or matches the highest frequency:
data table1;
infile cards dsd dlm=",";
input firm_id :$15. year :8. fundno :$15. shares :8. share_outstanding :8. ;
cards;
36110,2002, 14086, 18000, 32000000
36110,2002, 15056, 92000, 32000000
36110,2002, 15058, 4740, 32000000
36110,2002, 16500, 35300, 27000000
36111,2002, 1903, 26700, 32000000
36111,2002, 1909, 543350, 32000000
36111,2002, 19658, 60000, 30000000
36111,2002, 2021, 20000, 30000000
36111,2002, 2034, 59700, 32000000
36110,2003, 14086, 18000, 32000000
36110,2003, 15056, 92000, 32000000
36110,2003, 15058, 4740, 37000000
36110,2003, 16500, 35300, 37000000
36111,2003, 1903, 26700, 32000000
36111,2003, 1909, 543350, 31000000
36111,2003, 19658, 60000, 33000000
36111,2003, 2021, 20000, 30000000
run;
data want (keep=firm_id year total_shares_held share_outstanding ratio);
array shr{20};
array frq{20};
do until (last.year);
set table1;
by firm_id year notsorted;
total_shares_held=sum(total_shares_held,shares);
s=whichn(share_outstanding,of shr{*});
if s=0 then do;
n_freqs=sum(n_freqs,1);
shr{n_freqs}=share_outstanding;
s=n_freqs;
end;
frq{s}=sum(1,frq{s});
end;
max_frq=max(of frq{*}); /*Maximum frequency */
do until (max(of frq{*})^=max_frq);
s=whichn(max_frq,of frq{*}); /* The Sth frequency equals the MAX_FRQ */
share_outstanding=shr{s}; /* Get the corresponding denominator */
ratio=total_shares_held/share_outstanding;
output;
frq{s}=.; /* Blank out the Sth frq, so that ties can be found */
if n(of frq{*})=0 then leave;
end;
run;
This allows for up to 20 records (actually 20 unique share_outstanding values) for each FIRM_ID/YEAR. If you expect more, then increase the size of the SHR and FRQ arrays.
How it works.
For each FIRM_ID/YEAR:
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.