Hi I have a table where each row contains columns that have 'companies' and 'share owners' and the number of shares owned. Any owner can have multiple rows of companies that they have shares in, and many owners may have shares in many companies. Also companies can have shares in other companies as owners. This is a very large table with over 10 million rows. The Owner_id (say 1002) in the first row is the same entity as the Company_id in the second row, so therefore owner_id 1002 is linked to company_id 1001, which is linked to company 1003) and company_id 1002 is linked to owner_id 1011, which also has shares in company_id 1004, and so on. The 'have' data looks as follows: Company_id $4. Owner_id $4. Shares 8. Company_id Owner_id Shares 1001 1002 100 1002 1011 400 1003 1001 2500 1004 1011 1000 1005 1012 800 1006 1007 120 1007 1013 450 1008 1015 250 1009 1014 150 1010 1015 4500 1020 1001 25000 1021 1020 100000 What I would like to create is a table for input to another process that has, for each owner_id, each company they have shares in (as above) but also each company that they have shares in that has shares in other companies (secondary owners). I'm aware that with such a larger table the output could be in the billions of rows, so I was thinking to limit the number of rows by filtering the number of shares for secondary owners say to 1000 shares to allow, for example, linking ACME (1002) having 100 shares in Ford (1001), but not including Ford (1001) having 25,000 shares in McDonalds (1020), which may have 100,000 shares in Microsoft (1021) etc). So the output for ACME might be: Company_id Owner_id Shares 1001 1002 100 1002 1011 400 And so on. At present I'm testing assumptions on a table with 10,000 rows, but I am wondering if something like proc cluster, proc tree or proc distance of any similar procs might give me what I want (even if I have to eliminate the large corporate holdings myself). Hope this is clear enough for assistance, Thanks for any advice.
... View more