BookmarkSubscribeRSS Feed
gordononline
Fluorite | Level 6

Hi

 

I have a table where each row contains columns that have 'companies' and 'share owners' and the number of shares owned.  Any owner can have multiple rows of companies that they have shares in, and many owners may have shares in many companies.  Also companies can have shares in other companies as owners. This is a very large table with over 10 million rows.  The Owner_id (say 1002) in the first row is the same entity as the Company_id in the second row, so therefore owner_id 1002 is linked to company_id  1001, which is linked to company 1003) and company_id 1002 is linked to owner_id 1011, which also has shares in company_id 1004, and so on.

 

The 'have' data looks as follows:

Company_id $4. Owner_id $4. Shares 8.

Company_idOwner_idShares
10011002100
10021011400
100310012500
100410111000
10051012800
10061007120
10071013450
10081015250
10091014150
101010154500
1020100125000
10211020100000

 

What I would like to create  is a table for input to another process that has, for each owner_id,  each company they have shares in (as above) but also each company that they have shares in that has shares in other companies (secondary owners).  I'm aware that with such a larger table the output could be in the billions of rows, so I was thinking to limit the number of rows by filtering the number of shares for secondary owners say to 1000 shares to allow, for example, linking ACME (1002)  having 100 shares in Ford (1001), but not including Ford (1001) having 25,000 shares in McDonalds (1020), which may have 100,000 shares in Microsoft (1021) etc). 

 

So the output for ACME might be:

Company_idOwner_idShares
10011002100
10021011400

 

And so on.

 

At present I'm testing assumptions on a table with 10,000 rows, but I am wondering if something like proc cluster, proc tree or proc distance of any similar procs might give me what I want (even if I have to eliminate the large corporate holdings myself).

 

Hope this is clear enough for assistance, 

 

Thanks for any advice.

4 REPLIES 4
sbxkoenk
SAS Super FREQ

Hello ,

 

I think ... if you search these SAS communities

  • for grandparent-parent-child hierarchy or
  • for grandparent-parent-child relationships or
  • for ancestor - descendent hierarchy

... that you will find what you are looking for.

 

Try f.e. (Google search) :

  • grandparent parent child hierarchy site:communities.sas.com
    or 
  • parent child hierarchy site:communities.sas.com


BR, Koen

Ksharp
Super User

1)

Yes. But you need to have module SAS/OR . Here is an example:

data have(rename=(Company_id=from Owner_id=to));
infile cards expandtabs truncover;
input Company_id	Owner_id	Shares;
cards;
1001	1002	100
1002	1011	400
1003	1001	2500
1004	1011	1000
1005	1012	800
1006	1007	120
1007	1013	450
1008	1015	250
1009	1014	150
1010	1015	4500
1020	1001	25000
1021	1020	100000
;


proc optnet data_links=have out_nodes=want GRAPH_DIRECTION=UNDIRECTED;
data_links_var from=from to=to;
concomp;
run;

 

 

2)

If you don't have SAS/OR ,try this one (But you need lots and lots of memory within your computer):

data have;
infile cards expandtabs truncover;
input Company_id	Owner_id	Shares;
cards;
1001	1002	100
1002	1011	400
1003	1001	2500
1004	1011	1000
1005	1012	800
1006	1007	120
1007	1013	450
1008	1015	250
1009	1014	150
1010	1015	4500
1020	1001	25000
1021	1020	100000
;


data full;
  set have(rename=(Company_id=from Owner_id=to)) ;
  output;
  node=from; from=to; to=node;
  output;
  keep from to;
run;

data node; /*the ID you want to search*/
input node;
cards;
1002
;


data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();

if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
 from_to.definekey('from');
 from_to.definedata('to');
 from_to.definedone();

if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
 no.definekey('node');
 no.definedata('node');
 no.definedone();
 

do while(hi_no.next()=0);
 household+1; output;
 count=1;
 key=node;_ha.add();
 last=node;ha.add();
 rc=hi.first();
 do while(rc=0);
   from=last;rx=from_to.find();
   do while(rx=0);
     key=to;ry=_ha.check();
      if ry ne 0 then do;
       node=to;output;rr=no.remove(key:node);
       key=to;_ha.add();
       count+1;
       last=to;ha.add();
      end;
      rx=from_to.find_next();
   end;
   rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;


data final_want;
 if _n_=1 then do;
  if 0 then set want(keep=node);
  declare hash h(dataset:'want(keep=node)');
  h.definekey('node');
  h.definedone();
 end;
set have;
if (h.check(key:Company_id)=0 or h.check(key:Owner_id)=0) and Shares < 1000 ;
drop node;
run;

 

RichardAD
Quartz | Level 8
If you know the total shares for each company you can compute a fraction factor for each tier of secondary, tertiary and n-ary ownership.
mkeintz
PROC Star
If there are multiple paths from an owner id to a company id, how many observations do you want in the output dataset? One obs, or one obs per path? And if the latter, do you also want measures of the "length" of each path? There are, of course, other possibilities if you have other info available, such as total shares of each company being publicly traded.
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1659 views
  • 2 likes
  • 5 in conversation