Hi
I have a table where each row contains columns that have 'companies' and 'share owners' and the number of shares owned. Any owner can have multiple rows of companies that they have shares in, and many owners may have shares in many companies. Also companies can have shares in other companies as owners. This is a very large table with over 10 million rows. The Owner_id (say 1002) in the first row is the same entity as the Company_id in the second row, so therefore owner_id 1002 is linked to company_id 1001, which is linked to company 1003) and company_id 1002 is linked to owner_id 1011, which also has shares in company_id 1004, and so on.
The 'have' data looks as follows:
Company_id $4. Owner_id $4. Shares 8.
Company_id | Owner_id | Shares |
1001 | 1002 | 100 |
1002 | 1011 | 400 |
1003 | 1001 | 2500 |
1004 | 1011 | 1000 |
1005 | 1012 | 800 |
1006 | 1007 | 120 |
1007 | 1013 | 450 |
1008 | 1015 | 250 |
1009 | 1014 | 150 |
1010 | 1015 | 4500 |
1020 | 1001 | 25000 |
1021 | 1020 | 100000 |
What I would like to create is a table for input to another process that has, for each owner_id, each company they have shares in (as above) but also each company that they have shares in that has shares in other companies (secondary owners). I'm aware that with such a larger table the output could be in the billions of rows, so I was thinking to limit the number of rows by filtering the number of shares for secondary owners say to 1000 shares to allow, for example, linking ACME (1002) having 100 shares in Ford (1001), but not including Ford (1001) having 25,000 shares in McDonalds (1020), which may have 100,000 shares in Microsoft (1021) etc).
So the output for ACME might be:
Company_id | Owner_id | Shares |
1001 | 1002 | 100 |
1002 | 1011 | 400 |
And so on.
At present I'm testing assumptions on a table with 10,000 rows, but I am wondering if something like proc cluster, proc tree or proc distance of any similar procs might give me what I want (even if I have to eliminate the large corporate holdings myself).
Hope this is clear enough for assistance,
Thanks for any advice.
Hello ,
I think ... if you search these SAS communities
... that you will find what you are looking for.
Try f.e. (Google search) :
BR, Koen
1)
Yes. But you need to have module SAS/OR . Here is an example:
data have(rename=(Company_id=from Owner_id=to)); infile cards expandtabs truncover; input Company_id Owner_id Shares; cards; 1001 1002 100 1002 1011 400 1003 1001 2500 1004 1011 1000 1005 1012 800 1006 1007 120 1007 1013 450 1008 1015 250 1009 1014 150 1010 1015 4500 1020 1001 25000 1021 1020 100000 ; proc optnet data_links=have out_nodes=want GRAPH_DIRECTION=UNDIRECTED; data_links_var from=from to=to; concomp; run;
2)
If you don't have SAS/OR ,try this one (But you need lots and lots of memory within your computer):
data have; infile cards expandtabs truncover; input Company_id Owner_id Shares; cards; 1001 1002 100 1002 1011 400 1003 1001 2500 1004 1011 1000 1005 1012 800 1006 1007 120 1007 1013 450 1008 1015 250 1009 1014 150 1010 1015 4500 1020 1001 25000 1021 1020 100000 ; data full; set have(rename=(Company_id=from Owner_id=to)) ; output; node=from; from=to; to=node; output; keep from to; run; data node; /*the ID you want to search*/ input node; cards; 1002 ; data want(keep=node household); declare hash ha(ordered:'a'); declare hiter hi('ha'); ha.definekey('count'); ha.definedata('last'); ha.definedone(); declare hash _ha(hashexp: 20); _ha.definekey('key'); _ha.definedone(); if 0 then set full; declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y'); from_to.definekey('from'); from_to.definedata('to'); from_to.definedone(); if 0 then set node; declare hash no(dataset:'node'); declare hiter hi_no('no'); no.definekey('node'); no.definedata('node'); no.definedone(); do while(hi_no.next()=0); household+1; output; count=1; key=node;_ha.add(); last=node;ha.add(); rc=hi.first(); do while(rc=0); from=last;rx=from_to.find(); do while(rx=0); key=to;ry=_ha.check(); if ry ne 0 then do; node=to;output;rr=no.remove(key:node); key=to;_ha.add(); count+1; last=to;ha.add(); end; rx=from_to.find_next(); end; rc=hi.next(); end; ha.clear();_ha.clear(); end; stop; run; data final_want; if _n_=1 then do; if 0 then set want(keep=node); declare hash h(dataset:'want(keep=node)'); h.definekey('node'); h.definedone(); end; set have; if (h.check(key:Company_id)=0 or h.check(key:Owner_id)=0) and Shares < 1000 ; drop node; run;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.