BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Max123
Obsidian | Level 7

My data is a list of edges (links) of a social network. It has only two columns: the first represents the focus node ID and the second represents the target node ID. The order, however, is not important, as the network is undirected. The link repetition is not important as the network should be binary. 

 

How can I create the adjacency matrix for an undirected network with elements equal to 1 if there is any number of links between node pairs, and zero otherwise?

(ultimately, I want to calculate the node centralities.)

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
/*
I think you should post an example to show your input dataset and output dataset.
Tom and I understand it in two different way.
The following code is according to Tom's logistic.
*/
proc import datafile='c:\temp\book1.xlsx' out=_2005 dbms=xlsx replace;
sheet='2005';
run;
data have;
 set _2005;
 v=1;
 output;
 temp=node_ID; node_ID=linked_node_ID;linked_node_ID=temp;output;
 drop temp;
run;
proc sql;
create table want as
select a.*,coalesce(b.v,0) as v
 from  (
select * from
(select  node_ID as node_ID        from have union select linked_node_ID from have),
(select  node_ID as linked_node_ID from have union select linked_node_ID from have)
) as a natural left join (select distinct * from have) as b;
quit;
proc transpose data=want out=final_want(drop=_name_);
by node_ID;
var v;
id linked_node_ID;
run;

View solution in original post

25 REPLIES 25
Ksharp
Super User
/*
You want this ?
If you have SAS/OR, the code could be more compact and less.
*/
data have;
infile cards ;
input from $  to $ ;
cards;
1     2
1     3
4     5
5     2
9     4
6     7
8     7
;
run;
data full;
  set have end=last;
  if _n_ eq 1 then do;
   declare hash h();
    h.definekey('node');
     h.definedata('node');
     h.definedone();
  end;
  output;
  node=from; h.replace();
  from=to; to=node;
  output;
  node=from; h.replace();
  if last then h.output(dataset:'node');
  drop node;
run;


data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();

if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
 from_to.definekey('from');
 from_to.definedata('to');
 from_to.definedone();

if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
 no.definekey('node');
 no.definedata('node');
 no.definedone();
 

do while(hi_no.next()=0);
 household+1; output;
 count=1;
 key=node;_ha.add();
 last=node;ha.add();
 rc=hi.first();
 do while(rc=0);
   from=last;rx=from_to.find();
   do while(rx=0);
     key=to;ry=_ha.check();
      if ry ne 0 then do;
       node=to;output;rr=no.remove(key:node);
       key=to;_ha.add();
       count+1;
       last=to;ha.add();
      end;
      rx=from_to.find_next();
   end;
   rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;


proc sql;
create table want2 as
select x.*,coalesce(y.v,0) as v
from 
(select * from (select distinct node as node1 from want),(select distinct node as node2 from want)) as x
natural left join
(select a.node as node1,b.node as node2,1 as v from want as a,want as b where a.household=b.household and a.node ne b.node) as y
order by 1,2;
quit;

proc transpose data=want2 out=final_want(drop=_name_);
by  node1;
var v;
id node2;
run;

Ksharp_0-1686477641440.png

 

Tom
Super User Tom
Super User

If you want to use HASH() objects (which is useful if the NODE ids are not contiguous integers) then you can build the connectivity data into a single HASH that can be written to a dataset and then transposed.

data _null_;
* Declare hashs and hiters ;
  declare hash nodes(ordered:'yes');
   nodes.definekey('node');
   nodes.definedata('node');
   nodes.definedone();
  declare hash links(ordered:'yes');
   links.definekey('from','to');
   links.definedata('from','to','x');
   links.definedone();
  declare hiter ifrom('nodes');
  declare hiter ito('nodes');
  x=1;
* read in the data and populate the hashes ;
  do until(eof);
    set have end=eof;
    links.replace(key:to,key:from,data:to,data:from,data: 1);
    links.replace(key:from,key:to,data:from,data:to,data: 1);
    node=from; nodes.replace();
    node=to; nodes.replace();
  end;
* Iterate over the list of nodes and fill in the zeros ;
* Set diagonal to ones ;
  rc1 = ifrom.first();
  do while (rc1 = 0);
    from = node;
    rc2 = ito.first();
    do while (rc2 = 0);
      to = node;
      rc3=links.add(key:from,key:to,data:from,data:to,data:(from=to));
      rc2 = ito.next();
    end;
    rc1 = ifrom.next();
  end;
* write out the links ;
  links.output(dataset:'links');
  stop;
run;

* Transpose ;
proc transpose data=links out=want(drop=_name_) prefix=to;
  by from;
  id to;
  var x;
run;

So if you have this input data:

data have;
  input from $ to $ ;
cards;
1  FRED
1  3
4  5
5  FRED
;

You get this result

Tom_0-1686498043735.png

 

Ksharp
Super User

Tom,
I think it is all depend on what OP are looking for.
We understand it in two different way.
According to your logistic , OP's question is very very simple .
I think OP should post an example to show his input dataset and output dataset.
Max123
Obsidian | Level 7

Thank you for the response! Attached is an example of the data, and I can't figure out the SAS code for the adjacency matrices. The node IDs are randomly assigned (unique) integers and are important for later merging the centrality scores with other data. Any help is greatly appreciated!

Tom
Super User Tom
Super User

Not allowed to download XLSX files so if you want to share example data please just post a data step that makes the dataset.

 

But the real question is what are the statistics you want to calculate? 

Are you sure it is easier to calculate from a "matrix" instead of the data you already have?

Max123
Obsidian | Level 7

Ultimately, I want the centrality (eigenvector, degree, betweenness, closeness) scores for each node. I have done this in another software that takes only small data of ~100 nodes and it was easier to build the adj. matrix first and then calculate the centralities. Can that be skipped in SAS?

Node ID is a random integer (may as well be text) that's important for later merging with other data. Ex.:

 

data have; infile cards ; 

input from $ to $ ;
cards;
1   3
1   5
3   5
3   6
15 18
21 37
28 53
;
run;

 

 

Ksharp
Super User
/*
I think you should post an example to show your input dataset and output dataset.
Tom and I understand it in two different way.
The following code is according to Tom's logistic.
*/
proc import datafile='c:\temp\book1.xlsx' out=_2005 dbms=xlsx replace;
sheet='2005';
run;
data have;
 set _2005;
 v=1;
 output;
 temp=node_ID; node_ID=linked_node_ID;linked_node_ID=temp;output;
 drop temp;
run;
proc sql;
create table want as
select a.*,coalesce(b.v,0) as v
 from  (
select * from
(select  node_ID as node_ID        from have union select linked_node_ID from have),
(select  node_ID as linked_node_ID from have union select linked_node_ID from have)
) as a natural left join (select distinct * from have) as b;
quit;
proc transpose data=want out=final_want(drop=_name_);
by node_ID;
var v;
id linked_node_ID;
run;
Max123
Obsidian | Level 7
sorry to build on this... How can I multiply element-wise to adjacency matrices (e.g., final_want2005 and final_want2006)?
I'm trying this but it doesn't work:

proc iml ; reset noprint ;
A = final_want2005[1:2,4710:4711] ;
B = final_want2006[1:2,4710:4711] ; c = A # B ;
run ;
Ksharp
Super User

This is a new question, should post it at IML forum.

I check your LOG.

 

56
57   A = final_want2005[1:2,4710:4711] ;
58   B = final_want2006[1:2,4710:4711] ;
ERROR: (execution) Invalid subscript or subscript out of range.

 operation : [ at line 58 column 19
 operands  : final_want2006, _TEM1001, *LIT1007, *LIT1008
final_want2006   4694 rows   4695 cols    (numeric)

_TEM1001      1 row       2 cols    (numeric)

final_want2006 only have 4695 columns unlike final_want2005 have 4711 columns, so you can't do this element-wise operator.

 

Here is my running code:

proc iml;
use final_want2005;
read all var _num_ into final_want2005;
close;
use final_want2006;
read all var _num_ into final_want2006;
close;

/*
x=ncol(final_want2005);
y=ncol(final_want2006);
print x y;
quit;
*/
A = final_want2005[1:2,4710:4711] ;
B = final_want2006[1:2,4710:4711] ; 
c = A # B ;
quit;

 

Max123
Obsidian | Level 7
Thanks. The data was not complete for 2006 in the previous file - actually both matrices have the same dimensions given full data. The error it gave me from my code was about matrices not having assigned value.
Ksharp
Super User

OK.You need borrow some missing NODES from 2005.

 

 

proc import datafile='c:\temp\book1.xlsx' out=_2005 dbms=xlsx replace;
sheet='2005';
run;
data have_2005;
 set _2005;
 v=1;
 output;
 temp=node_ID; node_ID=linked_node_ID;linked_node_ID=temp;output;
 drop temp;
run;
proc sql;
create table want_2005 as
select a.*,coalesce(b.v,0) as v
 from  (
select * from
(select  node_ID as node_ID        from have_2005 union select linked_node_ID from have_2005),
(select  node_ID as linked_node_ID from have_2005 union select linked_node_ID from have_2005)
) as a natural left join (select distinct * from have_2005) as b;
quit;
proc transpose data=want_2005 out=final_want_2005(drop=_name_);
by node_ID;
var v;
id linked_node_ID;
run;






proc import datafile='c:\temp\book1.xlsx' out=_2006 dbms=xlsx replace;
sheet='2006';
run;
data have_2006;
 set _2006;
 v=1;
 output;
 temp=node_ID; node_ID=linked_node_ID;linked_node_ID=temp;output;
 drop temp;
run;
proc sql;
create table want_2006 as
select a.*,coalesce(b.v,0) as v
 from  (
select * from
(select  node_ID as node_ID        from have_2005 union select linked_node_ID from have_2005),  /*<-----*/
(select  node_ID as linked_node_ID from have_2005 union select linked_node_ID from have_2005)  /*<-----*/
) as a natural left join (select distinct * from have_2006) as b;
quit;
proc transpose data=want_2006 out=final_want_2006(drop=_name_);
by node_ID;
var v;
id linked_node_ID;
run;



proc iml;
use final_want_2005;
read all var _num_ into final_want2005;
close;
use final_want_2006;
read all var _num_ into final_want2006;
close;



A = final_want2005[1:2,4710:4711] ;
B = final_want2006[1:2,4710:4711] ; 
c = A # B ;
quit;
Max123
Obsidian | Level 7

thanks! I don't get it why it didn't work for me with this code and data...


proc iml ;
use sheet1 ;
read all var _num_ into layer1;
close;
use sheet2 ;
read all var _num_ into layer2;
close;

 

A = layer1[1:2,9:10] ;
B = layer2[1:2,9:10] ;
c = A # B ;
quit;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 25 replies
  • 2259 views
  • 15 likes
  • 5 in conversation