Solved: Re: converting an edgelist dataset to an adjacency matrix, setting mis...

Max123 · Posted 06-10-2023 10:27 PM

My data is a list of edges (links) of a social network. It has only two columns: the first represents the focus node ID and the second represents the target node ID. The order, however, is not important, as the network is undirected. The link repetition is not important as the network should be binary.

How can I create the adjacency matrix for an undirected network with elements equal to 1 if there is any number of links between node pairs, and zero otherwise?

(ultimately, I want to calculate the node centralities.)

Ksharp · Posted 06-12-2023 07:47 AM

/*
I think you should post an example to show your input dataset and output dataset.
Tom and I understand it in two different way.
The following code is according to Tom's logistic.
*/
proc import datafile='c:\temp\book1.xlsx' out=_2005 dbms=xlsx replace;
sheet='2005';
run;
data have;
 set _2005;
 v=1;
 output;
 temp=node_ID; node_ID=linked_node_ID;linked_node_ID=temp;output;
 drop temp;
run;
proc sql;
create table want as
select a.*,coalesce(b.v,0) as v
 from  (
select * from
(select  node_ID as node_ID        from have union select linked_node_ID from have),
(select  node_ID as linked_node_ID from have union select linked_node_ID from have)
) as a natural left join (select distinct * from have) as b;
quit;
proc transpose data=want out=final_want(drop=_name_);
by node_ID;
var v;
id linked_node_ID;
run;

View solution in original post

Ksharp · Posted 06-11-2023 06:00 AM

/*
You want this ?
If you have SAS/OR, the code could be more compact and less.
*/
data have;
infile cards ;
input from $  to $ ;
cards;
1     2
1     3
4     5
5     2
9     4
6     7
8     7
;
run;
data full;
  set have end=last;
  if _n_ eq 1 then do;
   declare hash h();
    h.definekey('node');
     h.definedata('node');
     h.definedone();
  end;
  output;
  node=from; h.replace();
  from=to; to=node;
  output;
  node=from; h.replace();
  if last then h.output(dataset:'node');
  drop node;
run;


data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();

if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
 from_to.definekey('from');
 from_to.definedata('to');
 from_to.definedone();

if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
 no.definekey('node');
 no.definedata('node');
 no.definedone();
 

do while(hi_no.next()=0);
 household+1; output;
 count=1;
 key=node;_ha.add();
 last=node;ha.add();
 rc=hi.first();
 do while(rc=0);
   from=last;rx=from_to.find();
   do while(rx=0);
     key=to;ry=_ha.check();
      if ry ne 0 then do;
       node=to;output;rr=no.remove(key:node);
       key=to;_ha.add();
       count+1;
       last=to;ha.add();
      end;
      rx=from_to.find_next();
   end;
   rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;


proc sql;
create table want2 as
select x.*,coalesce(y.v,0) as v
from 
(select * from (select distinct node as node1 from want),(select distinct node as node2 from want)) as x
natural left join
(select a.node as node1,b.node as node2,1 as v from want as a,want as b where a.household=b.household and a.node ne b.node) as y
order by 1,2;
quit;

proc transpose data=want2 out=final_want(drop=_name_);
by  node1;
var v;
id node2;
run;

Tom · Posted 06-11-2023 11:40 AM

If you want to use HASH() objects (which is useful if the NODE ids are not contiguous integers) then you can build the connectivity data into a single HASH that can be written to a dataset and then transposed.

data _null_;
* Declare hashs and hiters ;
  declare hash nodes(ordered:'yes');
   nodes.definekey('node');
   nodes.definedata('node');
   nodes.definedone();
  declare hash links(ordered:'yes');
   links.definekey('from','to');
   links.definedata('from','to','x');
   links.definedone();
  declare hiter ifrom('nodes');
  declare hiter ito('nodes');
  x=1;
* read in the data and populate the hashes ;
  do until(eof);
    set have end=eof;
    links.replace(key:to,key:from,data:to,data:from,data: 1);
    links.replace(key:from,key:to,data:from,data:to,data: 1);
    node=from; nodes.replace();
    node=to; nodes.replace();
  end;
* Iterate over the list of nodes and fill in the zeros ;
* Set diagonal to ones ;
  rc1 = ifrom.first();
  do while (rc1 = 0);
    from = node;
    rc2 = ito.first();
    do while (rc2 = 0);
      to = node;
      rc3=links.add(key:from,key:to,data:from,data:to,data:(from=to));
      rc2 = ito.next();
    end;
    rc1 = ifrom.next();
  end;
* write out the links ;
  links.output(dataset:'links');
  stop;
run;

* Transpose ;
proc transpose data=links out=want(drop=_name_) prefix=to;
  by from;
  id to;
  var x;
run;

So if you have this input data:

data have;
  input from $ to $ ;
cards;
1  FRED
1  3
4  5
5  FRED
;

You get this result

Ksharp · Posted 06-12-2023 07:45 AM

Tom,
I think it is all depend on what OP are looking for.
We understand it in two different way.
According to your logistic , OP's question is very very simple .
I think OP should post an example to show his input dataset and output dataset.

Max123 · Posted 06-11-2023 05:34 PM

Thank you for the response! Attached is an example of the data, and I can't figure out the SAS code for the adjacency matrices. The node IDs are randomly assigned (unique) integers and are important for later merging the centrality scores with other data. Any help is greatly appreciated!

Tom · Posted 06-11-2023 06:25 PM

Not allowed to download XLSX files so if you want to share example data please just post a data step that makes the dataset.

But the real question is what are the statistics you want to calculate?

Are you sure it is easier to calculate from a "matrix" instead of the data you already have?

Max123 · Posted 06-11-2023 06:46 PM

Ultimately, I want the centrality (eigenvector, degree, betweenness, closeness) scores for each node. I have done this in another software that takes only small data of ~100 nodes and it was easier to build the adj. matrix first and then calculate the centralities. Can that be skipped in SAS?

Node ID is a random integer (may as well be text) that's important for later merging with other data. Ex.:

data have; infile cards ;

input from $ to $ ;
cards;
1 3
1 5
3 5
3 6
15 18
21 37
28 53
;
run;

Max123 · Posted 06-11-2023 06:54 PM

I have ~5,000 nodes.

Ksharp · Posted 06-12-2023 07:47 AM

/*
I think you should post an example to show your input dataset and output dataset.
Tom and I understand it in two different way.
The following code is according to Tom's logistic.
*/
proc import datafile='c:\temp\book1.xlsx' out=_2005 dbms=xlsx replace;
sheet='2005';
run;
data have;
 set _2005;
 v=1;
 output;
 temp=node_ID; node_ID=linked_node_ID;linked_node_ID=temp;output;
 drop temp;
run;
proc sql;
create table want as
select a.*,coalesce(b.v,0) as v
 from  (
select * from
(select  node_ID as node_ID        from have union select linked_node_ID from have),
(select  node_ID as linked_node_ID from have union select linked_node_ID from have)
) as a natural left join (select distinct * from have) as b;
quit;
proc transpose data=want out=final_want(drop=_name_);
by node_ID;
var v;
id linked_node_ID;
run;

Max123 · Posted 06-12-2023 06:03 PM

Thank you both so much!!!

Max123 · Posted 06-12-2023 11:47 PM

sorry to build on this... How can I multiply element-wise to adjacency matrices (e.g., final_want2005 and final_want2006)?
I'm trying this but it doesn't work:

proc iml ; reset noprint ;
A = final_want2005[1:2,4710:4711] ;
B = final_want2006[1:2,4710:4711] ; c = A # B ;
run ;

Ksharp · Posted 06-13-2023 08:19 AM

This is a new question, should post it at IML forum.

I check your LOG.

56
57   A = final_want2005[1:2,4710:4711] ;
58   B = final_want2006[1:2,4710:4711] ;
ERROR: (execution) Invalid subscript or subscript out of range.

 operation : [ at line 58 column 19
 operands  : final_want2006, _TEM1001, *LIT1007, *LIT1008
final_want2006   4694 rows   4695 cols    (numeric)

_TEM1001      1 row       2 cols    (numeric)

final_want2006 only have 4695 columns unlike final_want2005 have 4711 columns, so you can't do this element-wise operator.

Here is my running code:

proc iml;
use final_want2005;
read all var _num_ into final_want2005;
close;
use final_want2006;
read all var _num_ into final_want2006;
close;

/*
x=ncol(final_want2005);
y=ncol(final_want2006);
print x y;
quit;
*/
A = final_want2005[1:2,4710:4711] ;
B = final_want2006[1:2,4710:4711] ; 
c = A # B ;
quit;

Max123 · Posted 06-13-2023 08:37 AM

Thanks. The data was not complete for 2006 in the previous file - actually both matrices have the same dimensions given full data. The error it gave me from my code was about matrices not having assigned value.

Ksharp · Posted 06-13-2023 09:05 AM

OK.You need borrow some missing NODES from 2005.

proc import datafile='c:\temp\book1.xlsx' out=_2005 dbms=xlsx replace;
sheet='2005';
run;
data have_2005;
 set _2005;
 v=1;
 output;
 temp=node_ID; node_ID=linked_node_ID;linked_node_ID=temp;output;
 drop temp;
run;
proc sql;
create table want_2005 as
select a.*,coalesce(b.v,0) as v
 from  (
select * from
(select  node_ID as node_ID        from have_2005 union select linked_node_ID from have_2005),
(select  node_ID as linked_node_ID from have_2005 union select linked_node_ID from have_2005)
) as a natural left join (select distinct * from have_2005) as b;
quit;
proc transpose data=want_2005 out=final_want_2005(drop=_name_);
by node_ID;
var v;
id linked_node_ID;
run;






proc import datafile='c:\temp\book1.xlsx' out=_2006 dbms=xlsx replace;
sheet='2006';
run;
data have_2006;
 set _2006;
 v=1;
 output;
 temp=node_ID; node_ID=linked_node_ID;linked_node_ID=temp;output;
 drop temp;
run;
proc sql;
create table want_2006 as
select a.*,coalesce(b.v,0) as v
 from  (
select * from
(select  node_ID as node_ID        from have_2005 union select linked_node_ID from have_2005),  /*<-----*/
(select  node_ID as linked_node_ID from have_2005 union select linked_node_ID from have_2005)  /*<-----*/
) as a natural left join (select distinct * from have_2006) as b;
quit;
proc transpose data=want_2006 out=final_want_2006(drop=_name_);
by node_ID;
var v;
id linked_node_ID;
run;



proc iml;
use final_want_2005;
read all var _num_ into final_want2005;
close;
use final_want_2006;
read all var _num_ into final_want2006;
close;



A = final_want2005[1:2,4710:4711] ;
B = final_want2006[1:2,4710:4711] ; 
c = A # B ;
quit;

Max123 · Posted 06-14-2023 09:59 PM

thanks! I don't get it why it didn't work for me with this code and data...

proc iml ;
use sheet1 ;
read all var _num_ into layer1;
close;
use sheet2 ;
read all var _num_ into layer2;
close;

A = layer1[1:2,9:10] ;
B = layer2[1:2,9:10] ;
c = A # B ;
quit;

converting an edgelist dataset to an adjacency matrix, setting missing values as 0

Re: converting an edgelist dataset to an adjacency matrix, setting missing values as 0

Re: converting an edgelist dataset to an adjacency matrix, setting missing values as 0

Re: converting an edgelist dataset to an adjacency matrix, setting missing values as 0

Re: converting an edgelist dataset to an adjacency matrix, setting missing values as 0

Re: converting an edgelist dataset to an adjacency matrix, setting missing values as 0

Re: converting an edgelist dataset to an adjacency matrix, setting missing values as 0

Re: converting an edgelist dataset to an adjacency matrix, setting missing values as 0

Re: converting an edgelist dataset to an adjacency matrix, setting missing values as 0

Re: converting an edgelist dataset to an adjacency matrix, setting missing values as 0

Re: converting an edgelist dataset to an adjacency matrix, setting missing values as 0

Re: converting an edgelist dataset to an adjacency matrix, setting missing values as 0

Re: converting an edgelist dataset to an adjacency matrix, setting missing values as 0

Re: converting an edgelist dataset to an adjacency matrix, setting missing values as 0

Re: converting an edgelist dataset to an adjacency matrix, setting missing values as 0

Re: converting an edgelist dataset to an adjacency matrix, setting missing values as 0

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!