My data is a list of edges (links) of a social network. It has only two columns: the first represents the focus node ID and the second represents the target node ID. The order, however, is not important, as the network is undirected. The link repetition is not important as the network should be binary.
How can I create the adjacency matrix for an undirected network with elements equal to 1 if there is any number of links between node pairs, and zero otherwise?
(ultimately, I want to calculate the node centralities.)
/*
I think you should post an example to show your input dataset and output dataset.
Tom and I understand it in two different way.
The following code is according to Tom's logistic.
*/
proc import datafile='c:\temp\book1.xlsx' out=_2005 dbms=xlsx replace;
sheet='2005';
run;
data have;
set _2005;
v=1;
output;
temp=node_ID; node_ID=linked_node_ID;linked_node_ID=temp;output;
drop temp;
run;
proc sql;
create table want as
select a.*,coalesce(b.v,0) as v
from (
select * from
(select node_ID as node_ID from have union select linked_node_ID from have),
(select node_ID as linked_node_ID from have union select linked_node_ID from have)
) as a natural left join (select distinct * from have) as b;
quit;
proc transpose data=want out=final_want(drop=_name_);
by node_ID;
var v;
id linked_node_ID;
run;
/*
You want this ?
If you have SAS/OR, the code could be more compact and less.
*/
data have;
infile cards ;
input from $ to $ ;
cards;
1 2
1 3
4 5
5 2
9 4
6 7
8 7
;
run;
data full;
set have end=last;
if _n_ eq 1 then do;
declare hash h();
h.definekey('node');
h.definedata('node');
h.definedone();
end;
output;
node=from; h.replace();
from=to; to=node;
output;
node=from; h.replace();
if last then h.output(dataset:'node');
drop node;
run;
data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();
if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
from_to.definekey('from');
from_to.definedata('to');
from_to.definedone();
if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
no.definekey('node');
no.definedata('node');
no.definedone();
do while(hi_no.next()=0);
household+1; output;
count=1;
key=node;_ha.add();
last=node;ha.add();
rc=hi.first();
do while(rc=0);
from=last;rx=from_to.find();
do while(rx=0);
key=to;ry=_ha.check();
if ry ne 0 then do;
node=to;output;rr=no.remove(key:node);
key=to;_ha.add();
count+1;
last=to;ha.add();
end;
rx=from_to.find_next();
end;
rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;
proc sql;
create table want2 as
select x.*,coalesce(y.v,0) as v
from
(select * from (select distinct node as node1 from want),(select distinct node as node2 from want)) as x
natural left join
(select a.node as node1,b.node as node2,1 as v from want as a,want as b where a.household=b.household and a.node ne b.node) as y
order by 1,2;
quit;
proc transpose data=want2 out=final_want(drop=_name_);
by node1;
var v;
id node2;
run;
If you want to use HASH() objects (which is useful if the NODE ids are not contiguous integers) then you can build the connectivity data into a single HASH that can be written to a dataset and then transposed.
data _null_;
* Declare hashs and hiters ;
declare hash nodes(ordered:'yes');
nodes.definekey('node');
nodes.definedata('node');
nodes.definedone();
declare hash links(ordered:'yes');
links.definekey('from','to');
links.definedata('from','to','x');
links.definedone();
declare hiter ifrom('nodes');
declare hiter ito('nodes');
x=1;
* read in the data and populate the hashes ;
do until(eof);
set have end=eof;
links.replace(key:to,key:from,data:to,data:from,data: 1);
links.replace(key:from,key:to,data:from,data:to,data: 1);
node=from; nodes.replace();
node=to; nodes.replace();
end;
* Iterate over the list of nodes and fill in the zeros ;
* Set diagonal to ones ;
rc1 = ifrom.first();
do while (rc1 = 0);
from = node;
rc2 = ito.first();
do while (rc2 = 0);
to = node;
rc3=links.add(key:from,key:to,data:from,data:to,data:(from=to));
rc2 = ito.next();
end;
rc1 = ifrom.next();
end;
* write out the links ;
links.output(dataset:'links');
stop;
run;
* Transpose ;
proc transpose data=links out=want(drop=_name_) prefix=to;
by from;
id to;
var x;
run;
So if you have this input data:
data have;
input from $ to $ ;
cards;
1 FRED
1 3
4 5
5 FRED
;
You get this result
Thank you for the response! Attached is an example of the data, and I can't figure out the SAS code for the adjacency matrices. The node IDs are randomly assigned (unique) integers and are important for later merging the centrality scores with other data. Any help is greatly appreciated!
Not allowed to download XLSX files so if you want to share example data please just post a data step that makes the dataset.
But the real question is what are the statistics you want to calculate?
Are you sure it is easier to calculate from a "matrix" instead of the data you already have?
Ultimately, I want the centrality (eigenvector, degree, betweenness, closeness) scores for each node. I have done this in another software that takes only small data of ~100 nodes and it was easier to build the adj. matrix first and then calculate the centralities. Can that be skipped in SAS?
Node ID is a random integer (may as well be text) that's important for later merging with other data. Ex.:
data have; infile cards ;
input from $ to $ ;
cards;
1 3
1 5
3 5
3 6
15 18
21 37
28 53
;
run;
/*
I think you should post an example to show your input dataset and output dataset.
Tom and I understand it in two different way.
The following code is according to Tom's logistic.
*/
proc import datafile='c:\temp\book1.xlsx' out=_2005 dbms=xlsx replace;
sheet='2005';
run;
data have;
set _2005;
v=1;
output;
temp=node_ID; node_ID=linked_node_ID;linked_node_ID=temp;output;
drop temp;
run;
proc sql;
create table want as
select a.*,coalesce(b.v,0) as v
from (
select * from
(select node_ID as node_ID from have union select linked_node_ID from have),
(select node_ID as linked_node_ID from have union select linked_node_ID from have)
) as a natural left join (select distinct * from have) as b;
quit;
proc transpose data=want out=final_want(drop=_name_);
by node_ID;
var v;
id linked_node_ID;
run;
This is a new question, should post it at IML forum.
I check your LOG.
56 57 A = final_want2005[1:2,4710:4711] ; 58 B = final_want2006[1:2,4710:4711] ; ERROR: (execution) Invalid subscript or subscript out of range. operation : [ at line 58 column 19 operands : final_want2006, _TEM1001, *LIT1007, *LIT1008 final_want2006 4694 rows 4695 cols (numeric) _TEM1001 1 row 2 cols (numeric)
final_want2006 only have 4695 columns unlike final_want2005 have 4711 columns, so you can't do this element-wise operator.
Here is my running code:
proc iml;
use final_want2005;
read all var _num_ into final_want2005;
close;
use final_want2006;
read all var _num_ into final_want2006;
close;
/*
x=ncol(final_want2005);
y=ncol(final_want2006);
print x y;
quit;
*/
A = final_want2005[1:2,4710:4711] ;
B = final_want2006[1:2,4710:4711] ;
c = A # B ;
quit;
OK.You need borrow some missing NODES from 2005.
proc import datafile='c:\temp\book1.xlsx' out=_2005 dbms=xlsx replace;
sheet='2005';
run;
data have_2005;
set _2005;
v=1;
output;
temp=node_ID; node_ID=linked_node_ID;linked_node_ID=temp;output;
drop temp;
run;
proc sql;
create table want_2005 as
select a.*,coalesce(b.v,0) as v
from (
select * from
(select node_ID as node_ID from have_2005 union select linked_node_ID from have_2005),
(select node_ID as linked_node_ID from have_2005 union select linked_node_ID from have_2005)
) as a natural left join (select distinct * from have_2005) as b;
quit;
proc transpose data=want_2005 out=final_want_2005(drop=_name_);
by node_ID;
var v;
id linked_node_ID;
run;
proc import datafile='c:\temp\book1.xlsx' out=_2006 dbms=xlsx replace;
sheet='2006';
run;
data have_2006;
set _2006;
v=1;
output;
temp=node_ID; node_ID=linked_node_ID;linked_node_ID=temp;output;
drop temp;
run;
proc sql;
create table want_2006 as
select a.*,coalesce(b.v,0) as v
from (
select * from
(select node_ID as node_ID from have_2005 union select linked_node_ID from have_2005), /*<-----*/
(select node_ID as linked_node_ID from have_2005 union select linked_node_ID from have_2005) /*<-----*/
) as a natural left join (select distinct * from have_2006) as b;
quit;
proc transpose data=want_2006 out=final_want_2006(drop=_name_);
by node_ID;
var v;
id linked_node_ID;
run;
proc iml;
use final_want_2005;
read all var _num_ into final_want2005;
close;
use final_want_2006;
read all var _num_ into final_want2006;
close;
A = final_want2005[1:2,4710:4711] ;
B = final_want2006[1:2,4710:4711] ;
c = A # B ;
quit;
thanks! I don't get it why it didn't work for me with this code and data...
proc iml ;
use sheet1 ;
read all var _num_ into layer1;
close;
use sheet2 ;
read all var _num_ into layer2;
close;
A = layer1[1:2,9:10] ;
B = layer2[1:2,9:10] ;
c = A # B ;
quit;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.