Hi all, I'm a beginner using SAS to perform network analysis. The data set looks like this:
node 1 | node 2 |
---|---|
3 | 1 |
4 | 2 |
2 | 3 |
Each column holds node IDs; each row represents an edge from node 1 to node 2. There are a large number of nodes, say 200,000. I want to convert this data set to a 200,000 x 200,000 adjacency matrix, i.e. each row and each column represents a node, and the value 1 is set in row i, column j if there is an edge from node i to node j. For the table above, the converted matrix looks like this:
 | 1 | 2 | 3 | 4 |
---|---|---|---|---|
1 | 0 | 0 | 0 | 0 |
2 | 0 | 0 | 1 | 0 |
3 | 1 | 0 | 0 | 0 |
4 | 0 | 1 | 0 | 0 |
My problem is that the data set is so large that creating a 200,000 x 200,000 matrix in SAS/IML runs out of memory. Are there alternatives for creating such a big matrix? If it is doable, how could I do it?
Many thanks!
SAS has a product (SAS Social Network Analysis) for SNA, but it doesn't use SAS/IML.
What do you intend to do with the adjacency matrix once you have it? For a matrix that large, the obvious approach is to use a sparse representation, but only a small number of matrix operations are supported for sparse matrices.
For smaller networks, you can use the SUB2NDX function (added in SAS/IML 12.1) to quickly build a (dense) adjacency matrix:
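proc iml;
/* edge list: each row is (from, to) */
nodes = {
3 1,
4 2,
2 3 };
maxNode = max(nodes);                  /* largest node ID = matrix dimension */
adj = j(maxNode, maxNode, 0);          /* dense matrix of zeros */
idx = sub2ndx(dimension(adj), nodes);  /* (row, col) subscripts -> linear indices */
adj[idx] = 1;                          /* mark each edge */
print adj;
quit;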
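To make the memory problem and the sparse alternative concrete, here is a minimal sketch. It assumes 8-byte double-precision storage for a dense SAS/IML matrix and uses the three-column (value, row, column) sparse layout that the SAS/IML SPARSE and FULL functions work with. The names n, denseGB, and s are made up for illustration, and the optional dimension arguments to FULL follow my reading of the documentation, so check them against your release.

```sas
proc iml;
/* Memory for a dense n x n matrix, assuming 8 bytes per element */
n = 200000;
denseGB = n##2 * 8 / 1e9;          /* 320 GB for n = 200,000 */
print denseGB;

/* Edge list from the question: each row is (from, to) */
nodes = {3 1,
         4 2,
         2 3};

/* Sparse triplet form: value || row index || column index.
   Storage grows with the number of edges, not with n squared. */
s = j(nrow(nodes), 1, 1) || nodes;
print s[colname={"Value" "Row" "Col"}];

/* For a SMALL network only: expand the triplets to a dense matrix */
maxNode = max(nodes);
adj = full(s, maxNode, maxNode);
print adj;
quit;
```

The triplet matrix s has one row per edge, so it fits in memory even when the dense matrix does not. The catch, as noted above, is that few SAS/IML operations accept this form directly.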
Thanks for your reply.
I am planning to use the matrix to calculate descriptors such as betweenness, eigenvector centrality, and average path length. Because the network is very large and operations on an edge list usually take a very long time, I was trying to convert it to an adjacency matrix.
About SAS Social Network Analysis, is there a trial version available on the SAS website? I looked at the link you mentioned, but it seems to contain only an introduction to the product.
Right. Thought so. I don't think IML will be able to handle your full 200,000 x 200,000 adjacency matrix. For smaller networks, people have done what you are describing. There were some interesting papers on this at SAS Global Forum and other conferences in 2012-2013. Do an internet search for
iml airport connectivity centrality
and you'll find some papers by Hector Rodriguez-Deniz that you might find interesting.
Ok, I see.
Thank you for the suggestions. Although the examples you mentioned use smaller networks, they are very useful.
Hi. What you want to use is PROC OPTGRAPH. This procedure can calculate all of these descriptors, uses sparse representations, and scales very well. To use PROC OPTGRAPH, you need a SAS Social Network Analysis (SNA) server license.
There is no trial version of SAS Social Network Analysis server that I am aware of.
To learn more about what PROC OPTGRAPH offers, you can consult the documentation here:
http://support.sas.com/documentation/solutions/optgraph/index.html
Hello, thanks for your reply.
Indeed, the functions in PROC OPTGRAPH are what I'm looking for. But I guess SNA is only for business use? I'm a student, so I wonder whether it is possible for a student to obtain a server license.
Unfortunately, I do not think there are any student licenses for PROC OPTGRAPH.
Oh, OK. Thanks anyway.