I have two lists and need to generate IDs with matching properties. Currently I have no IDs, only the properties, so I need to create both the IDs and the connections between them.
For example:
LIST 1 :
CONNECTED_TO TOTAL_CONNECTIONS
2 10
1 5
1 5
LIST 2 :
CONNECTED_FROM TOTAL_CONNECTIONS
3 15
1 5
The first obs in List 1 means that one ID in List 1 is connected to 2 IDs in List 2, via 10 connections in total.
Similarly, the second obs in List 1 means that another ID in List 1 is connected to 1 ID in List 2, via 5 connections.
And the first obs in List 2 says that one ID in List 2 is connected from 3 IDs in List 1, via 15 connections in total.
So we can have 1 ID in List 1 (ID1) and 2 IDs in List 2 (ID2 and ID3), with 5 connections between ID1 and ID2 and 5 between ID1 and ID3, which covers obs 1 in List 1. ID2 in List 2 then gets 5 obs for each of its connections to ID1, ID4 and ID5 (15 in total), and ID3 in List 2 gets 5 obs connecting it with ID1 in List 1.
The following should make it clearer.
LIST 1 :
CONNECTED_TO TOTAL_CONNECTIONS
ID1 - 2 (ID2{5}, ID3{5}) 10
ID4 - 1 (ID2{5}) 5
ID5 - 1 (ID2{5}) 5
LIST 2 :
CONNECTED_FROM TOTAL_CONNECTIONS
ID2 - 3(ID1{5}, ID4{5}, ID5{5}) 15
ID3 - 1(ID1{5}) 5
So the output should look something like this:
FROM TO
ID1 ID2
ID1 ID2
ID1 ID2
ID1 ID2
ID1 ID2
ID1 ID3
ID1 ID3
ID1 ID3
ID1 ID3
ID1 ID3
ID4 ID2
ID4 ID2
ID4 ID2
ID4 ID2
ID4 ID2
ID5 ID2
ID5 ID2
ID5 ID2
ID5 ID2
ID5 ID2
Can you please help me with this?
Not sure if you're still looking for a solution for this, but does this meet your needs to "construct a synthetic data" for a bipartite network?
%let list1nodes=5;
%let list2nodes=10;
%let maxconnections=5;  *number of connections between any 2 nodes will be between 1 and &maxconnections;
%let pairsconnected=50; *average percentage of node pairs that are connected;

data want (keep=list1 list2);
  array bipartite {&list1nodes,&list2nodes} _temporary_;

  *define the number of connections between each node in list 1 and each node in list 2 by populating the array;
  *cells that are not selected stay missing, which means "no connection";
  do i=1 to dim(bipartite,1);
    do j=1 to dim(bipartite,2);
      if rand("Uniform")*100 < &pairsconnected then
        bipartite(i,j) = ceil(&maxconnections * rand("Uniform"));
    end;
  end;

  *create the output with one row per connection;
  do i=1 to dim(bipartite,1);
    do j=1 to dim(bipartite,2);
      if bipartite(i,j) gt 0 then do;
        list1=cats("Id",i);
        list2=cats("Id",j+dim(bipartite,1)); *list 2 IDs continue numbering after list 1;
        do k=1 to bipartite(i,j);
          output;
        end;
      end;
    end;
  end;
run;
No one understands what you want it seems. Please rephrase.
I haven't got an answer for this, but trying to understand the question because it looks interesting, and hopefully will prompt some responses from the likes of @Ksharp et al.
After a little digging on the web, the keyword for this question is probably "bipartite network". It looks like the nodes are in two sets, and connections always go from one set to the other, never between nodes in the same set. A node in one set can be connected to anywhere between zero and all of the nodes in the other set.
The question shows 5 nodes, with three nodes in one set and two nodes in the other set, and it tells us how many other nodes each node is connected to. These can be expressed as a table, with a 1 representing a connection between two nodes. The nodes are all equivalent, so the row order and column order don't matter.
             | List 2 Node | List 2 Node | Row Total |
List 1 Node  | 1           | 1           | 2         |
List 1 Node  | 1           |             | 1         |
List 1 Node  | 1           |             | 1         |
Column Total | 3           | 1           | 4         |
The three list one nodes are connected to 2, 1 and 1 list two nodes respectively, and these correspond to the row totals.
The two list two nodes are connected to 3 and 1 list one nodes respectively, and these correspond to the column totals.
The total number of nodes list one is connected to = the total number of nodes list two is connected to = 4.
The table shown above is the only way to represent this configuration (given that row and column order don't matter). This can be extended to any number of nodes in list 1 and list 2, and any configuration of connections has a unique representation.
A fully connected 3 * 2 network, where every list one node is connected to every list two node, would have 1's in each cell, 2's for each row total, 3's for each column total and 6 in total. I think any bipartite network can be represented by a matrix in this way.
It may be that the first step to solving this would be to work out which cells have a 1, based on the row and column totals that the data in the question gives us, by solving equations like the ones below.
e.g.
for the rows
(L1N1,L2N1) + (L1N1,L2N2) = 2
(L1N2,L2N1) + (L1N2,L2N2) = 1
(L1N3,L2N1) + (L1N3,L2N2) = 1
for the cols
(L1N1,L2N1) + (L1N2,L2N1) + (L1N3,L2N1) = 3
(L1N1,L2N2) + (L1N2,L2N2) + (L1N3,L2N2) = 1
where (LxNy,LwNz) = 0 or 1
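For a network this small, the equations above can simply be brute-forced. The data step below is only a sketch under my own assumptions: the variable names c11 to c32 (cell for list 1 node i, list 2 node j) and the data set name pattern are made up for illustration, and the row and column totals are hard-coded from the example. It tries all 64 combinations of 0/1 for the six cells and keeps those that satisfy the five equations.

```sas
/* Sketch: brute-force the 0/1 cells of the 3 x 2 example.           */
/* c11..c32 and the data set name "pattern" are illustrative names.  */
data pattern (keep=c11 c12 c21 c22 c31 c32);
  /* two-dimensional arrays fill row by row: c{1,1}=c11, c{1,2}=c12, ... */
  array c {3,2} c11 c12 c21 c22 c31 c32;
  do mask = 0 to 63;
    /* decode the six bits of mask into the six cells */
    bit = 0;
    do i = 1 to 3;
      do j = 1 to 2;
        c{i,j} = mod(int(mask / 2**bit), 2);
        bit = bit + 1;
      end;
    end;
    /* row totals 2, 1, 1 and column totals 3, 1 from the question */
    if  c11 + c12 = 2 and c21 + c22 = 1 and c31 + c32 = 1
    and c11 + c21 + c31 = 3 and c12 + c22 + c32 = 1 then output;
  end;
run;
```

For these totals only one combination fits (the first column must be all 1's to reach 3, which forces the rest), so the result should be a single row matching the table above. This approach doubles in cost with every extra cell, so it is only meant to illustrate the idea on toy examples.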
The question goes on to give the total number of connections that each node has, like a weighting. If we have already worked out which cells are populated, we get:
             | List 2 Node | List 2 Node | Row Total |
List 1 Node  | ?           | ?           | 10        |
List 1 Node  | ?           |             | 5         |
List 1 Node  | ?           |             | 5         |
Column Total | 15          | 5           | 20        |
This table, if we work out numbers to replace the ?'s and give the nodes names (ID1-ID5), would be a representation of the OP's output with 20 observations.
So then maybe the problem comes down to solving
for the rows
(L1N1,L2N1) + (L1N1,L2N2) = 10
(L1N2,L2N1) + (L1N2,L2N2) = 5
(L1N3,L2N1) + (L1N3,L2N2) = 5
for the cols
(L1N1,L2N1) + (L1N2,L2N1) + (L1N3,L2N1) = 15
(L1N1,L2N2) + (L1N2,L2N2) + (L1N3,L2N2) = 5
where (LxNy,LwNz) may be 0 if there is no connection.
If you are just trying to construct a simulation of a bipartite network then do you have the option of starting with the completed 2-dimensional array to describe the network configuration?
eg
             | List 2 Node | List 2 Node | Row Total |
List 1 Node  | 5           | 5           | 10        |
List 1 Node  | 5           |             | 5         |
List 1 Node  | 5           |             | 5         |
Column Total | 15          | 5           | 20        |
If your answer to this is yes, then this could be extended to any size. It would be fairly simple in SAS to define and populate the 2-d array with dimensions N(set 1) * N(set 2) (without the row and column totals), then loop through it, using the number in each cell to output that number of rows. This way you end up with a table with one row for each connection, as you described. I don't think the actual node names are important, just that they are distinguishable. An easy way would be to have 2 numeric cols (set1 & set2) which increment as you progress through the loop.
However, if your starting point absolutely has to be in the form given in your original post (for each set and for each node, just the number of nodes connected to/from and the number of connections), then this is more challenging/interesting, as you have to find a way to get to the unique 2-d array that describes the network. Once you have that 2-d array, loop through it to output the table with one row per connection, as described in the previous paragraph.
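To make the "loop through the 2-d array" step concrete, here is a minimal sketch that starts from the completed array of the worked example. The array and variable names (conn, l1name, l2name, from_id, to_id) are my own choices for illustration, and the node names are typed in by hand to match the original post rather than generated. The output columns are called from_id and to_id only to stay clear of the TO keyword in the DO statement; they correspond to FROM and TO in the question.

```sas
/* Sketch: expand a known connection matrix into one row per connection. */
data want (keep=from_id to_id);
  /* cell counts, row by row: rows = ID1, ID4, ID5; columns = ID2, ID3 */
  array conn {3,2} _temporary_ (5 5  5 0  5 0);
  /* hand-typed node names taken from the original post */
  array l1name {3} $ 8 _temporary_ ("ID1" "ID4" "ID5");
  array l2name {2} $ 8 _temporary_ ("ID2" "ID3");
  length from_id to_id $ 8;
  do i = 1 to dim(conn,1);
    do j = 1 to dim(conn,2);
      /* a cell of 0 means no connection: the loop body never runs */
      do k = 1 to conn{i,j};
        from_id = l1name{i};
        to_id   = l2name{j};
        output;
      end;
    end;
  end;
run;
```

With the counts above this should give 20 rows: five each for ID1-ID2, ID1-ID3, ID4-ID2 and ID5-ID2, in that order. For a bigger network the initial values could be replaced by whatever method fills the array.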
Thanks very much for the solution. I responded earlier, but it seems my reply didn't go through. Apologies for that. 🙂