Code for Interconnecting Two Datasets (Network Construct)

Posted 02-22-2018 04:30 AM
(966 views)

I have two lists:

- List 1 with variables: CONNECTED_TO, TOTAL_CONNECTIONS
- List 2 with variables: CONNECTED_FROM, TOTAL_CONNECTIONS

I need to generate IDs with matching properties. Currently I have no IDs, just the properties; I need to create the IDs and their connections.

For example:

LIST 1:

| CONNECTED_TO | TOTAL_CONNECTIONS |
|---|---|
| 2 | 10 |
| 1 | 5 |
| 1 | 5 |

LIST 2:

| CONNECTED_FROM | TOTAL_CONNECTIONS |
|---|---|
| 3 | 15 |
| 1 | 5 |

The first obs in List 1 means we need to find 2 IDs in List 2 that are connected, via 10 connections in total, to one ID in List 1.

Similarly, the second obs in List 1 means that another ID in List 1 is connected to 1 ID in List 2 via 5 connections.

And the first obs in List 2 says that one ID in List 2 provides connections to 3 IDs in List 1.

So ID1 (List 1) can have 5 connections to each of ID2 and ID3 (both in List 2), which satisfies obs 1 in List 1. ID2 (List 2) then has 5 connections to each of ID1, ID4 and ID5 (List 1), and ID3 (List 2) has 5 connections to ID1 (List 1).

The following should be clearer.

LIST 1:

| ID | CONNECTED_TO | TOTAL_CONNECTIONS |
|---|---|---|
| ID1 | 2 (ID2{5}, ID3{5}) | 10 |
| ID4 | 1 (ID2{5}) | 5 |
| ID5 | 1 (ID2{5}) | 5 |

LIST 2:

| ID | CONNECTED_FROM | TOTAL_CONNECTIONS |
|---|---|---|
| ID2 | 3 (ID1{5}, ID4{5}, ID5{5}) | 15 |
| ID3 | 1 (ID1{5}) | 5 |

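So the output should look something like this:

| FROM | TO |
|---|---|
| ID1 | ID2 |
| ID1 | ID2 |
| ID1 | ID2 |
| ID1 | ID2 |
| ID1 | ID2 |
| ID1 | ID3 |
| ID1 | ID3 |
| ID1 | ID3 |
| ID1 | ID3 |
| ID1 | ID3 |
| ID4 | ID2 |
| ID4 | ID2 |
| ID4 | ID2 |
| ID4 | ID2 |
| ID4 | ID2 |
| ID5 | ID2 |
| ID5 | ID2 |
| ID5 | ID2 |
| ID5 | ID2 |
| ID5 | ID2 |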

Can you please help me with this?

Accepted Solution


Not sure if you're still looking for a solution for this, but does this meet your needs to "construct a synthetic data" for a bipartite network?

```
%let list1nodes=5;      /* number of nodes in list 1 */
%let list2nodes=10;     /* number of nodes in list 2 */
%let maxconnections=5;  /* number of connections between any 2 connected nodes will be between 1 and &maxconnections */
%let pairsconnected=50; /* average percentage of node pairs that are connected */

data want (keep=list1 list2);
  array bipartite {&list1nodes,&list2nodes} _temporary_;

  /* define the number of connections between nodes in list 1 and nodes in list 2 by populating the array */
  /* (cells for unconnected pairs are left missing) */
  do i=1 to dim(bipartite,1);
    do j=1 to dim(bipartite,2);
      if rand("Uniform")*100 < &pairsconnected then
        bipartite(i,j) = ceil(&maxconnections * rand("Uniform"));
    end;
  end;

  /* create the output with one row per connection */
  do i=1 to dim(bipartite,1);
    do j=1 to dim(bipartite,2);
      if bipartite(i,j) gt 0 then do;
        list1 = cats("Id", i);
        /* list 2 IDs continue numbering after the last list 1 ID */
        list2 = cats("Id", j + dim(bipartite,1));
        do k=1 to bipartite(i,j);
          output;
        end;
      end;
    end;
  end;
run;
```
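One way to check the generated data against the format in the original question might be to summarise `want` back into per-node counts. This is only a sketch: the table names `list1_summary` and `list2_summary` are made up for illustration, and the column names simply mirror CONNECTED_TO, CONNECTED_FROM and TOTAL_CONNECTIONS from the question.

```sas
proc sql;
  /* one row per list 1 node: how many distinct list 2 nodes it reaches, and via how many connections */
  create table list1_summary as
  select list1,
         count(distinct list2) as connected_to,
         count(*)              as total_connections
  from want
  group by list1;

  /* the same summary from the list 2 side */
  create table list2_summary as
  select list2,
         count(distinct list1) as connected_from,
         count(*)              as total_connections
  from want
  group by list2;
quit;
```

Nodes that ended up with no connections at all do not appear in `want`, so they will not appear in either summary.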

Replies


No one understands what you want it seems. Please rephrase.


I haven't got an answer for this, but I am trying to understand the question because it looks interesting, and hopefully this will prompt some responses from the likes of @Ksharp et al.

After a little digging on the web, the keyword for this question is probably bipartite network. It looks like the nodes are in two sets, and connections always go from one set to the other, never between nodes in the same set. A node in one set can be connected to anywhere between zero and all of the nodes in the other set.

The question shows 5 nodes, with three nodes in one set and two nodes in the other set, and the question tells us how many other nodes each node is connected to. These can be expressed as a table, with a 1 representing a connection between two nodes. The nodes are all equivalent, so the row order and column order don't matter.

| | List 2 Node | List 2 Node | Row Total |
|---|---|---|---|
| List 1 Node | 1 | 1 | 2 |
| List 1 Node | 1 | | 1 |
| List 1 Node | 1 | | 1 |
| Column Total | 3 | 1 | 4 |

The three list one nodes are connected to 2, 1 and 1 list two nodes respectively, and these correspond to the row totals.

The two list two nodes are connected to 3 and 1 list one nodes respectively, and these correspond to the column totals.

The total number of nodes list one is connected to = the total number of nodes list two is connected to = 4.

The table shown above is the only way to represent this configuration (given that row and column order don't matter). This can be extended to any number of nodes in list 1 and list 2, and any configuration of connections has a unique representation.

A fully connected 3 * 2 network, where every list one node is connected to every list two node, would have 1's in each cell, 2's for each row total, 3's for each column total and 6 in total. I think any bipartite network can be represented by a matrix in this way.

It may be that the first step to solving this would be to work out which cells have a 1, based on the row and column totals that the data in the question gives us, by solving equations like the following.

for the rows

(L1N1,L2N1) + (L1N1,L2N2) = 2

(L1N2,L2N1) + (L1N2,L2N2) = 1

(L1N3,L2N1) + (L1N3,L2N2) = 1

for the cols

(L1N1,L2N1) + (L1N2,L2N1) + (L1N3,L2N1) = 3

(L1N1,L2N2) + (L1N2,L2N2) + (L1N3,L2N2) = 1

where (LxNy,LwNz) = 0 or 1
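For a grid this small, the equations above could simply be solved by brute force. The following is a rough sketch rather than a general method: it tries all 2**6 = 64 patterns of 0's and 1's for a 3 x 2 grid and keeps those whose row and column totals match. The data set name `solutions` and the cell names `c11`-`c32` (row, then column) are invented for illustration, and the approach would not scale to large networks.

```sas
data solutions (keep=c11 c12 c21 c22 c31 c32);
  /* cells are listed row by row: c{1,1}=c11, c{1,2}=c12, c{2,1}=c21, ... */
  array c {3,2} c11 c12 c21 c22 c31 c32;
  array rowtot {3} _temporary_ (2 1 1);  /* CONNECTED_TO values from List 1 */
  array coltot {2} _temporary_ (3 1);    /* CONNECTED_FROM values from List 2 */

  do pattern = 0 to 2**6 - 1;
    /* decode the six binary digits of pattern into the six cells */
    bit = 0;
    do i = 1 to 3;
      do j = 1 to 2;
        c{i,j} = mod(floor(pattern / 2**bit), 2);
        bit + 1;
      end;
    end;

    /* keep the pattern only if every row and column total matches */
    ok = 1;
    do i = 1 to 3;
      if c{i,1} + c{i,2} ne rowtot{i} then ok = 0;
    end;
    do j = 1 to 2;
      if c{1,j} + c{2,j} + c{3,j} ne coltot{j} then ok = 0;
    end;
    if ok then output;
  end;
run;
```

With the totals from the question, only the pattern in the table above satisfies every equation: the second column total of 1 combined with the first row total of 2 forces the single 1 in that column into the first row.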

The question goes on to give the total number of connections that each node has, like a weighting. If we have already worked out which cells are populated, the table becomes:

| | List 2 Node | List 2 Node | Row Total |
|---|---|---|---|
| List 1 Node | ? | ? | 10 |
| List 1 Node | ? | | 5 |
| List 1 Node | ? | | 5 |
| Column Total | 15 | 5 | 20 |

This table, if we work out numbers to replace the ?'s and give the nodes names (ID1-ID5), would be a representation of the OP's output with 20 observations.

So then maybe the problem comes down to solving

for the rows

(L1N1,L2N1) + (L1N1,L2N2) = 10

(L1N2,L2N1) + (L1N2,L2N2) = 5

(L1N3,L2N1) + (L1N3,L2N2) = 5

for the cols

(L1N1,L2N1) + (L1N2,L2N1) + (L1N3,L2N1) = 15

(L1N1,L2N2) + (L1N2,L2N2) + (L1N3,L2N2) = 5

where (LxNy,LwNz) may be 0 if there is no connection.
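For this small example the weighted equations can be worked through by substitution. The notation $x_{ij}$, for the number of connections between List 1 node $i$ and List 2 node $j$, is introduced here purely for illustration, and the derivation assumes the connection pattern from the first table, so that $x_{22} = x_{32} = 0$:

```latex
\begin{aligned}
x_{21} + x_{22} = 5,\; x_{22} = 0 &\;\Rightarrow\; x_{21} = 5 \\
x_{31} + x_{32} = 5,\; x_{32} = 0 &\;\Rightarrow\; x_{31} = 5 \\
x_{12} + x_{22} + x_{32} = 5 &\;\Rightarrow\; x_{12} = 5 \\
x_{11} + x_{12} = 10 &\;\Rightarrow\; x_{11} = 5 \\
\text{check: } x_{11} + x_{21} + x_{31} &= 5 + 5 + 5 = 15
\end{aligned}
```

Every ? in the table therefore comes out as 5, which matches the 20 rows in the OP's expected output.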


Maybe this is a better way to explain my problem. I want to construct synthetic data for graph analysis with transactions, senders and recipients. I can simulate the transaction values, the number of transactions and the number of connections from a small dataset, but I still need to produce the IDs and the transactions. I also need to establish connections between those IDs, so I generated these two lists. Then, for rows with the same number of transactions per sender/recipient, I matched the rows in the two lists by randomly picking from each other. I repeated this for 1 to n transactions per sender/recipient. I had to add a stop counter, because otherwise the loop would have run forever or until all the available choices were exhausted. Then I replicated each sender-recipient pair transactions-per-sender/recipient times and filled the rows with transactions. I need an easier way to do all this.


If you are just trying to construct a simulation of a bipartite network then do you have the option of starting with the completed 2-dimensional array to describe the network configuration?

e.g.

| | List 2 Node | List 2 Node | Row Total |
|---|---|---|---|
| List 1 Node | 5 | 5 | 10 |
| List 1 Node | 5 | | 5 |
| List 1 Node | 5 | | 5 |
| Column Total | 15 | 5 | 20 |

If your answer to this is yes, then this could be extended to any size. It will be fairly simple in SAS to define and populate the 2-d array with dimensions N(set 1) * N(set 2) (without the row and column totals), then loop through it to output rows, using the number in each cell to output that number of rows. This way you end up with a table with one row for each connection, as you described. I don't think the actual node names are important, just that they are distinguishable. An easy way would be to have 2 numeric columns (set1 and set2) which increment as you progress through the loop.

However, if your starting point absolutely has to be in the form given in your original post (for each set and for each node, just the number connected to/from and the number of connections), then this is more challenging/interesting, as you have to find a way to get to the unique 2-d array that describes the network. Once you have that 2-d array, loop through it to output the table with one row per connection, as described in the previous paragraph.
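The loop described above might look something like the following when the 2-d array is known up front. This is a minimal sketch that hard-codes the 3 x 2 example table; the data set name `connections` and the array name `net` are placeholders, and `set1` and `set2` are the two numeric columns suggested above.

```sas
data connections (keep=set1 set2);
  /* rows = set 1 nodes, columns = set 2 nodes; initial values fill the array row by row */
  array net {3,2} _temporary_ (5 5
                               5 0
                               5 0);

  do set1 = 1 to dim(net,1);
    do set2 = 1 to dim(net,2);
      /* write one row per connection; a cell holding 0 writes nothing */
      do k = 1 to net{set1,set2};
        output;
      end;
    end;
  end;
run;
```

For this table the step should write 20 rows, one per connection. Note that `set1` and `set2` are numbered independently here, so node 1 of set 1 and node 1 of set 2 are different nodes.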
