I have two lists:

  1. List 1 with variables: CONNECTED_TO TOTAL_CONNECTIONS
  2. List 2 with variables: CONNECTED_FROM TOTAL_CONNECTIONS

I need to generate IDs with matching properties. (Currently I have no IDs, just the properties; I need to create the IDs and their connections.)

For example:

LIST 1:

CONNECTED_TO    TOTAL_CONNECTIONS
2               10
1               5
1               5

LIST 2:

CONNECTED_FROM  TOTAL_CONNECTIONS
3               15
1               5

The first obs in List 1 means that, for one ID in List 1, we need to find 2 IDs in List 2 which are connected to it via 10 connections in total.

Similarly, the second obs in List 1 means that another ID in List 1 is connected to 1 ID in List 2 via 5 connections.

And the first obs in List 2 says that this ID provides connections to 3 IDs in List 1, via 15 connections in total.

So for obs 1 in List 1 we can have ID1 (from List 1) with 5 connections to ID2 and 5 connections to ID3 (both from List 2). ID2 in List 2 then has 5 connections each with ID1, ID4 and ID5, and ID3 in List 2 has 5 connections with ID1.

The following should be clearer.

LIST 1:

CONNECTED_TO                       TOTAL_CONNECTIONS
ID1 - 2 (ID2{5}, ID3{5})           10
ID4 - 1 (ID2{5})                   5
ID5 - 1 (ID2{5})                   5

LIST 2:

CONNECTED_FROM                     TOTAL_CONNECTIONS
ID2 - 3 (ID1{5}, ID4{5}, ID5{5})   15
ID3 - 1 (ID1{5})                   5

 

 

So the output should look something like this:

FROM   TO
ID1    ID2
ID1    ID2
ID1    ID2
ID1    ID2
ID1    ID2
ID1    ID3
ID1    ID3
ID1    ID3
ID1    ID3
ID1    ID3
ID4    ID2
ID4    ID2
ID4    ID2
ID4    ID2
ID4    ID2
ID5    ID2
ID5    ID2
ID5    ID2
ID5    ID2
ID5    ID2

 

 

Can you please help me with this?

1 ACCEPTED SOLUTION
JohnHoughton
Quartz | Level 8

Not sure if you're still looking for a solution for this, but does this meet your needs to "construct a synthetic data" for a bipartite network?

 

%let list1nodes=5;      /* number of nodes in list 1 */
%let list2nodes=10;     /* number of nodes in list 2 */
%let maxconnections=5;  /* connections between any 2 connected nodes: 1 to &maxconnections */
%let pairsconnected=50; /* average percentage of node pairs that are connected */

data want (keep=list1 list2);
	/* explicit length added so the IDs are not stored with the CATS default length */
	length list1 list2 $12;
	/* bipartite(i,j) = number of connections between list 1 node i and list 2 node j */
	array bipartite {&list1nodes,&list2nodes} _temporary_;

	/* Define the number of connections between each list 1 node and each list 2 node.
	   Cells for unconnected pairs stay missing. */
	do i=1 to dim(bipartite,1);
		do j=1 to dim(bipartite,2);
			if rand("Uniform")*100 < &pairsconnected then
				bipartite(i,j)=ceil(&maxconnections*rand("Uniform"));
		end;
	end;

	/* Create the output with one row per connection. */
	do i=1 to dim(bipartite,1);
		do j=1 to dim(bipartite,2);
			if bipartite(i,j) gt 0 then do;
				list1=cats("Id",i);
				/* list 2 IDs continue numbering after the list 1 IDs */
				list2=cats("Id",j+dim(bipartite,1));
				do k=1 to bipartite(i,j);
					output;
				end;
			end;
		end;
	end;
run;
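
One way to check the simulated data against the format in the original question is to summarise WANT back into the two lists. This is only a sketch: it assumes the WANT data set created by the step above, and the output table names list1_summary and list2_summary are made up for illustration.

```sas
/* Summarise WANT into the two lists from the question:
   number of distinct partner nodes and total number of connections per node. */
proc sql;
   create table list1_summary as
   select list1,
          count(distinct list2) as connected_to,
          count(*)              as total_connections
   from want
   group by list1;

   create table list2_summary as
   select list2,
          count(distinct list1) as connected_from,
          count(*)              as total_connections
   from want
   group by list2;
quit;
```

Nodes that ended up with no connections at all do not appear in WANT, so they will not appear in either summary.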


6 REPLIES
ChrisNZ
Tourmaline | Level 20

No one understands what you want it seems. Please rephrase.

JohnHoughton
Quartz | Level 8

I haven't got an answer for this, but I'm trying to understand the question because it looks interesting, and hopefully this will prompt some responses from the likes of @Ksharp et al.

After a little digging on the web, the keyword for this question is probably "bipartite network". It looks like the nodes are in two sets, and connections always go from one set to the other, never between nodes in the same set. A node in one set can be connected to anywhere between zero and all of the nodes in the other set.

The question shows 5 nodes, with three nodes in one set and two nodes in the other, and it tells us how many other nodes each node is connected to. These can be expressed as a table, with a 1 representing a connection between two nodes. The nodes are all equivalent, so the row order and column order don't matter.

               List 2 Node   List 2 Node   Row Total
List 1 Node         1             1            2
List 1 Node         1                          1
List 1 Node         1                          1
Column Total        3             1            4

 

The three list one nodes are connected to 2, 1 and 1 list two nodes respectively, and these correspond to the row totals.

The two list two nodes are connected to 3 and 1 list one nodes respectively, and these correspond to the column totals.

The total number of nodes list one is connected to = the total number of nodes list two is connected to = 4.

The table shown above is the only way to represent this configuration (given that row and column order don't matter). This can be extended to any number of nodes in list 1 and list 2, and any configuration of connections has a unique representation.

A fully connected 3 * 2 network, where every list one node is connected to every list two node, would have 1's in each cell, 2's for each row total, 3's for each column total and 6 in total. I think any bipartite network can be represented by a matrix in this way.

 

It may be that the first step to solving this would be to work out which cells have a 1, based on the row and column totals that the data in the question gives us, by solving equations like the following.

For the rows:

(L1N1,L2N1) + (L1N1,L2N2) = 2
(L1N2,L2N1) + (L1N2,L2N2) = 1
(L1N3,L2N1) + (L1N3,L2N2) = 1

For the columns:

(L1N1,L2N1) + (L1N2,L2N1) + (L1N3,L2N1) = 3
(L1N1,L2N2) + (L1N2,L2N2) + (L1N3,L2N2) = 1

where (LxNy,LwNz) = 0 or 1
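
For a table this small, the 0/1 equations above can simply be checked by brute force. The following is a rough sketch rather than a general method: it hard-codes the 3 x 2 example, and the data set name solutions and the cell variables x11 to x32 are names invented here.

```sas
/* Brute-force search over all 0/1 fillings of the 3 x 2 table. */
data solutions (keep=x11 x12 x21 x22 x31 x32);
   array x {3,2} x11 x12 x21 x22 x31 x32;
   array rowtot {3} _temporary_ (2 1 1);   /* CONNECTED_TO values from List 1   */
   array coltot {2} _temporary_ (3 1);     /* CONNECTED_FROM values from List 2 */

   do mask = 0 to 2**6 - 1;                /* each mask encodes one candidate table */
      /* decode the six bits of mask into the six cells */
      bit = 0;
      do i = 1 to 3;
         do j = 1 to 2;
            x{i,j} = mod(floor(mask / 2**bit), 2);
            bit = bit + 1;
         end;
      end;

      /* keep the candidate only if every row and column total matches */
      ok = 1;
      do i = 1 to 3;
         if x{i,1} + x{i,2} ne rowtot{i} then ok = 0;
      end;
      do j = 1 to 2;
         if x{1,j} + x{2,j} + x{3,j} ne coltot{j} then ok = 0;
      end;
      if ok then output;
   end;
run;
```

The number of candidates doubles with every extra cell, so this only illustrates the idea; larger networks would need a smarter search.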

 

 

The question goes on to give the total number of connections that each node has, like a weighting. If we have already worked out which cells are populated, the table becomes:

               List 2 Node   List 2 Node   Row Total
List 1 Node         ?             ?           10
List 1 Node         ?                          5
List 1 Node         ?                          5
Column Total       15             5           20

If we work out numbers to replace the ?'s and give the nodes names (ID1-ID5), this table would be a representation of the OP's output with 20 observations.

So then maybe the problem comes down to solving the following.

For the rows:

(L1N1,L2N1) + (L1N1,L2N2) = 10
(L1N2,L2N1) + (L1N2,L2N2) = 5
(L1N3,L2N1) + (L1N3,L2N2) = 5

For the columns:

(L1N1,L2N1) + (L1N2,L2N1) + (L1N3,L2N1) = 15
(L1N1,L2N2) + (L1N2,L2N2) + (L1N3,L2N2) = 5

where (LxNy,LwNz) may be 0 if there is no connection.
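
For this particular example the weighted equations can be solved by hand. As a worked sketch, write w_{ij} for (L1Ni,L2Nj), which is shorthand introduced here and not part of the question, and take the two empty cells from the 0/1 table as zero:

```latex
\begin{aligned}
w_{22} = w_{32} &= 0 && \text{(no connection in these cells)} \\
w_{21} + w_{22} = 5 \;&\Rightarrow\; w_{21} = 5 \\
w_{31} + w_{32} = 5 \;&\Rightarrow\; w_{31} = 5 \\
w_{12} + w_{22} + w_{32} = 5 \;&\Rightarrow\; w_{12} = 5 \\
w_{11} + w_{12} = 10 \;&\Rightarrow\; w_{11} = 5 \\
w_{11} + w_{21} + w_{31} &= 5 + 5 + 5 = 15 && \text{(column 1 total, consistent)}
\end{aligned}
```

So every populated cell is 5, which agrees with the 20 rows in the OP's expected output.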

 

 

 

 

 

 

thepushkarsingh
Quartz | Level 8
Maybe this is a better way to explain my problem. I want to construct synthetic data for graph analysis, with transactions, senders and recipients. I can simulate the transaction values, the number of transactions and the number of connections from a small dataset, but I still need to produce the IDs and the transactions. I also need to establish connections between those IDs, so I generated these two lists. Then, for rows with the same number of transactions per sender/recipient, I matched the rows in the two lists by randomly picking from each other. I repeated this for 1 to n transactions per sender/beneficiary. I had to add a stop counter, because otherwise the loop would have run forever or until all the available choices were exhausted. Then I replicated each sender-recipient pair as many times as its transactions per sender/recipient and filled in the transactions. I need an easier way to do all this.
JohnHoughton
Quartz | Level 8

If you are just trying to construct a simulation of a bipartite network, do you have the option of starting with the completed 2-dimensional array that describes the network configuration?

For example:

               List 2 Node   List 2 Node   Row Total
List 1 Node         5             5           10
List 1 Node         5                          5
List 1 Node         5                          5
Column Total       15             5           20

 

If your answer to this is yes, then this could be extended to any size. It is fairly simple in SAS to define and populate the 2-d array with dimensions N(set 1) * N(set 2) (without the row and column totals), then loop through it to output rows, using the number in each cell as the number of rows to output. This way you end up with a table with one row for each connection, as you described. I don't think the actual node names are important, just that they are distinguishable. An easy way would be to have just 2 numeric columns (set1 and set2) which increment as you progress through the loop.
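
As a minimal sketch of that idea, the step below hard-codes the example table and loops through it. The array names, the output variable names from_id and to_id, and the use of 0 for "no connection" are choices made here for illustration; the node names are picked to match the example in the question.

```sas
data want (keep=from_id to_id);
   length from_id to_id $8;
   /* connections(i,j) = number of connections between list 1 node i and
      list 2 node j; initial values fill the array row by row, 0 = no connection */
   array connections {3,2} _temporary_ (5 5
                                        5 0
                                        5 0);
   /* node names chosen to match the example in the question */
   array list1_names {3} $8 _temporary_ ("ID1" "ID4" "ID5");
   array list2_names {2} $8 _temporary_ ("ID2" "ID3");

   do i = 1 to dim(connections, 1);
      do j = 1 to dim(connections, 2);
         if connections{i,j} > 0 then do;
            from_id = list1_names{i};
            to_id   = list2_names{j};
            do k = 1 to connections{i,j};
               output;   /* one row per connection */
            end;
         end;
      end;
   end;
run;
```

With these values the step writes 20 rows, in the same order as the listing in the original question.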

 

However, if your starting point absolutely has to be in the form given in your original post (for each set and for each node, just the number of nodes connected to/from and the number of connections), then this is more challenging and more interesting, as you have to find a way to get to the 2-d array that describes the network. Once you have that 2-d array, loop through it to output the table with one row per connection, as described in the previous paragraph.

thepushkarsingh
Quartz | Level 8

Thanks very much for the solution. I responded earlier, but it seems my reply didn't go through. Apologies for that. 🙂
