Desktop productivity for business analysts and programmers

Social network data

Reply
New Contributor
Posts: 4

Social network data

Hi all.

 

I am struggling with structuring my data set in the correct form.

 

I have a dataset of persons and where they meet (see example "Dataset; People"). Instead of that structure I want a structure where colunm 1 is person1, column 2 is the meeting point and column 3 is the persons person 1 meets. So some people meet more people and different people.

 

Dataset; People
Meeting pointPerson
Adress 1A
Adress 1B
Adress 2A
Adress 2B
Adress 2C
Adress 3B
Adress 3C

 

So far I use the code following code:

 

data people1;

set people1;

run;

 

data people2;

set people2;

run;

 

Where the datasets people1 and people2 are exactly the same except the name of column 2:

 

Dataset; People1Dataset; People2
Meeting pointPerson1Meeting pointPerson2
Adress 1AAdress 1A
Adress 1BAdress 1B
Adress 2AAdress 2A
Adress 2BAdress 2B
Adress 2CAdress 2C
Adress 3BAdress 3B
Adress 3CAdress 3C

 

To create one dataset I use the code:

 

proc sql;

create table dataset as

select a.meeting_point, a. person1, b.person2

from people1 as a

left join people2 as b

on a.meeting_point=b.meeting_point and a.person1 ne b.person2

where b.person2 ne "";

quit;

 

The problem with using that code is that the persons in column1 is distinct but not the pairs of people. I get the following:

 

 
Meeting pointPerson1Person2
Adress 1AB
Adress 1AC
Adress 1AD
Adress 1AE
Adress 1BA
Adress 1BC
Adress 1BD
Adress 1BE
Adress 1CA
Adress 1CB
Adress 1CD
Adress 1CE
Adress 1DA
Adress 1DB
Adress 1DC
Adress 1DE
Adress 1EA
Adress 1EB
Adress 1EC
Adress 1ED

 

The ones marked with red and dublet pairs and I only need distinct pairs.

 

Hope someone has a solution.

 

Thank you!

Super User
Posts: 10,534

Re: Social network data

Your examples make no sense. There are no persons D and E in your initial dataset, and person C was never at address 1.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
New Contributor
Posts: 4

Re: Social network data

Posted in reply to KurtBremser
You are absolutely right. The dataset is just at small selection/example of the real data set which is extremely large. The important part is the structures and the hurdle is to delete dublicate pairs (i.e A-B is the same as B-A).
Super User
Posts: 10,534

Re: Social network data

Then I suggest a small change to your SQL:

data people;
input meeting_point $ person $;
cards;
ad1 a
ad1 b
ad2 a
ad2 b
ad2 c
ad3 b
ad3 c
;
run;

proc sql;
create table dataset as
select a.meeting_point, a.person as person1, b.person as person2
from people as a
left join people as b
on a.meeting_point = b.meeting_point and a.person < b.person
where b.person ne '';
quit;
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Ask a Question
Discussion stats
  • 3 replies
  • 93 views
  • 0 likes
  • 2 in conversation