a week ago
I am struggling with structuring my data set in the correct form.
I have a dataset of persons and where they meet (see example "Dataset; People"). Instead of that structure I want a structure where colunm 1 is person1, column 2 is the meeting point and column 3 is the persons person 1 meets. So some people meet more people and different people.
So far I use the code following code:
Where the datasets people1 and people2 are exactly the same except the name of column 2:
|Dataset; People1||Dataset; People2|
|Meeting point||Person1||Meeting point||Person2|
|Adress 1||A||Adress 1||A|
|Adress 1||B||Adress 1||B|
|Adress 2||A||Adress 2||A|
|Adress 2||B||Adress 2||B|
|Adress 2||C||Adress 2||C|
|Adress 3||B||Adress 3||B|
|Adress 3||C||Adress 3||C|
To create one dataset I use the code:
create table dataset as
select a.meeting_point, a. person1, b.person2
from people1 as a
left join people2 as b
on a.meeting_point=b.meeting_point and a.person1 ne b.person2
where b.person2 ne "";
The problem with using that code is that the persons in column1 is distinct but not the pairs of people. I get the following:
The ones marked with red and dublet pairs and I only need distinct pairs.
Hope someone has a solution.
Your examples make no sense. There are no persons D and E in your initial dataset, and person C was never at address 1.
Then I suggest a small change to your SQL:
data people; input meeting_point $ person $; cards; ad1 a ad1 b ad2 a ad2 b ad2 c ad3 b ad3 c ; run; proc sql; create table dataset as select a.meeting_point, a.person as person1, b.person as person2 from people as a left join people as b on a.meeting_point = b.meeting_point and a.person < b.person where b.person ne ''; quit;