Calcite | Level 5

## Get a list of all relationships

Good morning everyone,

I describe my problem and the solution I need. I have a table with pairs that identify relationships between two people (for example, friends in a social network). My end result should be a table in which each record defines a group in which all people are related, directly or through other people.

For example, starting from the following image

My input table looks like this:

``````data have;
input member1 member2;
datalines;
1 2
1 4
2 3
2 4
3 8
4 5
4 6
5 7
5 8
6 7
7 8
9 10
9 11
9 12
10 12
11 12
;``````

And my output table should be the following:

Initially I don't know the maximum number of people in a relationship (but I don't think it will exceed 20 in this problem)

Has anyone faced a similar problem? Can you think of any way to solve it?

Thanks for your time, Merry Christmas and Happy 2021!!

1 ACCEPTED SOLUTION

Accepted Solutions
Super User

## Re: Get a list of all relationships

If you don't have SAS/OR .

``````

data have;
infile cards ;
input from \$  to \$ ;
cards;
1 2
1 4
2 3
2 4
3 8
4 5
4 6
5 7
5 8
6 7
7 8
9 10
9 11
9 12
10 12
11 12
;
run;
data full;
set have end=last;
if _n_ eq 1 then do;
declare hash h();
h.definekey('node');
h.definedata('node');
h.definedone();
end;
output;
node=from; h.replace();
from=to; to=node;
output;
node=from; h.replace();
if last then h.output(dataset:'node');
drop node;
run;

data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();

if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
from_to.definekey('from');
from_to.definedata('to');
from_to.definedone();

if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
no.definekey('node');
no.definedata('node');
no.definedone();

do while(hi_no.next()=0);
household+1; output;
count=1;
rc=hi.first();
do while(rc=0);
from=last;rx=from_to.find();
do while(rx=0);
key=to;ry=_ha.check();
if ry ne 0 then do;
node=to;output;rr=no.remove(key:node);
count+1;
end;
rx=from_to.find_next();
end;
rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;``````
14 REPLIES 14
Super User

## Re: Get a list of all relationships

If you don't have SAS/OR .

``````

data have;
infile cards ;
input from \$  to \$ ;
cards;
1 2
1 4
2 3
2 4
3 8
4 5
4 6
5 7
5 8
6 7
7 8
9 10
9 11
9 12
10 12
11 12
;
run;
data full;
set have end=last;
if _n_ eq 1 then do;
declare hash h();
h.definekey('node');
h.definedata('node');
h.definedone();
end;
output;
node=from; h.replace();
from=to; to=node;
output;
node=from; h.replace();
if last then h.output(dataset:'node');
drop node;
run;

data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();

if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
from_to.definekey('from');
from_to.definedata('to');
from_to.definedone();

if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
no.definekey('node');
no.definedata('node');
no.definedone();

do while(hi_no.next()=0);
household+1; output;
count=1;
rc=hi.first();
do while(rc=0);
from=last;rx=from_to.find();
do while(rx=0);
key=to;ry=_ha.check();
if ry ne 0 then do;
node=to;output;rr=no.remove(key:node);
count+1;
end;
rx=from_to.find_next();
end;
rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;``````
Calcite | Level 5

## Re: Get a list of all relationships

Goo morning everyone,

I have tested this code on my dataset (over 300,000 pairs), and it works perfectly. Thank you very much for your time!!! Also, it has helped me to discover the use of hashing in SAS.

Happy New Year to all!
Obsidian | Level 7

## Re: Get a list of all relationships

@Ksharp - Very helpful code!  This post and your solution are helping me on a challenge I currently have.  Would there be a way to score individuals in a household to say how many connections they have, or how "critical" they are to the household?  For example, in the original post, Person #4 has 4 connections, so a way to give them a score of 4 (or something similar).  I'm trying to explore ways to dive into the results I'm getting where some households have 500+ nodes.   I do not have SAS/OR, only access to BASE and SAS/STAT.

Super User

## Re: Get a list of all relationships

This post is long time ago . I strongly suggest you to start a brand-new session .

Anyway, Here is an example you are looking for.

``````
data have;
input _start \$ _end \$;
cards;
A      F
F      G
G      J
J      C
A      B
B      C
A      D
D      C
;
run;

proc sql;
create table ancient as
select _start,_end from have
where _start not in (select distinct _end from have);
quit;
data want(keep=path n_word);
if _n_ eq 1 then do;
length path _path  \$ 800 ;
if 0 then set have;
declare hash ha(hashexp:20,dataset:'have(where=(_start is not missing and _end is not missing))',multidata:'y');
ha.definekey('_start');
ha.definedata('_end');
ha.definedone();

declare hash pa(ordered:'y');
declare hiter hi_path('pa');
pa.definekey('n');
pa.definedata('n','path');
pa.definedone();

end;

set ancient;
count=1;n=1;_n=1;
path=catx('|',_start,_end);

do while(hi_path.next()=0);
if n ne 1 then pa.remove(key:_n);_n=n;
_path=path;
_start=scan(path,-1,'|');
rc=ha.find();
if rc ne 0 then do;
found=1;
n_word=countw(path,'|');
output;
end;

do while(rc=0);
if not findw(path,strip(_end),'|') then do;
if length(path)+length(_end)+1 gt lengthc(path) then do;
putlog 'ERROR: The length of path and _path are set too short';
stop;
end;

count+1;n=count;
path=catx('|',path,_end);
path=_path;
end;
rc=ha.find_next();
end;
end;

end;
pa.clear();
run;
``````
Super User

## Re: Get a list of all relationships

If you have SAS/OR .

``````
data have;
infile cards ;
input from \$  to \$ ;
cards;
1 2
1 4
2 3
2 4
3 8
4 5
4 6
5 7
5 8
6 7
7 8
9 10
9 11
9 12
10 12
11 12
;
run;

/* Same code as SAS/OR */
concomp;
run;``````
Barite | Level 11

## Re: Get a list of all relationships

You can use a Hash of Hash approach to conceptually construct a list of sets that is output tall wise as set_id, set_member_id which in turn is transposed into the desired wanted data layout.

Each member of a pair in the original data is either

- a new set must be created and added to the list of sets

- only one is found in the list of sets

- add the other member to the set containing the found member

- both are found but in different sets

- merge the smaller set into the larger set

Sample code with extra relationship data that requires a merge case

``````* new.add.merge in data simply there to indicate what the processing will do;
data have;
input member1 member2;
datalines;
1 2  new 1
2 4  both 1
5 8  both 1
6 7  both 1
7 8  both 1
9 10  new 2
10 12 both 2
11 12 both 2
13 14 new 3
16 17 new 4
15 16 merge 3 4
;

* separate data into distinct sets of vertices corresponding to linked nodes;
data want_tall;
call missing (id, member);

declare hash set set1 set2 setP setQ;
declare hiter set_iter;

if _n_ = 1 then do;
declare hash sets(ordered:'a');
sets.defineKey('id');
sets.defineData('id', 'set');
sets.defineDone();

declare hiter sets_iter('sets');
end;

set have end=done;

rc = sets_iter.first();
do while (rc=0);
found1 = ifn(set.check(key:member1) eq 0,id,0);

found2 = ifn(set.check(key:member2) eq 0,id,0);

rc = sets_iter.next(); * advance to next item, ensures set gets unlocked from iteration tracking;

if found1 and found2 then leave;
end;

*    put 'newing ' _n_=;

set = _new_ hash(ordered:'a');
set.defineKey('member');
set.defineData('member');
set.defineDone();

id = _n_;
end;
else if found1 ne found2 then do;
* case is either;
* - or each member is in a different set and the sets need to be merged;

if found1 = 0 or found2 = 0 then do;

id = found1 + found2;
sets.find();

if found1
end;
else do;
*      put 'merging ' _n_= found1= found2=;

sets.find(key:found1); set1 = set;
sets.find(key:found2); set2 = set;

if set1.num_items > set2.num_items then do;
setP = set1;    * merge 2 into 1;
setQ = set2;
keyQ = found2;
end;
else do;
setP = set2;    * merge 1 into 2;
setQ = set1;
keyQ = found1;
end;
* merge smaller set into the larger set and remove smaller set from memory;
set_iter = _new_ hiter('setQ');
do while (set_iter.next() = 0);
end;
rc = set_iter.delete();
sets.remove(key:keyQ);
setQ.delete();
end;
end;

if done then do;
declare hiter i1('sets');
do while(i1.next() eq 0);
declare hiter i2('set');
do while (i2.next() eq 0);
output;
end;
end;
end;

keep id member;
run;

proc transpose data=want_tall out=want(drop=_name_) prefix=member_;
by id;
var member;
format member: 4.;
run;``````

Output table after transpose

Meteorite | Level 14

## Re: Get a list of all relationships

Here is a data step solution without any hash tables, just indexed access:

``````proc sql;
create table roots as select distinct member1 from have
where member1 not in(select member2 from have);
create index member1 on have(member1);
quit;

data want;
array members(*) 8 member1-member20;
set roots;
do p=1 to dim(members) until(missing(members(p)));
m1=members(p);
do until(0);
set have(rename=(member1=m1 member2=m2)) key=m1;
if _iorc_ then do;
_error_=0;
leave;
end;
else if not whichn(m2,of members(*)) then do;
if n(of members(*))=dim(members) then do;
error 'Array dimension too small';
stop;
end;
else members(n(of members(*))+1)=m2;
end;
end;
end;
call sortn(of members(*));  /* missing come first with SORTN, correct that */
do _N_=1 to p-1;
members(_N_)=members(_N_+dim(members)-p+1);
end;
do _N_=p to dim(members);
call missing(members(_N_));
end;
drop p m1 m2;
run;``````
Barite | Level 11

## Re: Get a list of all relationships

Just a caution that the root detection will only work if the constraint rule member1 < member2 is in force.

For example, if a relationship link between 1 and 8 is entered as data (1, 😎 the query will still detect 1 as the root. However, if the data is (8, 1) the query will not detect a root for the blue network.

The robust root detection would ensure ordered (or constrained) linkage data

```proc sql;
create table roots as
select distinct member1 from
(select
case when member1 < member1 then member1 else member2 end as member1
from have
)
where member1 not in (
select
case when member1 < member1 then member2 else member1 end
from have
);```

could still be defeated by a self linked node.

The keyed the looks up would also have account for the non-constrained cases.

Meteorite | Level 14

## Re: Get a list of all relationships

You may be right. But the easiest solution to this seems to be sorting the relations (they were sorted already in the example data), e.g.:

``````data temp;
set have;
where member1 ne member2; /* get rid of self-linked codes */
call sortn(member1,member2);
run;``````

and then continue with the TEMP table as input instead of HAVE.

Super User

## Re: Get a list of all relationships

```data have;
infile cards ;
input member1 member2 ;
cards;
1 2
1 4
2 3
2 4
4 1
3 8
4 5
4 6
5 7
5 8
6 7
7 8
9 10
9 11
9 12
10 12
11 12
;
run;```
Meteorite | Level 14

## Re: Get a list of all relationships

That's exactly the kind of situation that my answer to @RichardDeVen will remedy.

So the total code is now:

``````data have;
infile cards ;
input member1 member2 ;
cards;
1 2
1 4
2 3
2 4
4 1
3 8
4 5
4 6
5 7
5 8
6 7
7 8
9 10
9 11
9 12
10 12
11 12
;
run;

data temp;
set have;
where member1 ne member2; /* get rid of self-linked codes */
call sortn(member1,member2);
run;

proc sql;
create table roots as select distinct member1 from temp
where member1 not in(select member2 from temp);
create index member1 on temp(member1);
quit;

data want;
array members(*) 8 member1-member20;
set roots;
do p=1 to dim(members) until(missing(members(p)));
m1=members(p);
do until(0);
set temp(rename=(member1=m1 member2=m2)) key=m1;
if _iorc_ then do;
_error_=0;
leave;
end;
else if not whichn(m2,of members(*)) then do;
if n(of members(*))=dim(members) then do;
error 'Array dimension too small';
stop;
end;
else members(n(of members(*))+1)=m2;
end;
end;
end;
call sortn(of members(*));  /* missing come first with SORTN, correct that */
do _N_=1 to p-1;
members(_N_)=members(_N_+dim(members)-p+1);
end;
do _N_=p to dim(members);
call missing(members(_N_));
end;
drop p m1 m2;
run;``````
Super User

## Re: Get a list of all relationships

```data have;
infile cards ;
input member1 member2 ;
cards;
1 2
1 4
2 3
2 4
8 1
3 8
4 5
4 6
5 7
5 8
6 7
7 8
9 10
9 11
9 12
10 12
11 12
14 .
;
run;```

What if this dataset ? Your code still get wrong result.

Meteorite | Level 14

## Re: Get a list of all relationships

When I run my code against your example, I get two lines with the "families", and 14 all alone in the world. Is that wrong?

But you have a point: If the data contains multiple "singles" (member1 or member2 missing), these will be grouped together because "missing" is their common parent. So the code may have to be adjusted for that, unless that is how it should be. If there are any such singles, that is. The example data did not show any of those.

But the code I have posted gets the right result with the original input data, and with your increasingly contrived exceptions.

Super User

## Re: Get a list of all relationships

If data like this ,Your code still get wrong result.
You can't expect data always look like what you want .

...........
10 12
11 12
14 14
Discussion stats
• 14 replies
• 6674 views
• 20 likes
• 5 in conversation