Families are connected components of the graph that links diners who ate together. SAS has a proc to find those, PROC OPTNET:
/* One row per visit: up to three diners who ate together */
data visits;
    input (g1-g3) ($);
    datalines;
A B C
E G .
D B .
K L .
I K .
F G .
E G H
L J .
;
run;

/* Turn each visit into links between consecutive diners */
data links;
    set visits;
    array a g:;
    do i = 2 to dim(a);
        if not missing(a{i}) then do;
            from = a{i-1};
            to = a{i};
            output;
        end;
    end;
    keep from to;
run;

/* Each connected component of the undirected graph is a family */
proc optnet data_links=links direction=undirected out_nodes=nodes;
    concomp;
run;

proc sort data=nodes; by concomp node; run;

/* Collect the members of each family into a single string */
data families;
    do until(last.concomp);
        set nodes; by concomp;
        length members $20;
        members = catx(" ", members, node);
    end;
    rename concomp=family;
    drop node;
run;

proc print data=families noobs; run;
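The OPTNET output defines the families, but it does not by itself say which family dined on each visit. A rough sketch of that last step, assuming the VISITS and NODES data sets created above (VISIT_FAMILY and the hash name FAM are illustrative names, not part of the original code), is a hash lookup on the first diner, which keeps the visits in their original order:

```sas
/* Sketch: tag each visit with the family (component) of its first diner.
   Assumes VISITS and NODES exist from the steps above;
   VISIT_FAMILY and FAM are illustrative names. */
data visit_family;
    if _n_ = 1 then do;
        if 0 then set nodes;              /* put NODE and CONCOMP in the PDV */
        declare hash fam(dataset: "nodes");
        fam.defineKey("node");
        fam.defineData("concomp");
        fam.defineDone();
    end;
    set visits;
    node = g1;                            /* look up the first diner only */
    if fam.find() = 0 then family = concomp;
    drop node concomp;
run;

proc print data=visit_family noobs; run;
```

As in Jim's reply, one diner is enough to identify the family. A visit whose first diner never appears in any link (for example, someone who always dines alone) would get a missing FAMILY.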
I'm sure there are any number of ways to do it, but here's one that builds an array for each family and then uses the IN operator to check which family the first diner belongs to. The Family variable is set based on the first diner alone, since one diner is enough to identify the family.
Jim
Code:
DATA Families;
    /* One temporary lookup array per known family */
    ARRAY FAMILY1 [4] $8 _temporary_ ('A' 'B' 'C' 'D');
    ARRAY FAMILY2 [4] $8 _temporary_ ('E' 'F' 'G' 'H');
    ARRAY FAMILY3 [4] $8 _temporary_ ('I' 'J' 'K' 'L');
    INPUT Person1 $
          Person2 $
          Person3 $
          ;
    /* The first diner alone identifies the family */
    IF Person1 IN FAMILY1 THEN
        Family = 1;
    ELSE IF Person1 IN FAMILY2 THEN
        Family = 2;
    ELSE IF Person1 IN FAMILY3 THEN
        Family = 3;
DATALINES;
A B C
E G .
D B .
K L .
I K .
F G .
E G H
L J .
;
RUN;
Thank you Jim. This is a great use of temporary arrays, which I haven't used much yet.
In particular, I like the IF Person IN Family coding structure that an array allows. Notice that I don't have to loop through the array with a subscript or ever go down to its individual values.
But it's hard to argue with @PGStats's approach, which can not only tell you which family dined but also define the families for you!
Jim
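To see what the IF Person1 IN FAMILY1 test saves you, here is a small sketch that runs the same membership check two ways: once with IN against the array name, as in Jim's code, and once with an explicit subscripted DO loop. The data set CHECK and the variables FOUND_IN and FOUND_LOOP are made up for illustration, and the sketch assumes your SAS release accepts an array name after IN, which Jim's code already relies on.

```sas
/* Sketch: IN against an array name vs. an explicit subscripted loop.
   CHECK, FOUND_IN and FOUND_LOOP are illustrative names. */
data check;
    array family1[4] $8 _temporary_ ('A' 'B' 'C' 'D');
    length person $8;
    do person = 'C', 'X';
        /* IN compares PERSON with every element of the array */
        if person in family1 then found_in = 1;
        else found_in = 0;
        /* The same test written out element by element */
        found_loop = 0;
        do i = 1 to dim(family1);
            if family1[i] = person then found_loop = 1;
        end;
        output;
    end;
    drop i;
run;

proc print data=check noobs; run;
```

Both flags should agree for every PERSON; the IN form simply hides the loop and the subscript.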
Awesome! This is exactly what I needed: a way to reconstruct the families assuming I have no prior information about them.