mk123451243
Fluorite | Level 6
Let’s say I own a restaurant in an area with only three families, each with four members. Let’s identify each family by a group of four letters, one letter per family member.
Family 1
A B C D
Family 2
E F G H
Family 3
I J K L

Each night, for eight nights, two or three people arrive at the restaurant. Guests are recorded in the order they enter the restaurant. All guests on any given night are from ONE family only. The resulting observation set for all eight nights is:

A B C
E G .
D B .
K L .
I K .
F G .
E G H
L J .

Now I ask: is there a way to reconstruct the families, grouping the letters into complete families, given only the eight observations?

There is a clear algorithm, but I am not sure how to implement it in SAS, though I have no reason to doubt that it can be done.

The algorithm involves setting the first observation as a family and comparing it to all other observations. If any people are shared, then all people in those observations are aggregated to expand the original family. Then repeat for each observation.
Accepted Solutions
PGStats
Opal | Level 21

Families are connected components. SAS has a proc to find those:

 

data visits;
input (g1-g3) ($);
datalines;
A B C
E G .
D B .
K L .
I K .
F G .
E G H
L J .
;

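/* Turn each visit into edges linking consecutive guests (g1-g2, g2-g3) */
data links;
set visits;
array a g:;
do i = 2 to dim(a);
    /* a missing guest means the party was smaller, so no edge is emitted */
    if not missing(a{i}) then do;
        from = a{i-1};
        to = a{i};
        output;
        end;
    end;
keep from to;
run;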

proc optnet data_links=links direction=undirected out_nodes=nodes;
concomp;
run;

proc sort data=nodes; by concomp node; run;

data families;
do until(last.concomp);
    set nodes; by concomp;
    length members $20;
    members = catx(" ", members, node);
    end;
rename concomp=family;
drop node;
run;

proc print data=families noobs; run;
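
PROC OPTNET is part of SAS/OR, so it may not be available at every site. As a rough sketch only (not part of the solution above), the merge-until-nothing-changes idea from the question can also be written in a plain DATA step. It assumes the links data set built above; the names node_family, lab, lf, lt and k are made up for illustration. Each person starts with a provisional family number of their own, and every link repeatedly pulls both of its ends to the smaller number until a full pass makes no change.

```sas
data node_family (keep=node family);
    length node $8 family lf lt 8;

    /* hypothetical lookup table: person -> current family number */
    declare hash lab(ordered: 'a');
    lab.defineKey('node');
    lab.defineData('node', 'family');
    lab.defineDone();
    declare hiter it('lab');

    /* pass 1: give every person seen in a link a number of their own */
    do i = 1 to n;
        set links point=i nobs=n;
        node = from;
        if lab.check() ne 0 then do; k + 1; family = k; lab.add(); end;
        node = to;
        if lab.check() ne 0 then do; k + 1; family = k; lab.add(); end;
    end;

    /* repeat: each link pulls both ends to the smaller number,
       until a complete pass over the links changes nothing */
    do until (not changed);
        changed = 0;
        do i = 1 to n;
            set links point=i;
            node = from; rc = lab.find(); lf = family;
            node = to;   rc = lab.find(); lt = family;
            if lf ne lt then do;
                family = min(lf, lt);
                node = from; lab.replace();
                node = to;   lab.replace();
                changed = 1;
            end;
        end;
    end;

    /* write one row per person with the final family number */
    rc = it.first();
    do while (rc = 0);
        output;
        rc = it.next();
    end;
    stop;   /* required because the links are read with POINT= */
run;
```

The family numbers produced this way are not necessarily consecutive. Sorting node_family by family and node and then running the same DO UNTIL step as above, with family in place of concomp, should give one row per family. For large link sets the repeated passes can be slow, which is a good reason to prefer PROC OPTNET where it is licensed.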

PG

jimbarbour
Meteorite | Level 14

I'm sure there are any number of ways to do it, but here's one that builds an array for each family and then uses the IN operator to check which family the first diner belongs to.  The "Family" variable is set based on the first diner, since only one diner is needed to identify the family.

 

Jim

 

Code:

DATA	Families;
	ARRAY	FAMILY1	[4] $8 _temporary_ ('A' 'B' 'C' 'D');
	ARRAY	FAMILY2	[4] $8 _temporary_ ('E' 'F' 'G' 'H');
	ARRAY	FAMILY3	[4] $8 _temporary_ ('I' 'J' 'K' 'L');

	INPUT	Person1 $
			Person2	$
			Person3	$
			;

	IF	Person1	IN	FAMILY1	THEN
		Family	=	1;
	ELSE
	IF	Person1	IN	FAMILY2	THEN
		Family	=	2;
	ELSE
	IF	Person1	IN	FAMILY3	THEN
		Family	=	3;

DATALINES;
A B C
E G .
D B .
K L .
I K .
F G .
E G H
L J .
;
RUN;


mk123451243
Fluorite | Level 6

Thank you, Jim. This is a great use of temporary arrays, which I haven't used much yet.

jimbarbour
Meteorite | Level 14

@mk123451243,

 

In particular, I like the 

IF Person IN Family

coding structure that an array allows.  Notice that I don't have to subscript through the array or ever go down to the individual values in the array.
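
A minimal sketch of that point, using a made-up single-person lookup: the IN test and the explicit subscript loop below should set the same flag. The variable names person, in_family and found are hypothetical and only for illustration.

```sas
data _null_;
    array family1 [4] $8 _temporary_ ('A' 'B' 'C' 'D');
    length person $8;
    person = 'C';

    /* IN with an array name: every element is searched, no subscripts */
    if person in family1 then in_family = 1;
    else in_family = 0;

    /* the same test written with an explicit subscript loop */
    found = 0;
    do i = 1 to dim(family1);
        if person = family1[i] then found = 1;
    end;

    /* write both flags to the log for comparison */
    put in_family= found=;
run;
```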

 

But it's hard to argue with @PGStats's approach, which not only tells you which family dined but also defines the families for you!

 

Jim

mk123451243
Fluorite | Level 6

Awesome! This is exactly what I needed: a way to reconstruct the families without any prior information about them.
