data household;
input ID $ Group1_ID Group2_ID Group3_ID;
cards;
001 1 9 2
002 1 2 8
003 2 3 3
004 2 4 4
005 2 5 5
006 3 6 6
007 3 7 8
008 4 8 7
009 5 9 9
010 6 9 1
;
RUN;
ID | Group1 ID | Group2 ID | Group3 ID |
001 | 1 | 9 | 2 |
002 | 1 | 2 | 8 |
003 | 2 | 3 | 3 |
004 | 2 | 4 | 4 |
005 | 2 | 5 | 5 |
006 | 3 | 6 | 6 |
007 | 3 | 7 | 8 |
008 | 4 | 8 | 7 |
009 | 5 | 9 | 9 |
010 | 6 | 9 | 1 |
I have some individual IDs that belongs to different groups. I have 3 different group system.
For example, in Group1 ID system, ID 001 and ID 002 belong to the same group 1
in Group2 ID system, 001, 009 and 010 all belongs to the same group 9
in Group 3 ID system, 002 and 007 belong to the same group 8. Please note 007 and 006 also belong to the same group in Group1 ID system.
So all these related ID should linked to one group. I want to group them together, and generate a new group ID
ID | group1 id | group2 id | group3 id | new group id |
001 | 1 | 9 | 2 | 1 |
002 | 1 | 2 | 8 | 1 |
003 | 2 | 3 | 3 | 2 |
004 | 2 | 4 | 4 | 2 |
005 | 2 | 5 | 5 | 2 |
006 | 3 | 6 | 6 | 1 |
007 | 3 | 7 | 8 | 1 |
008 | 4 | 8 | 7 | 1 |
009 | 5 | 9 | 9 | 1 |
010 | 6 | 9 | 1 | 1 |
So why is ID=008 part of new group id=1 ?
you are right, 008's new group id should be 3
According to me and my algorithm below, ID=008 should be part of newGroup=3
data household;
input ID $ Group1_ID Group2_ID Group3_ID;
cards;
001 1 9 2
002 1 2 8
003 2 3 3
004 2 4 4
005 2 5 5
006 3 6 6
007 3 7 8
008 4 8 7
009 5 9 9
010 6 9 1
;
proc sql;
create table links as
select
a.ID as from,
b.ID as to
from
household as a inner join
household as b
on a.Group1_ID=b.Group1_ID or a.Group2_ID=b.Group2_ID or a.Group3_ID=b.Group3_ID and a.ID<b.ID;
quit;
proc optnet data_links=links direction=undirected out_nodes=groups;
concomp;
run;
proc sql;
/*create table want as*/
select
a.*,
b.concomp as newGroup_ID
from
household as a left join
groups as b on a.id=b.node;
quit;
works perfect.
I guess you want the code like this, in this way, a.ID<b.ID works everywhere.
on (a.Group1_ID=b.Group1_ID or a.Group2_ID=b.Group2_ID or a.Group3_ID=b.Group3_ID) and a.ID<b.ID;
However, after I made this change. 008 will miss the new group id, since no body links with it.
That's true. I think the best way to include unlinked nodes is:
proc sql;
create table links as
select
a.ID as from,
coalesce(b.ID, a.id) as to
from
household as a left join
household as b
on (a.Group1_ID=b.Group1_ID or a.Group2_ID=b.Group2_ID or a.Group3_ID=b.Group3_ID) and a.ID<b.ID;
quit;
proc optnet data_links=links direction=undirected include_selflink out_nodes=groups;
concomp;
run;
Note the use of option include_selflinks.
Alternatively, you can use an INNER join on a.id<=b.id but that will create a greater number of useless self links.
This works. My records has 36 million IDs, I noticed the first step has taken me 7 hours to run and still not finished.
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.