BookmarkSubscribeRSS Feed
yliu1234
Obsidian | Level 7

data household;

        input ID $ Group1_ID Group2_ID Group3_ID;

        cards;

001 1 9 2

002 1 2 8

003 2 3 3

004 2 4 4

005 2 5 5

006 3 6 6

007 3 7 8

008 4 8 7

009 5 9 9

010 6 9 1

;

RUN;

ID

Group1 ID

Group2 ID

Group3 ID

001

1

9

2

002

1

2

8

003

2

3

3

004

2

4

4

005

2

5

5

006

3

6

6

007

3

7

8

008

4

8

7

009

5

9

9

010

6

9

1

 

I have some individual IDs that belongs to different groups. I have 3 different group system.

For example, in Group1 ID system, ID 001 and ID 002 belong to the same group 1 

in Group2 ID system, 001, 009 and 010 all belongs to the same group 9

in Group 3 ID system, 002 and 007 belong to the same group 8.  Please note 007 and 006 also belong to the same group in Group1 ID system.

 

So all these related ID should linked to one group. I want to group them together, and generate a new group ID

 

ID

group1 id

group2 id

group3 id

new group id

001

1

9

2

1

002

1

2

8

1

003

2

3

3

2

004

2

4

4

2

005

2

5

5

2

006

3

6

6

1

007

3

7

8

1

008

4

8

7

1

009

5

9

9

1

010

6

9

1

1

 

 

7 REPLIES 7
PGStats
Opal | Level 21

So why is ID=008 part of new group id=1 ?

PG
yliu1234
Obsidian | Level 7

you are right, 008's new group id should be 3

PGStats
Opal | Level 21

According to me and my algorithm below, ID=008 should be part of newGroup=3

 

data household;
        input ID $ Group1_ID Group2_ID Group3_ID;
        cards;
001 1 9 2
002 1 2 8
003 2 3 3
004 2 4 4
005 2 5 5
006 3 6 6
007 3 7 8
008 4 8 7
009 5 9 9
010 6 9 1
;

proc sql;
create table links as
select
    a.ID as from,
    b.ID as to
from 
    household as a inner join
    household as b 
        on a.Group1_ID=b.Group1_ID or a.Group2_ID=b.Group2_ID or a.Group3_ID=b.Group3_ID and a.ID<b.ID;
quit;

proc optnet data_links=links direction=undirected out_nodes=groups;
concomp;
run;

proc sql;
/*create table want as*/
select 
    a.*,
    b.concomp as newGroup_ID
from
    household as a left join 
    groups as b on a.id=b.node;
quit;

image.png

PG
yliu1234
Obsidian | Level 7
correct the ID 008 new group id should be 3
yliu1234
Obsidian | Level 7

works perfect.

I guess you want the code like this, in this way, a.ID<b.ID works everywhere. 

on (a.Group1_ID=b.Group1_ID or a.Group2_ID=b.Group2_ID or a.Group3_ID=b.Group3_ID) and a.ID<b.ID;

 

However, after I made this change.  008 will miss the new group id, since no body links with it.

PGStats
Opal | Level 21

That's true. I think the best way to include unlinked nodes is:

 

proc sql;
create table links as
select
    a.ID as from,
    coalesce(b.ID, a.id) as to
from 
    household as a left join
    household as b 
        on (a.Group1_ID=b.Group1_ID or a.Group2_ID=b.Group2_ID or a.Group3_ID=b.Group3_ID) and a.ID<b.ID;
quit;

proc optnet data_links=links direction=undirected include_selflink out_nodes=groups;
concomp;
run;

Note the use of option include_selflinks.

Alternatively, you can use an INNER join on a.id<=b.id but that will create a greater number of useless self links.

PG
yliu1234
Obsidian | Level 7

This works. My records has 36 million IDs, I noticed the first step has taken me 7 hours to run and still not finished.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 670 views
  • 0 likes
  • 2 in conversation