BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
elbarto
Obsidian | Level 7

I have some objects with id's that are arranged as follows. 

 

Untitled.png

 

I have managed to convert the above graph to a dataset as follows:

 

data have;
input id nearby_id;
cards;
1001	1001
1001	1003
1001	1004
1003	1001
1003	1003
1003	1004
1004	1001
1004	1003
1004	1004
1004	1005
1005	1004
;
run;

What I wish to do is create a variable that groups all id's that are adjacent to each other into one group number:

 

data want;
input id nearby_id group;
cards;
1001	1001	1
1001	1003	1
1001	1004	1
1003	1001	1
1003	1003	1
1003	1004	1
1004	1001	2
1004	1003	2
1004	1004	2
1004	1005	2
1005	1004	3

;
run;

For instance, 1001 is adjacent to itself (i.e., 1001), 1003 and 1004, while 1003 is adjacent to 1001, itself (i.e., 1003), and 1004, Thus 1001 and 1003 both belong to the same "group". 

 

My idea is that from the original dataset "have", as long as the nearby_id is the same for two or more id's, then the group number is the same, e.g., for id = 1001, we have the pairs, (1001, 1001), (1001, 1003), and (1001, 1004) where the bold is the nearby_id. Similarly for id = 1003, we have the pairs (1003, 1001), (1003, 1003), and (10013, 1004), since the numbers in bold are the same, then the group numbers are the same for these ids. 

 

How can I do this?

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

You want to group all ID that have the same list of NEARBY_ID values (where each ID automatically has a record with ID=NEARBY_ID).

 

I think the most modern way to perform your request in a data step would involve hash objects.  But this more traditional (in SAS) technique works reliably.  It transposes the data set so that each ID has one record with a set of variables (NRBY1 NRBY2 ….) representing the list of nearby_id values in ascending order.  That's data set NEED.

 

Then you can read NEED, incrementing the group number every time the pattern of NRBY values changes,  all the while re-transposing back:

 

data have;
input id nearby_id;
cards;
1001	1001
1001	1003
1001	1004
1003	1001
1003	1003
1003	1004
1004	1001
1004	1003
1004	1004
1004	1005
1005	1004
run;

proc sort data=have;
  by id nearby_id;
run;

proc transpose data=have out=need  prefix=nrby;
  by id;
  var nearby_id;
run;

proc sort data=need;
  by nrby:  id;
run;

data want (keep=id nearby_id group);
  group+1;
  do until (last.nrby4);
    set need ;
    by nrby: ;
    array nb {*} nrby: ;
    do _i=1 to dim(nb) while (nb{_i}^=.);
      nearby_id=nb{_i};
      output;
    end;
  end;
run;
    
  1. Make sure HAVE is sorted by id/nearby_id.  Then when two ID's have the same list of NRBY values, those values are in the same sequence.  This is required for the "SET NEED;  BY NRBY: ;" statement to work as desired in data WANT step.
  2. After the first transpose, data set need has to be sorted by NRBY1 NRBY2 NRBY3 … ID, so that matching patterns of NRBY values are contiguous.
  3. In the DATA WANT step, the "do until (last.nrby4);" statement means to keep looping until NRBY4  (or any NRBY variables preceding it) changes, i.e. until at least one of the set of neighbors changes.
  4. The inner loop does the re-transposing.

The only downside of this program is that you have to know how many NRBY variables will be created.  I.e. you have to know the maximum number of records for any single ID  (i.e. ID=4 has 4 records).  You could overcome this by adding variable NRBY999 to need, and then, instead of "last.nrby4", you could use "last.nrby999" no matter how many NRBY variables were generated by proc transpose (as long as it is less than 999): 

 

proc transpose data=have out=need  prefix=nrby;
  by id;
  var nearby_id;
run;

proc sort data=need;
  by nrby: id;
run;

** Add variable NRBY999 at the right hand end of the NRBY variables **;
data need2 / view=need2;
  set need;
  retain nrby999 .;
run;

data want (keep=id nearby_id group);
  group+1;
  do until (last.nrby999);
    set need2 ;
    by nrby: ;
    array nb {*} nrby: ;
    do _i=1 to dim(nb) while (nb{_i}^=.);
      nearby_id=nb{_i};
      output;
    end;
  end;
run;
    

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

2 REPLIES 2
mkeintz
PROC Star

You want to group all ID that have the same list of NEARBY_ID values (where each ID automatically has a record with ID=NEARBY_ID).

 

I think the most modern way to perform your request in a data step would involve hash objects.  But this more traditional (in SAS) technique works reliably.  It transposes the data set so that each ID has one record with a set of variables (NRBY1 NRBY2 ….) representing the list of nearby_id values in ascending order.  That's data set NEED.

 

Then you can read NEED, incrementing the group number every time the pattern of NRBY values changes,  all the while re-transposing back:

 

data have;
input id nearby_id;
cards;
1001	1001
1001	1003
1001	1004
1003	1001
1003	1003
1003	1004
1004	1001
1004	1003
1004	1004
1004	1005
1005	1004
run;

proc sort data=have;
  by id nearby_id;
run;

proc transpose data=have out=need  prefix=nrby;
  by id;
  var nearby_id;
run;

proc sort data=need;
  by nrby:  id;
run;

data want (keep=id nearby_id group);
  group+1;
  do until (last.nrby4);
    set need ;
    by nrby: ;
    array nb {*} nrby: ;
    do _i=1 to dim(nb) while (nb{_i}^=.);
      nearby_id=nb{_i};
      output;
    end;
  end;
run;
    
  1. Make sure HAVE is sorted by id/nearby_id.  Then when two ID's have the same list of NRBY values, those values are in the same sequence.  This is required for the "SET NEED;  BY NRBY: ;" statement to work as desired in data WANT step.
  2. After the first transpose, data set need has to be sorted by NRBY1 NRBY2 NRBY3 … ID, so that matching patterns of NRBY values are contiguous.
  3. In the DATA WANT step, the "do until (last.nrby4);" statement means to keep looping until NRBY4  (or any NRBY variables preceding it) changes, i.e. until at least one of the set of neighbors changes.
  4. The inner loop does the re-transposing.

The only downside of this program is that you have to know how many NRBY variables will be created.  I.e. you have to know the maximum number of records for any single ID  (i.e. ID=4 has 4 records).  You could overcome this by adding variable NRBY999 to need, and then, instead of "last.nrby4", you could use "last.nrby999" no matter how many NRBY variables were generated by proc transpose (as long as it is less than 999): 

 

proc transpose data=have out=need  prefix=nrby;
  by id;
  var nearby_id;
run;

proc sort data=need;
  by nrby: id;
run;

** Add variable NRBY999 at the right hand end of the NRBY variables **;
data need2 / view=need2;
  set need;
  retain nrby999 .;
run;

data want (keep=id nearby_id group);
  group+1;
  do until (last.nrby999);
    set need2 ;
    by nrby: ;
    array nb {*} nrby: ;
    do _i=1 to dim(nb) while (nb{_i}^=.);
      nearby_id=nb{_i};
      output;
    end;
  end;
run;
    

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
texasmfp
Lapis Lazuli | Level 10

mkeintz: this works very well when the values in the variable field nrby are numerical.  Here, I used ID in its place.

 

PROC SORT DATA=TEST3;
BY id;
RUN;

data want;
  group+1;
  do until (last.ID);
    set TEST3;
    by ID: ;
    array nb {*} ID: ;
    do _i=1 to dim(nb) while (nb{_i}^=.);
      nearby_id=nb{_i};
      output;
    end;
  end;
run;

However, I tried to use it for a character field (country) and it failed.

 

PROC SORT DATA=TEST3;
BY country;
RUN;

data want;
  group+1;
  do until (last.country);
    set TEST3;
    by country: ;
    array nb {*} country: ;
    do _i=1 to dim(nb) while (nb{_i}^=.);
      nearby_id=nb{_i};
      output;
    end;
  end;
run;

 

Can this be adapted?  Thanks

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 1662 views
  • 3 likes
  • 3 in conversation