I have some objects with id's that are arranged as follows.
I have managed to convert the above graph to a dataset as follows:
data have;
input id nearby_id;
cards;
1001 1001
1001 1003
1001 1004
1003 1001
1003 1003
1003 1004
1004 1001
1004 1003
1004 1004
1004 1005
1005 1004
;
run;
What I wish to do is create a variable that groups all id's that are adjacent to each other into one group number:
data want;
input id nearby_id group;
cards;
1001 1001 1
1001 1003 1
1001 1004 1
1003 1001 1
1003 1003 1
1003 1004 1
1004 1001 2
1004 1003 2
1004 1004 2
1004 1005 2
1005 1004 3
;
run;
For instance, 1001 is adjacent to itself (i.e., 1001), 1003 and 1004, while 1003 is adjacent to 1001, itself (i.e., 1003), and 1004, Thus 1001 and 1003 both belong to the same "group".
My idea is that from the original dataset "have", as long as the nearby_id is the same for two or more id's, then the group number is the same, e.g., for id = 1001, we have the pairs, (1001, 1001), (1001, 1003), and (1001, 1004) where the bold is the nearby_id. Similarly for id = 1003, we have the pairs (1003, 1001), (1003, 1003), and (10013, 1004), since the numbers in bold are the same, then the group numbers are the same for these ids.
How can I do this?
You want to group all ID that have the same list of NEARBY_ID values (where each ID automatically has a record with ID=NEARBY_ID).
I think the most modern way to perform your request in a data step would involve hash objects. But this more traditional (in SAS) technique works reliably. It transposes the data set so that each ID has one record with a set of variables (NRBY1 NRBY2 ….) representing the list of nearby_id values in ascending order. That's data set NEED.
Then you can read NEED, incrementing the group number every time the pattern of NRBY values changes, all the while re-transposing back:
data have;
input id nearby_id;
cards;
1001 1001
1001 1003
1001 1004
1003 1001
1003 1003
1003 1004
1004 1001
1004 1003
1004 1004
1004 1005
1005 1004
run;
proc sort data=have;
by id nearby_id;
run;
proc transpose data=have out=need prefix=nrby;
by id;
var nearby_id;
run;
proc sort data=need;
by nrby: id;
run;
data want (keep=id nearby_id group);
group+1;
do until (last.nrby4);
set need ;
by nrby: ;
array nb {*} nrby: ;
do _i=1 to dim(nb) while (nb{_i}^=.);
nearby_id=nb{_i};
output;
end;
end;
run;
The only downside of this program is that you have to know how many NRBY variables will be created. I.e. you have to know the maximum number of records for any single ID (i.e. ID=4 has 4 records). You could overcome this by adding variable NRBY999 to need, and then, instead of "last.nrby4", you could use "last.nrby999" no matter how many NRBY variables were generated by proc transpose (as long as it is less than 999):
proc transpose data=have out=need prefix=nrby;
by id;
var nearby_id;
run;
proc sort data=need;
by nrby: id;
run;
** Add variable NRBY999 at the right hand end of the NRBY variables **;
data need2 / view=need2;
set need;
retain nrby999 .;
run;
data want (keep=id nearby_id group);
group+1;
do until (last.nrby999);
set need2 ;
by nrby: ;
array nb {*} nrby: ;
do _i=1 to dim(nb) while (nb{_i}^=.);
nearby_id=nb{_i};
output;
end;
end;
run;
You want to group all ID that have the same list of NEARBY_ID values (where each ID automatically has a record with ID=NEARBY_ID).
I think the most modern way to perform your request in a data step would involve hash objects. But this more traditional (in SAS) technique works reliably. It transposes the data set so that each ID has one record with a set of variables (NRBY1 NRBY2 ….) representing the list of nearby_id values in ascending order. That's data set NEED.
Then you can read NEED, incrementing the group number every time the pattern of NRBY values changes, all the while re-transposing back:
data have;
input id nearby_id;
cards;
1001 1001
1001 1003
1001 1004
1003 1001
1003 1003
1003 1004
1004 1001
1004 1003
1004 1004
1004 1005
1005 1004
run;
proc sort data=have;
by id nearby_id;
run;
proc transpose data=have out=need prefix=nrby;
by id;
var nearby_id;
run;
proc sort data=need;
by nrby: id;
run;
data want (keep=id nearby_id group);
group+1;
do until (last.nrby4);
set need ;
by nrby: ;
array nb {*} nrby: ;
do _i=1 to dim(nb) while (nb{_i}^=.);
nearby_id=nb{_i};
output;
end;
end;
run;
The only downside of this program is that you have to know how many NRBY variables will be created. I.e. you have to know the maximum number of records for any single ID (i.e. ID=4 has 4 records). You could overcome this by adding variable NRBY999 to need, and then, instead of "last.nrby4", you could use "last.nrby999" no matter how many NRBY variables were generated by proc transpose (as long as it is less than 999):
proc transpose data=have out=need prefix=nrby;
by id;
var nearby_id;
run;
proc sort data=need;
by nrby: id;
run;
** Add variable NRBY999 at the right hand end of the NRBY variables **;
data need2 / view=need2;
set need;
retain nrby999 .;
run;
data want (keep=id nearby_id group);
group+1;
do until (last.nrby999);
set need2 ;
by nrby: ;
array nb {*} nrby: ;
do _i=1 to dim(nb) while (nb{_i}^=.);
nearby_id=nb{_i};
output;
end;
end;
run;
mkeintz: this works very well when the values in the variable field nrby are numerical. Here, I used ID in its place.
PROC SORT DATA=TEST3;
BY id;
RUN;
data want;
group+1;
do until (last.ID);
set TEST3;
by ID: ;
array nb {*} ID: ;
do _i=1 to dim(nb) while (nb{_i}^=.);
nearby_id=nb{_i};
output;
end;
end;
run;
However, I tried to use it for a character field (country) and it failed.
PROC SORT DATA=TEST3;
BY country;
RUN;
data want;
group+1;
do until (last.country);
set TEST3;
by country: ;
array nb {*} country: ;
do _i=1 to dim(nb) while (nb{_i}^=.);
nearby_id=nb{_i};
output;
end;
end;
run;
Can this be adapted? Thanks
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.