Hello everyone, I have a table with the calculated distance (var1) between customers (ID_a and ID_b). Var2 and var3 contain some other variables that are not relevant for the connection but I want to keep them. The data is sorted on ID_a and var1. data have;
input (ID_a ID_b var1 var2 var3)($);
cards;
A B 12 xxx yyy
A C 36 xxx yyy
C D 17 xxx yyy
D A 18 xxx yyy
D F 80 xxx yyy
G B 20 xxx yyy
G K 32 xxx yyy
run; Now I want to pair ID's with the smallest distance between them, and the ID may only be used once in the output. So the output should look like this: data want;
input (ID_a ID_b var1 var2 var3)($);
cards;
A B 12 xxx yyy
C D 17 xxx yyy
G K 32 xxx yyy
run; The goal is to get a list where every ID is paired to one other ID, with the closest possible distance. Of course there are multiple combinations, because A can be the closest to B, while at the same time B can be more close to C. If possible it would be nice to get the combinations with the lowest total distance. The data set is ~2600 rows with 224 unique ID's. So the output should be 112 rows with a ID_a - ID_b combination. What would be a solution to do this? I have been working on PROC SORT with NODUPKEY, but that is not going to work because each row consists of a unique combination. I am thinking about using a macro to select every first possibility of ID_a, while checking if the ID already has been used in ID_a or ID_b. If no, write the line to the output file and proceed to the next ID_a. If yes, then proceed to the second possibility/connection of ID_a and run the macro again. I can't find any example of such a routine, and I’m not sure this is the best solution. Maybe there are other ways to do this? Thank you in advance for your suggestions!
... View more