<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Pair two ID's based on the smallest distance between them and use every ID only 1 time in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Pair-two-ID-s-based-on-the-smallest-distance-between-them-and/m-p/701460#M214814</link>
    <description>&lt;P&gt;Here is a solution along the lines of the logic you outline yourself&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Read the entire data set into an ordered hash object &lt;STRONG&gt;h&lt;/STRONG&gt;.&lt;/LI&gt;
&lt;LI&gt;Create an additional hash object &lt;STRONG&gt;u&amp;nbsp;&lt;/STRONG&gt;to keep track of the ids already encountered.&lt;/LI&gt;
&lt;LI&gt;Traverse &lt;STRONG&gt;h&amp;nbsp;&lt;/STRONG&gt;one item at the time.&lt;/LI&gt;
&lt;LI&gt;If both IDs already exist in U, they have already been encountered. Therefore nothing happens. If not, we add them to U and output the observation to the desired data set.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;First, verify that the approach works as desired on your sample data set. Then test if it works on your actual data. 2600 Obs is no problem for the hash to handle. Let me know if it works for you.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
input ID_a $ ID_b $ var1 var2 $ var3 $;
cards;
A B 12 xxx yyy
A C 36 xxx yyy
C D 17 xxx yyy
D A 18 xxx yyy
D F 80 xxx yyy
G B 20 xxx yyy
G K 32 xxx yyy
;

data want(drop = id);
   dcl hash h(dataset : 'have', multidata : 'Y', ordered : 'A');
   h.definekey('var1', 'ID_a', 'ID_b');
   h.definedata(all : 'Y');
   h.definedone();
   dcl hiter i('h');

   dcl hash u();
   u.definekey('id');
   u.definedone();

   if 0 then set have;
   id = '        ';

   do while (i.next()=0);
      if u.check(key : ID_a) and u.check(key : ID_b) then do;
         u.ref(key : ID_a, data : ID_a);
         u.ref(key : ID_b, data : ID_b);
         output;
      end;
   end;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Result:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;ID_a  ID_b  var1  var2  var3 
A     B     12    xxx   yyy 
C     D     17    xxx   yyy 
G     K     32    xxx   yyy &lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Wed, 25 Nov 2020 09:18:31 GMT</pubDate>
    <dc:creator>PeterClemmensen</dc:creator>
    <dc:date>2020-11-25T09:18:31Z</dc:date>
    <item>
      <title>Pair two ID's based on the smallest distance between them and use every ID only 1 time</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Pair-two-ID-s-based-on-the-smallest-distance-between-them-and/m-p/701285#M214730</link>
      <description>&lt;P&gt;&lt;SPAN&gt;Hello everyone,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I&amp;nbsp;&lt;/SPAN&gt;have&lt;SPAN&gt;&amp;nbsp;a table with the calculated distance (var1) between customers (ID_a and ID_b). Var2 and var3 contain some other variables that are not relevant for the connection but I want to keep them. The data is sorted on ID_a and var1.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
input (ID_a ID_b var1 var2 var3)($);
cards;
	A B 12 xxx yyy
	A C 36 xxx yyy
	C D 17 xxx yyy
	D A 18 xxx yyy
	D F 80 xxx yyy
	G B 20 xxx yyy
	G K 32 xxx yyy

run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Now I want to pair ID's with the smallest distance between them, and the ID may only be used once in the output. So the output should look like this:&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
input (ID_a ID_b var1 var2 var3)($);
cards;
   A B 12 xxx yyy
   C D 17 xxx yyy
   G K 32 xxx yyy

run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The goal is to get a list where every ID is paired to one other ID, with the closest possible distance. Of course there are multiple combinations, because A can be the closest to B, while at the same time B can be more close to C. If possible it would be nice to get the combinations with the lowest total distance.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;The data set is ~2600 rows with 224 unique ID's. So the output should be 112 rows with a ID_a - ID_b combination.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;What would be a solution to do this? I have been working on PROC SORT with NODUPKEY, but that is not going to work because each row consists of a unique combination. I am thinking about using a macro to select every first possibility of ID_a, while checking if the ID already has been used in ID_a or ID_b. If no, write the line to the output file and proceed to the next ID_a. If yes, then proceed to the second possibility/connection of ID_a and run the macro again. I can't find any example of such a routine, and I’m not sure this is the best solution. Maybe there are other ways to do this?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Thank you in advance for your suggestions!&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 24 Nov 2020 17:18:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Pair-two-ID-s-based-on-the-smallest-distance-between-them-and/m-p/701285#M214730</guid>
      <dc:creator>JelmerB</dc:creator>
      <dc:date>2020-11-24T17:18:42Z</dc:date>
    </item>
    <item>
      <title>Re: Pair two ID's based on the smallest distance between them and use every ID only 1 time</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Pair-two-ID-s-based-on-the-smallest-distance-between-them-and/m-p/701313#M214742</link>
      <description>This seems like a linear optimization problem - perhaps the traveling salesmen problem?&lt;BR /&gt;Do you have a license for SAS/OR? It is included in On Demand for Academics - seems like you want the traveling salesman problem with a directed graph since you know which nodes you can travel along.&lt;BR /&gt;&lt;A href="https://documentation.sas.com/?cdcId=pgmsascdc&amp;amp;cdcVersion=9.4_3.4&amp;amp;docsetId=ornoaug&amp;amp;docsetTarget=ornoaug_optnet_details42.htm&amp;amp;locale=en" target="_blank"&gt;https://documentation.sas.com/?cdcId=pgmsascdc&amp;amp;cdcVersion=9.4_3.4&amp;amp;docsetId=ornoaug&amp;amp;docsetTarget=ornoaug_optnet_details42.htm&amp;amp;locale=en&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Tue, 24 Nov 2020 18:16:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Pair-two-ID-s-based-on-the-smallest-distance-between-them-and/m-p/701313#M214742</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2020-11-24T18:16:20Z</dc:date>
    </item>
    <item>
      <title>Re: Pair two ID's based on the smallest distance between them and use every ID only 1 time</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Pair-two-ID-s-based-on-the-smallest-distance-between-them-and/m-p/701314#M214743</link>
      <description>If you're not familiar with the traveling salesman problem here's the Wikipedia link:&lt;BR /&gt;&lt;A href="https://en.wikipedia.org/wiki/Travelling_salesman_problem" target="_blank"&gt;https://en.wikipedia.org/wiki/Travelling_salesman_problem&lt;/A&gt;</description>
      <pubDate>Tue, 24 Nov 2020 18:17:07 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Pair-two-ID-s-based-on-the-smallest-distance-between-them-and/m-p/701314#M214743</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2020-11-24T18:17:07Z</dc:date>
    </item>
    <item>
      <title>Re: Pair two ID's based on the smallest distance between them and use every ID only 1 time</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Pair-two-ID-s-based-on-the-smallest-distance-between-them-and/m-p/701460#M214814</link>
      <description>&lt;P&gt;Here is a solution along the lines of the logic you outline yourself&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Read the entire data set into an ordered hash object &lt;STRONG&gt;h&lt;/STRONG&gt;.&lt;/LI&gt;
&lt;LI&gt;Create an additional hash object &lt;STRONG&gt;u&amp;nbsp;&lt;/STRONG&gt;to keep track of the ids already encountered.&lt;/LI&gt;
&lt;LI&gt;Traverse &lt;STRONG&gt;h&amp;nbsp;&lt;/STRONG&gt;one item at the time.&lt;/LI&gt;
&lt;LI&gt;If both IDs already exist in U, they have already been encountered. Therefore nothing happens. If not, we add them to U and output the observation to the desired data set.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;First, verify that the approach works as desired on your sample data set. Then test if it works on your actual data. 2600 Obs is no problem for the hash to handle. Let me know if it works for you.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
input ID_a $ ID_b $ var1 var2 $ var3 $;
cards;
A B 12 xxx yyy
A C 36 xxx yyy
C D 17 xxx yyy
D A 18 xxx yyy
D F 80 xxx yyy
G B 20 xxx yyy
G K 32 xxx yyy
;

data want(drop = id);
   dcl hash h(dataset : 'have', multidata : 'Y', ordered : 'A');
   h.definekey('var1', 'ID_a', 'ID_b');
   h.definedata(all : 'Y');
   h.definedone();
   dcl hiter i('h');

   dcl hash u();
   u.definekey('id');
   u.definedone();

   if 0 then set have;
   id = '        ';

   do while (i.next()=0);
      if u.check(key : ID_a) and u.check(key : ID_b) then do;
         u.ref(key : ID_a, data : ID_a);
         u.ref(key : ID_b, data : ID_b);
         output;
      end;
   end;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Result:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;ID_a  ID_b  var1  var2  var3 
A     B     12    xxx   yyy 
C     D     17    xxx   yyy 
G     K     32    xxx   yyy &lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 25 Nov 2020 09:18:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Pair-two-ID-s-based-on-the-smallest-distance-between-them-and/m-p/701460#M214814</guid>
      <dc:creator>PeterClemmensen</dc:creator>
      <dc:date>2020-11-25T09:18:31Z</dc:date>
    </item>
    <item>
      <title>Re: Pair two ID's based on the smallest distance between them and use every ID only 1 time</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Pair-two-ID-s-based-on-the-smallest-distance-between-them-and/m-p/701467#M214817</link>
      <description>&lt;P&gt;Thanks for sharing this solution! The code runs smoothly on the sample data set. However, in my real date set the ID_a and ID_b are numeric. I think this is the reason I get the following error:&amp;nbsp;&lt;EM&gt;Variable ID_a has been defined as both character and numeric&lt;/EM&gt; (and the same error for ID_b). What do I need to change to work around this?&lt;/P&gt;</description>
      <pubDate>Wed, 25 Nov 2020 10:17:46 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Pair-two-ID-s-based-on-the-smallest-distance-between-them-and/m-p/701467#M214817</guid>
      <dc:creator>JelmerB</dc:creator>
      <dc:date>2020-11-25T10:17:46Z</dc:date>
    </item>
    <item>
      <title>Re: Pair two ID's based on the smallest distance between them and use every ID only 1 time</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Pair-two-ID-s-based-on-the-smallest-distance-between-them-and/m-p/701471#M214818</link>
      <description>&lt;P&gt;A small correction in the code below (changed the sample data to numeric IDs)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Let me know if it works &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
input ID_a ID_b var1 var2 $ var3 $;
cards;
1 2  12 xxx yyy
1 3  36 xxx yyy
3 4  17 xxx yyy
4 1  18 xxx yyy
4 6  80 xxx yyy
7 2  20 xxx yyy
7 11 32 xxx yyy
;

data want(drop = id);
   dcl hash h(dataset : 'have', multidata : 'Y', ordered : 'A');
   h.definekey('var1', 'ID_a', 'ID_b');
   h.definedata(all : 'Y');
   h.definedone();
   dcl hiter i('h');

   dcl hash u();
   u.definekey('id');
   u.definedone();

   if 0 then set have;
   id = .;

   do while (i.next()=0);
      if u.check(key : ID_a) and u.check(key : ID_b) then do;
         u.ref(key : ID_a, data : ID_a);
         u.ref(key : ID_b, data : ID_b);
         output;
      end;
   end;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Result:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;ID_a  ID_b  var1  var2  var3 
1     2     12    xxx   yyy 
3     4     17    xxx   yyy 
7     11    32    xxx   yyy 
&lt;/PRE&gt;</description>
      <pubDate>Wed, 25 Nov 2020 10:29:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Pair-two-ID-s-based-on-the-smallest-distance-between-them-and/m-p/701471#M214818</guid>
      <dc:creator>PeterClemmensen</dc:creator>
      <dc:date>2020-11-25T10:29:24Z</dc:date>
    </item>
    <item>
      <title>Re: Pair two ID's based on the smallest distance between them and use every ID only 1 time</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Pair-two-ID-s-based-on-the-smallest-distance-between-them-and/m-p/701566#M214842</link>
      <description>This is working for me. Thanks for helping me out!</description>
      <pubDate>Wed, 25 Nov 2020 16:07:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Pair-two-ID-s-based-on-the-smallest-distance-between-them-and/m-p/701566#M214842</guid>
      <dc:creator>JelmerB</dc:creator>
      <dc:date>2020-11-25T16:07:40Z</dc:date>
    </item>
    <item>
      <title>Re: Pair two ID's based on the smallest distance between them and use every ID only 1 time</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Pair-two-ID-s-based-on-the-smallest-distance-between-them-and/m-p/701568#M214843</link>
      <description>&lt;P&gt;Anytime, glad to help &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 25 Nov 2020 16:15:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Pair-two-ID-s-based-on-the-smallest-distance-between-them-and/m-p/701568#M214843</guid>
      <dc:creator>PeterClemmensen</dc:creator>
      <dc:date>2020-11-25T16:15:26Z</dc:date>
    </item>
  </channel>
</rss>

