<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Finding best match in dataset in SAS Procedures</title>
    <link>https://communities.sas.com/t5/SAS-Procedures/Finding-best-match-in-dataset/m-p/47728#M12846</link>
    <description>I'm trying to find the best match in a data set, but am having trouble. Would appreciate guidance.&lt;BR /&gt;
&lt;BR /&gt;
I have something like the following:&lt;BR /&gt;
&lt;BR /&gt;
data a; &lt;BR /&gt;
  input id v1; &lt;BR /&gt;
datalines; &lt;BR /&gt;
1 8&lt;BR /&gt;
2 5&lt;BR /&gt;
3 4&lt;BR /&gt;
4 9&lt;BR /&gt;
5 1&lt;BR /&gt;
;&lt;BR /&gt;
run;&lt;BR /&gt;
&lt;BR /&gt;
For each observation, I'd like to find the closest match based on var1. For example, I need output like the following:&lt;BR /&gt;
&lt;BR /&gt;
&lt;BR /&gt;
id   v1 bestmatchid&lt;BR /&gt;
1    8       4&lt;BR /&gt;
2    5       3&lt;BR /&gt;
3    4       2&lt;BR /&gt;
4    9       1&lt;BR /&gt;
5    1       3&lt;BR /&gt;
&lt;BR /&gt;
Help!

Message was edited by: ChefScottie</description>
    <pubDate>Mon, 12 Jul 2010 18:40:49 GMT</pubDate>
    <dc:creator>deleted_user</dc:creator>
    <dc:date>2010-07-12T18:40:49Z</dc:date>
    <item>
      <title>Finding best match in dataset</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Finding-best-match-in-dataset/m-p/47728#M12846</link>
      <description>I'm trying to find the best match in a data set, but am having trouble. Would appreciate guidance.&lt;BR /&gt;
&lt;BR /&gt;
I have something like the following:&lt;BR /&gt;
&lt;BR /&gt;
data a; &lt;BR /&gt;
  input id v1; &lt;BR /&gt;
datalines; &lt;BR /&gt;
1 8&lt;BR /&gt;
2 5&lt;BR /&gt;
3 4&lt;BR /&gt;
4 9&lt;BR /&gt;
5 1&lt;BR /&gt;
;&lt;BR /&gt;
run;&lt;BR /&gt;
&lt;BR /&gt;
For each observation, I'd like to find the closest match based on var1. For example, I need output like the following:&lt;BR /&gt;
&lt;BR /&gt;
&lt;BR /&gt;
id   v1 bestmatchid&lt;BR /&gt;
1    8       4&lt;BR /&gt;
2    5       3&lt;BR /&gt;
3    4       2&lt;BR /&gt;
4    9       1&lt;BR /&gt;
5    1       3&lt;BR /&gt;
&lt;BR /&gt;
Help!

Message was edited by: ChefScottie</description>
      <pubDate>Mon, 12 Jul 2010 18:40:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Finding-best-match-in-dataset/m-p/47728#M12846</guid>
      <dc:creator>deleted_user</dc:creator>
      <dc:date>2010-07-12T18:40:49Z</dc:date>
    </item>
    <item>
      <title>Re: Finding best match in dataset</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Finding-best-match-in-dataset/m-p/47729#M12847</link>
      <description>What is the logic that constitutes the best match??? Is it just the closest value of v1???&lt;BR /&gt;
&lt;BR /&gt;
cynthia</description>
      <pubDate>Mon, 12 Jul 2010 20:19:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Finding-best-match-in-dataset/m-p/47729#M12847</guid>
      <dc:creator>Cynthia_sas</dc:creator>
      <dc:date>2010-07-12T20:19:44Z</dc:date>
    </item>
    <item>
      <title>Re: Finding best match in dataset</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Finding-best-match-in-dataset/m-p/47730#M12848</link>
      <description>Yes, I want to find the observation with the lowest absolute value of the difference.&lt;BR /&gt;
&lt;BR /&gt;
I'm hoping there is some way to do it with a match merge procedure, but have not been able to get that to work.</description>
      <pubDate>Mon, 12 Jul 2010 20:46:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Finding-best-match-in-dataset/m-p/47730#M12848</guid>
      <dc:creator>deleted_user</dc:creator>
      <dc:date>2010-07-12T20:46:37Z</dc:date>
    </item>
    <item>
      <title>Re: Finding best match in dataset</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Finding-best-match-in-dataset/m-p/47731#M12849</link>
      <description>This can get nuanced (and nasty coding) fast.  If your measure of nearness is just algebraic absolute different (Cynthia's question), then the next natural question is what about multiple nearest neighbors and how to pick amongst them.  After that comes the problem of matching with replacement or not, and if not, how to maximize the number of matches and minimize the aggregate distance.&lt;BR /&gt;
&lt;BR /&gt;
Just looking at the first question, using nearest neighbor, you can do that with a correlated subquery in PROC SQL.  I'd recommend Scheier's book in the BBU series for a good coverage of that topic.&lt;BR /&gt;
&lt;BR /&gt;
More generally, I'd look at the nearest neighbor algorithm on Google, as well as this search at SAS&lt;BR /&gt;
nearest neighbor algorithm site:sas.com&lt;BR /&gt;
&lt;BR /&gt;
Doc Muhlbaier&lt;BR /&gt;
Duke</description>
      <pubDate>Mon, 12 Jul 2010 20:50:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Finding-best-match-in-dataset/m-p/47731#M12849</guid>
      <dc:creator>Doc_Duke</dc:creator>
      <dc:date>2010-07-12T20:50:15Z</dc:date>
    </item>
    <item>
      <title>Re: Finding best match in dataset</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Finding-best-match-in-dataset/m-p/47732#M12850</link>
      <description>There will always have a method to resolve your problem in SAS,hehe.&lt;BR /&gt;
The following code can get what you need,But it just a way to resolve,because it will waste lots of your time When you have large data set.So optionally to use it.&lt;BR /&gt;
&lt;BR /&gt;
[pre]&lt;BR /&gt;
data a;&lt;BR /&gt;
input id v1;&lt;BR /&gt;
datalines;&lt;BR /&gt;
1 8&lt;BR /&gt;
2 5&lt;BR /&gt;
3 4&lt;BR /&gt;
4 9&lt;BR /&gt;
5 1&lt;BR /&gt;
;&lt;BR /&gt;
run;&lt;BR /&gt;
proc sql feedback;&lt;BR /&gt;
select a_id ,a_v1 ,b_id&lt;BR /&gt;
from(&lt;BR /&gt;
 select  a.id as a_id,a.v1 as a_v1,b.id as b_id,abs(a.v1-b.v1) as distance,min( calculated distance) as min_dis&lt;BR /&gt;
  from a,a as b&lt;BR /&gt;
   where a.id ne b.id &lt;BR /&gt;
    group by a.id)&lt;BR /&gt;
where distance eq min_dis;&lt;BR /&gt;
quit;&lt;BR /&gt;
[/pre]&lt;BR /&gt;
&lt;BR /&gt;
&lt;BR /&gt;
Ksharp

Message was edited by: Ksharp</description>
      <pubDate>Tue, 13 Jul 2010 11:05:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Finding-best-match-in-dataset/m-p/47732#M12850</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2010-07-13T11:05:53Z</dc:date>
    </item>
    <item>
      <title>Re: Finding best match in dataset</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Finding-best-match-in-dataset/m-p/47733#M12851</link>
      <description>Thanks for the advice. I will give it a try. I wish there was a SAS function to do the match. Seems like it would a pretty common problem.</description>
      <pubDate>Tue, 13 Jul 2010 16:30:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Finding-best-match-in-dataset/m-p/47733#M12851</guid>
      <dc:creator>deleted_user</dc:creator>
      <dc:date>2010-07-13T16:30:45Z</dc:date>
    </item>
  </channel>
</rss>

