Help using Base SAS procedures

Finding best match in dataset

Reply
N/A
Posts: 0

Finding best match in dataset

I'm trying to find the best match in a data set, but am having trouble. Would appreciate guidance.

I have something like the following:

data a;
input id v1;
datalines;
1 8
2 5
3 4
4 9
5 1
;
run;

For each observation, I'd like to find the closest match based on var1. For example, I need output like the following:


id v1 bestmatchid
1 8 4
2 5 3
3 4 2
4 9 1
5 1 3

Help! Message was edited by: ChefScottie
SAS Super FREQ
Posts: 8,868

Re: Finding best match in dataset

Posted in reply to deleted_user
What is the logic that constitutes the best match??? Is it just the closest value of v1???

cynthia
N/A
Posts: 0

Re: Finding best match in dataset

Posted in reply to Cynthia_sas
Yes, I want to find the observation with the lowest absolute value of the difference.

I'm hoping there is some way to do it with a match merge procedure, but have not been able to get that to work.
Trusted Advisor
Posts: 2,116

Re: Finding best match in dataset

Posted in reply to deleted_user
This can get nuanced (and nasty coding) fast. If your measure of nearness is just algebraic absolute different (Cynthia's question), then the next natural question is what about multiple nearest neighbors and how to pick amongst them. After that comes the problem of matching with replacement or not, and if not, how to maximize the number of matches and minimize the aggregate distance.

Just looking at the first question, using nearest neighbor, you can do that with a correlated subquery in PROC SQL. I'd recommend Scheier's book in the BBU series for a good coverage of that topic.

More generally, I'd look at the nearest neighbor algorithm on Google, as well as this search at SAS
nearest neighbor algorithm site:sas.com

Doc Muhlbaier
Duke
Super User
Posts: 10,046

Re: Finding best match in dataset

Posted in reply to deleted_user
There will always have a method to resolve your problem in SAS,hehe.
The following code can get what you need,But it just a way to resolve,because it will waste lots of your time When you have large data set.So optionally to use it.

[pre]
data a;
input id v1;
datalines;
1 8
2 5
3 4
4 9
5 1
;
run;
proc sql feedback;
select a_id ,a_v1 ,b_id
from(
select a.id as a_id,a.v1 as a_v1,b.id as b_id,abs(a.v1-b.v1) as distance,min( calculated distance) as min_dis
from a,a as b
where a.id ne b.id
group by a.id)
where distance eq min_dis;
quit;
[/pre]


Ksharp Message was edited by: Ksharp
N/A
Posts: 0

Re: Finding best match in dataset

Thanks for the advice. I will give it a try. I wish there was a SAS function to do the match. Seems like it would a pretty common problem.
Ask a Question
Discussion stats
  • 5 replies
  • 354 views
  • 0 likes
  • 4 in conversation