BookmarkSubscribeRSS Feed
deleted_user
Not applicable
I'm trying to find the best match in a data set, but am having trouble. Would appreciate guidance.

I have something like the following:

data a;
input id v1;
datalines;
1 8
2 5
3 4
4 9
5 1
;
run;

For each observation, I'd like to find the closest match based on var1. For example, I need output like the following:


id v1 bestmatchid
1 8 4
2 5 3
3 4 2
4 9 1
5 1 3

Help! Message was edited by: ChefScottie
5 REPLIES 5
Cynthia_sas
Diamond | Level 26
What is the logic that constitutes the best match??? Is it just the closest value of v1???

cynthia
deleted_user
Not applicable
Yes, I want to find the observation with the lowest absolute value of the difference.

I'm hoping there is some way to do it with a match merge procedure, but have not been able to get that to work.
Doc_Duke
Rhodochrosite | Level 12
This can get nuanced (and nasty coding) fast. If your measure of nearness is just algebraic absolute different (Cynthia's question), then the next natural question is what about multiple nearest neighbors and how to pick amongst them. After that comes the problem of matching with replacement or not, and if not, how to maximize the number of matches and minimize the aggregate distance.

Just looking at the first question, using nearest neighbor, you can do that with a correlated subquery in PROC SQL. I'd recommend Scheier's book in the BBU series for a good coverage of that topic.

More generally, I'd look at the nearest neighbor algorithm on Google, as well as this search at SAS
nearest neighbor algorithm site:sas.com

Doc Muhlbaier
Duke
Ksharp
Super User
There will always have a method to resolve your problem in SAS,hehe.
The following code can get what you need,But it just a way to resolve,because it will waste lots of your time When you have large data set.So optionally to use it.

[pre]
data a;
input id v1;
datalines;
1 8
2 5
3 4
4 9
5 1
;
run;
proc sql feedback;
select a_id ,a_v1 ,b_id
from(
select a.id as a_id,a.v1 as a_v1,b.id as b_id,abs(a.v1-b.v1) as distance,min( calculated distance) as min_dis
from a,a as b
where a.id ne b.id
group by a.id)
where distance eq min_dis;
quit;
[/pre]


Ksharp Message was edited by: Ksharp
deleted_user
Not applicable
Thanks for the advice. I will give it a try. I wish there was a SAS function to do the match. Seems like it would a pretty common problem.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 2269 views
  • 0 likes
  • 4 in conversation