BookmarkSubscribeRSS Feed
deleted_user
Not applicable
I'm trying to find the best match in a data set, but am having trouble. Would appreciate guidance.

I have something like the following:

data a;
input id v1;
datalines;
1 8
2 5
3 4
4 9
5 1
;
run;

For each observation, I'd like to find the closest match based on var1. For example, I need output like the following:


id v1 bestmatchid
1 8 4
2 5 3
3 4 2
4 9 1
5 1 3

Help! Message was edited by: ChefScottie
5 REPLIES 5
Cynthia_sas
SAS Super FREQ
What is the logic that constitutes the best match??? Is it just the closest value of v1???

cynthia
deleted_user
Not applicable
Yes, I want to find the observation with the lowest absolute value of the difference.

I'm hoping there is some way to do it with a match merge procedure, but have not been able to get that to work.
Doc_Duke
Rhodochrosite | Level 12
This can get nuanced (and nasty coding) fast. If your measure of nearness is just algebraic absolute different (Cynthia's question), then the next natural question is what about multiple nearest neighbors and how to pick amongst them. After that comes the problem of matching with replacement or not, and if not, how to maximize the number of matches and minimize the aggregate distance.

Just looking at the first question, using nearest neighbor, you can do that with a correlated subquery in PROC SQL. I'd recommend Scheier's book in the BBU series for a good coverage of that topic.

More generally, I'd look at the nearest neighbor algorithm on Google, as well as this search at SAS
nearest neighbor algorithm site:sas.com

Doc Muhlbaier
Duke
Ksharp
Super User
There will always have a method to resolve your problem in SAS,hehe.
The following code can get what you need,But it just a way to resolve,because it will waste lots of your time When you have large data set.So optionally to use it.

[pre]
data a;
input id v1;
datalines;
1 8
2 5
3 4
4 9
5 1
;
run;
proc sql feedback;
select a_id ,a_v1 ,b_id
from(
select a.id as a_id,a.v1 as a_v1,b.id as b_id,abs(a.v1-b.v1) as distance,min( calculated distance) as min_dis
from a,a as b
where a.id ne b.id
group by a.id)
where distance eq min_dis;
quit;
[/pre]


Ksharp Message was edited by: Ksharp
deleted_user
Not applicable
Thanks for the advice. I will give it a try. I wish there was a SAS function to do the match. Seems like it would a pretty common problem.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1355 views
  • 0 likes
  • 4 in conversation