Thank you in advance.
Table A:
ID X Y
1 -90 35
2 -90 37
3 -90 39
Table B:
Area X Y
A -90 33
B -91 36
C -90 38
How can I query Table B to select the area that is closest (smallest difference between both x and y) for each observation in Table A such that I end up with:
ID X Y Area
Table A is about 240k observations. There are about 18k observations(areas) in table B.
Edit: for the final table, I want the X/Y coords from table A.
@dwarden3 wrote:
Yes, these are geocoordinates (Long/Lat).
Then you could replace the %Distance in @gamotte's solution with Geodist (lat1,long1, lat2,long2). The latitude does need to come before the longitude in the parameter order but the pair position doesn't.
The default unit reported as the distance would be Kilometers. If you want the distance in miles then Geodist (lat1,long1, lat2,long2,'M')
Closest? How do you define that in this case? Euclidean distance? Sum of the differences? Sum of the absolute value of the differences? Or some other measure?
Can you give an example or three?
Euclidean distance would work. These are X/Y coordinates.
@dwarden3 wrote:
Euclidean distance would work. These are X/Y coordinates.
Are these geocoodinates? As in latitude and longitude?
I ask because there is a specific function, Geodist, to calculate geodetic distance or the length of a curve on the earths surface and simple Euclidean distance may not be precise enough if dealing with a large range of coordinates.
@dwarden3 wrote:
Yes, these are geocoordinates (Long/Lat).
Then you could replace the %Distance in @gamotte's solution with Geodist (lat1,long1, lat2,long2). The latitude does need to come before the longitude in the parameter order but the pair position doesn't.
The default unit reported as the distance would be Kilometers. If you want the distance in miles then Geodist (lat1,long1, lat2,long2,'M')
Hello,
Depending of your definition of distance, you can modify the macro below :
%macro distance(x1, y1, x2, y2);
sqrt((&x2-&x1)**2+(&y2-&y1)**2)
%mend;
proc sql noprint;
CREATE TABLE want AS
SELECT a.*, b.Area, %distance(a.X,a.Y, b.X, b.Y) AS DIST
FROM a, b
GROUP BY ID
HAVING %distance(a.X,a.Y, b.X, b.Y)=min(%distance(a.X,a.Y, b.X, b.Y))
;
quit;
Note that the number of observations will increase if there are several observation for the minimal distance.
Is there any other information that would allow at least grouping?
The meaning of X? If that is a longitude in degrees and the range of values is large enough then 179.9 and -179.9 are "closer" than 179.9 and 175 and we need to know that to appropriately calculate distance.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.