I need some advice on how best to process this problem. I have a dataset of say 100,000 locations, with nonsequential IDs, a latitude and a longitude. How would I best go about finding, for each ID, the closest location to it. Don't worry about the actual calculation of distance, just wondering how best to process it. I don't have any formal SAS training and I 've never used arrays, so any tips would be appreciated. This is my current setup, it's certainly unwieldy, but I was rushed and didn't know how else to do it.
I created an observation number for sequencing called key and a macro variable obs to hold the total number of observations. Then I call a macro for each ID and its location. The macro uses retain statements to keep track of the closest distance so far, and loops through until the last observation, then outputs 1 row. So it's creating 100,000 1 row tables that I append together. Seems ridiculous. Thanks for your help.
%do i = 1 %to &obs;
where key = &i;
call symput ('ID', ID);
call symput ('lat', latitude);
call symput ('long', longitude);
%loop(&ID, &lat, &long)
%macro loop(ID_base, lat_base, long_base);
keep ID_base closest_ID;
retain closest_dist closest_ID;
In the documentation of proc sql .I found this code ,hope it will help you.
proc sql outobs=10;
title ’Neighboring Cities’;
select a.City format=$10., a.State,
a.Latitude ’Lat’, a.Longitude ’Long’,
b.City format=$10., b.State,
b.Latitude ’Lat’, b.Longitude ’Long’,
((b.longitude-a.longitude)**2)) as dist format=6.1
from sql.uscitycoords a, sql.uscitycoords b
where a.city ne b.city and
calculated dist =
(select min(sqrt(((d.latitude-c.latitude)**2) +
from sql.uscitycoords c, sql.uscitycoords d
where c.city = a.city and
c.state = a.state and
d.city ne c.city)
order by a.city;
And the best way to get answer is to post some your origin data and output you need.