I need some advice on how best to process this problem. I have a dataset of say 100,000 locations, with nonsequential IDs, a latitude and a longitude. How would I best go about finding, for each ID, the closest location to it. Don't worry about the actual calculation of distance, just wondering how best to process it. I don't have any formal SAS training and I 've never used arrays, so any tips would be appreciated. This is my current setup, it's certainly unwieldy, but I was rushed and didn't know how else to do it.
I created an observation number for sequencing called key and a macro variable obs to hold the total number of observations. Then I call a macro for each ID and its location. The macro uses retain statements to keep track of the closest distance so far, and loops through until the last observation, then outputs 1 row. So it's creating 100,000 1 row tables that I append together. Seems ridiculous. Thanks for your help.
%macro loop2;
%do i = 1 %to &obs;
data _null_;
set locations;
where key = &i;
call symput ('ID', ID);
call symput ('lat', latitude);
call symput ('long', longitude);
run;
%loop(&ID, &lat, &long)
%end;
%mend;
%macro loop(ID_base, lat_base, long_base);
data dist;
set locations;
keep ID_base closest_ID;
retain closest_dist closest_ID;
ID_base = "&ID_base";
lat_base = &lat_base / 1000000;
long_base = &long_base / -1000000;
if ID ne ID_base then do;
currest_dist = ....;
end;
if current_dist < closest_dist or closest_dist = . then do;
closest_dist = current_dist;
closest_ID = ID;
end;
if key = &obs then output;
run;
proc append force data = dist base = dist_total;
run;
%mend;
%loop2;
... View more