No need to apologise, my comment was tongue in cheek. I had to try a a couple more things, now on the full data. 1- 10 seconds saving when sorting by latitude and stopping the matching attempts as soon as values are too high. 2- Partitioning divides times further. Here, times are divided by 5 by creating 6 partitions; down to 10s. I tried proc DS2 as this process is very suitable for parallelisation, but proc DS2 seems to only process observations in parallel, not do loop partitions. So not suitable here, unless someone better at proc ds2 than me can show us how it's done. data TEST; do ID=1 to 50000; LATITUDE =180*ranuni(0)-90; LONGITUDE=360*ranuni(0)-180; output; end; run; %********** Data step & array : 60s ************** ; %let nobs=50000; data DIST5; array IDS [&nobs.] _temporary_; array LAT [&nobs.] _temporary_; array LON [&nobs.] _temporary_; do I=1 to &nobs.; set TEST; IDS =ID; LAT =LATITUDE; LON =LONGITUDE; end; do I=1 to &nobs.; do J=1 to &nobs.; if 0 < abs(LAT-LAT ) < .091 then if 0<geodist(LAT,LON,LAT ,LON )< 10 then N10+1; end; if N10 then do; ID1=IDS; output; N10=0; end; end; keep ID1 N10; stop; run ; %********** Data step & array, sorted : 50s ************** ; proc sort data=TEST; by LATITUDE; data DIST6; array IDS [&nobs.] _temporary_; array LAT [&nobs.] _temporary_; array LON [&nobs.] _temporary_; do I=1 to &nobs.; set TEST; IDS =ID; LAT =LATITUDE; LON =LONGITUDE; end; do I=1 to &nobs.; do J=1 to &nobs.; if LAT > LAT+ .091 then leave; %* stop: the distance will now keep growing; if 0 < abs(LAT-LAT ) < .091 then if 0<geodist(LAT,LON,LAT ,LON )< 10 then N10+1; end; if N10 then do; ID1=IDS; output; N10=0; end; end; keep ID1 N10; stop; run ; proc sort data=DIST6; by ID1; run; %********** Do 6 smaller cartesian products: 10s ************** ; proc sort data=TEST; by LATITUDE; data DIST7; array IDS [&nobs.] _temporary_; array LAT [&nobs.] _temporary_; array LON [&nobs.] _temporary_; array START [6] _temporary_; %* Make six 30° latitude partitions; array STOP [6] _temporary_; do I=1 to &nobs.; set TEST; IDS =ID; LAT =LATITUDE; LON =LONGITUDE; if 1 then START[1]=1; %* Save partition boundaries ; if LATITUDE<=-59.8 then STOP [1]=I; %* Partitions overlap by 0.2° ~22km; if LATITUDE<=-60 then START[2]=I; if LATITUDE<=-29.8 then STOP [2]=I; if LATITUDE<=-30 then START[3]=I; if LATITUDE<= 0.2 then STOP [3]=I; if LATITUDE<= 0 then START[4]=I; if LATITUDE<= 30.2 then STOP [4]=I; if LATITUDE<= 30 then START[5]=I; if LATITUDE<= 60.2 then STOP [5]=I; if LATITUDE<= 60 then START[6]=I; if 1 then STOP [6]=I; end; do PARTITION=1 to 6; do I=START[PARTITION] to STOP[PARTITION]; do J=START[PARTITION] to STOP[PARTITION]; if LAT > LAT+ .091 then leave; if 0 < abs(LAT-LAT ) < .091 then if 0<geodist(LAT,LON,LAT ,LON )< 10 then N10+1; end; if N10 then do; ID1=IDS; output; N10=0; end; end; end; keep ID1 N10; stop; run; proc sql; %* Some points at the partition boundaries ; create table DIST7A as %* will have incomplete matches. ; select ID1, max(N10) as N10 %* Keep full matches. ; from DIST7 group by ID1 order by ID1; quit;
... View more