Solved
Contributor
Posts: 40

# Calculating pair-wise distances using the geodist function

Hi all,

data _null_;

d = geodist(39.757077, -75.547513, 39.749864, -75.554342, 'm');

put d;

run;

The code above calculates the distance between the two locations:  0.616476842 mile.

data have;

input lon lat;

cards;

-75.547513 39.757077

-75.554342 39.749864

-75.555394 39.730672

-75.556227 39.737546

;run;

Now, I have to calculate the distances between every possible pair using the "have" dataset. So, the number of distances is 4*3/2 = 6. (In fact, the "have" dataset has a lot of observations). Would you advise me on how to create a new dataset that has the 6 distance observations?

Thank you very much.

Accepted Solutions
Solution
‎11-29-2014 10:34 PM
Posts: 4,743

## Re: Calculating pair-wise distances using the geodist function

Below a SQL and a Data step approach. The SQL is less coding, the Data step is eventually faster when it comes to bigger volumes.

data have;

input lon lat;

recid=_n_;

cards;

-75.547513 39.757077

-75.554342 39.749864

-75.555394 39.730672

-75.556227 39.737546

;

run;

proc sql ;

create table want1 as

select

a.lon as a_lon,

a.lat as a_lat,

b.lon as b_lon,

b.lat as b_lat,

geodist(a.lat, a.lon, b.lat, b.lon, 'm') as geodist format=best32.

from have a, have b

where a.recid<b.recid

;

quit;

data want2(drop=_;

set have nobs=nobs;

if _n_=1 then

do;

if 0 then set have have(keep=lat lon rename=(lat=hlat lon=hlon));

dcl hash h1(dataset:'have(keep=recid lat lon rename=(lat=hlat lon=hlon))');

_rc=h1.defineKey('recid');

_rc=h1.defineData('hlat','hlon');

_rc=h1.defineDone();

end;

do _i= (recid+1) to nobs;

_rc=h1.find(key:_i);

geodist=geodist(lat, lon, hlat, hlon, 'm');

output;

end;

run;

All Replies
Solution
‎11-29-2014 10:34 PM
Posts: 4,743

## Re: Calculating pair-wise distances using the geodist function

Below a SQL and a Data step approach. The SQL is less coding, the Data step is eventually faster when it comes to bigger volumes.

data have;

input lon lat;

recid=_n_;

cards;

-75.547513 39.757077

-75.554342 39.749864

-75.555394 39.730672

-75.556227 39.737546

;

run;

proc sql ;

create table want1 as

select

a.lon as a_lon,

a.lat as a_lat,

b.lon as b_lon,

b.lat as b_lat,

geodist(a.lat, a.lon, b.lat, b.lon, 'm') as geodist format=best32.

from have a, have b

where a.recid<b.recid

;

quit;

data want2(drop=_;

set have nobs=nobs;

if _n_=1 then

do;

if 0 then set have have(keep=lat lon rename=(lat=hlat lon=hlon));

dcl hash h1(dataset:'have(keep=recid lat lon rename=(lat=hlat lon=hlon))');

_rc=h1.defineKey('recid');

_rc=h1.defineData('hlat','hlon');

_rc=h1.defineDone();

end;

do _i= (recid+1) to nobs;

_rc=h1.find(key:_i);

geodist=geodist(lat, lon, hlat, hlon, 'm');

output;

end;

run;

Contributor
Posts: 40

## Re: Calculating pair-wise distances using the geodist function

Thanks Patrick. I have the following error message:

9    data want2(drop=_;
10     set have nobs=nobs;
11     if _n_=1 then
12       do;
13         if 0 then set have have(keep=lat lon rename=(lat=hlat lon=hlon));
14         dcl hash h1(dataset:'have(keep=recid lat lon rename=(lat=hlat lon=hlon))');
15         _rc=h1.defineKey('recid');
16         _rc=h1.defineData('hlat','hlon');
17         _rc=h1.defineDone();
18       end;
19     do _i= (recid+1) to nobs;
20       _rc=h1.find(key:_i);
21       geodist=geodist(lat, lon, hlat, hlon, 'm');
22       output;
23     end;
24   run;

NOTE: Variable recid is uninitialized.
ERROR: The variable recid in the DROP, KEEP, or RENAME list has never been referenced.
ERROR: Hash data set load failed at line 17 column 11.
ERROR: DATA STEP Component Object failure.  Aborted during the EXECUTION phase.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 1 observations read from the data set WORK.HAVE.
WARNING: The data set WORK.WANT2 may be incomplete.  When this step was stopped there were 0
observations and 6 variables.
NOTE: DATA statement used (Total process time):
real time           0.09 seconds
cpu time            0.01 second

EDIT:

It worked after I added the "recid" variable:

data have;

input recid lon lat;

cards;

1 -75.547513 39.757077

2 -75.554342 39.749864

3 -75.555394 39.730672

4 -75.556227 39.737546

;run;

Thank you very much!

🔒 This topic is solved and locked.