DATA Step, Macro, Functions and more

Calculating pair-wise distances using the geodist function

Accepted Solution Solved
Reply
Contributor
Posts: 40
Accepted Solution

Calculating pair-wise distances using the geodist function

Hi all,

data _null_;

d = geodist(39.757077, -75.547513, 39.749864, -75.554342, 'm');

put d;

run;

The code above calculates the distance between the two locations:  0.616476842 mile.

data have;

input lon lat;

cards;

-75.547513 39.757077

-75.554342 39.749864

-75.555394 39.730672

-75.556227 39.737546

;run;

Now, I have to calculate the distances between every possible pair using the "have" dataset. So, the number of distances is 4*3/2 = 6. (In fact, the "have" dataset has a lot of observations). Would you advise me on how to create a new dataset that has the 6 distance observations?

Thank you very much.


Accepted Solutions
Solution
‎11-29-2014 10:34 PM
Respected Advisor
Posts: 3,887

Re: Calculating pair-wise distances using the geodist function

Below a SQL and a Data step approach. The SQL is less coding, the Data step is eventually faster when it comes to bigger volumes.

data have;

  input lon lat;

  recid=_n_;

  cards;

-75.547513 39.757077

-75.554342 39.749864

-75.555394 39.730672

-75.556227 39.737546

;

run;

proc sql ;

  create table want1 as

    select

      a.lon as a_lon,

      a.lat as a_lat,

      b.lon as b_lon,

      b.lat as b_lat,

      geodist(a.lat, a.lon, b.lat, b.lon, 'm') as geodist format=best32.

    from have a, have b

    where a.recid<b.recid

  ;

quit;

data want2(drop=_Smiley Happy;

  set have nobs=nobs;

  if _n_=1 then

    do;

      if 0 then set have have(keep=lat lon rename=(lat=hlat lon=hlon));

      dcl hash h1(dataset:'have(keep=recid lat lon rename=(lat=hlat lon=hlon))');

      _rc=h1.defineKey('recid');

      _rc=h1.defineData('hlat','hlon');

      _rc=h1.defineDone();

    end;

  do _i= (recid+1) to nobs; 

    _rc=h1.find(key:_i);

    geodist=geodist(lat, lon, hlat, hlon, 'm');

    output;

  end;

run;

View solution in original post


All Replies
Solution
‎11-29-2014 10:34 PM
Respected Advisor
Posts: 3,887

Re: Calculating pair-wise distances using the geodist function

Below a SQL and a Data step approach. The SQL is less coding, the Data step is eventually faster when it comes to bigger volumes.

data have;

  input lon lat;

  recid=_n_;

  cards;

-75.547513 39.757077

-75.554342 39.749864

-75.555394 39.730672

-75.556227 39.737546

;

run;

proc sql ;

  create table want1 as

    select

      a.lon as a_lon,

      a.lat as a_lat,

      b.lon as b_lon,

      b.lat as b_lat,

      geodist(a.lat, a.lon, b.lat, b.lon, 'm') as geodist format=best32.

    from have a, have b

    where a.recid<b.recid

  ;

quit;

data want2(drop=_Smiley Happy;

  set have nobs=nobs;

  if _n_=1 then

    do;

      if 0 then set have have(keep=lat lon rename=(lat=hlat lon=hlon));

      dcl hash h1(dataset:'have(keep=recid lat lon rename=(lat=hlat lon=hlon))');

      _rc=h1.defineKey('recid');

      _rc=h1.defineData('hlat','hlon');

      _rc=h1.defineDone();

    end;

  do _i= (recid+1) to nobs; 

    _rc=h1.find(key:_i);

    geodist=geodist(lat, lon, hlat, hlon, 'm');

    output;

  end;

run;

Contributor
Posts: 40

Re: Calculating pair-wise distances using the geodist function

Thanks Patrick. I have the following error message:

9    data want2(drop=_Smiley Happy;
10     set have nobs=nobs;
11     if _n_=1 then
12       do;
13         if 0 then set have have(keep=lat lon rename=(lat=hlat lon=hlon));
14         dcl hash h1(dataset:'have(keep=recid lat lon rename=(lat=hlat lon=hlon))');
15         _rc=h1.defineKey('recid');
16         _rc=h1.defineData('hlat','hlon');
17         _rc=h1.defineDone();
18       end;
19     do _i= (recid+1) to nobs;
20       _rc=h1.find(key:_i);
21       geodist=geodist(lat, lon, hlat, hlon, 'm');
22       output;
23     end;
24   run;

NOTE: Variable recid is uninitialized.
ERROR: The variable recid in the DROP, KEEP, or RENAME list has never been referenced.
ERROR: Hash data set load failed at line 17 column 11.
ERROR: DATA STEP Component Object failure.  Aborted during the EXECUTION phase.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 1 observations read from the data set WORK.HAVE.
WARNING: The data set WORK.WANT2 may be incomplete.  When this step was stopped there were 0
         observations and 6 variables.
NOTE: DATA statement used (Total process time):
      real time           0.09 seconds
      cpu time            0.01 second


EDIT:

It worked after I added the "recid" variable:

data have;

input recid lon lat;

cards;

1 -75.547513 39.757077

2 -75.554342 39.749864

3 -75.555394 39.730672

4 -75.556227 39.737546

;run;

Thank you very much!

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 2 replies
  • 812 views
  • 1 like
  • 2 in conversation