Hi all,
data _null_;
d = geodist(39.757077, -75.547513, 39.749864, -75.554342, 'm');
put d;
run;
The code above calculates the distance between the two locations: 0.616476842 mile.
data have;
input lon lat;
cards;
-75.547513 39.757077
-75.554342 39.749864
-75.555394 39.730672
-75.556227 39.737546
;run;
Now, I have to calculate the distances between every possible pair using the "have" dataset. So, the number of distances is 4*3/2 = 6. (In fact, the "have" dataset has a lot of observations). Would you advise me on how to create a new dataset that has the 6 distance observations?
Thank you very much.
Below a SQL and a Data step approach. The SQL is less coding, the Data step is eventually faster when it comes to bigger volumes.
data have;
input lon lat;
recid=_n_;
cards;
-75.547513 39.757077
-75.554342 39.749864
-75.555394 39.730672
-75.556227 39.737546
;
run;
proc sql ;
create table want1 as
select
a.lon as a_lon,
a.lat as a_lat,
b.lon as b_lon,
b.lat as b_lat,
geodist(a.lat, a.lon, b.lat, b.lon, 'm') as geodist format=best32.
from have a, have b
where a.recid<b.recid
;
quit;
data want2(drop=_:);
set have nobs=nobs;
if _n_=1 then
do;
if 0 then set have have(keep=lat lon rename=(lat=hlat lon=hlon));
dcl hash h1(dataset:'have(keep=recid lat lon rename=(lat=hlat lon=hlon))');
_rc=h1.defineKey('recid');
_rc=h1.defineData('hlat','hlon');
_rc=h1.defineDone();
end;
do _i= (recid+1) to nobs;
_rc=h1.find(key:_i);
geodist=geodist(lat, lon, hlat, hlon, 'm');
output;
end;
run;
Below a SQL and a Data step approach. The SQL is less coding, the Data step is eventually faster when it comes to bigger volumes.
data have;
input lon lat;
recid=_n_;
cards;
-75.547513 39.757077
-75.554342 39.749864
-75.555394 39.730672
-75.556227 39.737546
;
run;
proc sql ;
create table want1 as
select
a.lon as a_lon,
a.lat as a_lat,
b.lon as b_lon,
b.lat as b_lat,
geodist(a.lat, a.lon, b.lat, b.lon, 'm') as geodist format=best32.
from have a, have b
where a.recid<b.recid
;
quit;
data want2(drop=_:);
set have nobs=nobs;
if _n_=1 then
do;
if 0 then set have have(keep=lat lon rename=(lat=hlat lon=hlon));
dcl hash h1(dataset:'have(keep=recid lat lon rename=(lat=hlat lon=hlon))');
_rc=h1.defineKey('recid');
_rc=h1.defineData('hlat','hlon');
_rc=h1.defineDone();
end;
do _i= (recid+1) to nobs;
_rc=h1.find(key:_i);
geodist=geodist(lat, lon, hlat, hlon, 'm');
output;
end;
run;
Thanks Patrick. I have the following error message:
9 data want2(drop=_:);
10 set have nobs=nobs;
11 if _n_=1 then
12 do;
13 if 0 then set have have(keep=lat lon rename=(lat=hlat lon=hlon));
14 dcl hash h1(dataset:'have(keep=recid lat lon rename=(lat=hlat lon=hlon))');
15 _rc=h1.defineKey('recid');
16 _rc=h1.defineData('hlat','hlon');
17 _rc=h1.defineDone();
18 end;
19 do _i= (recid+1) to nobs;
20 _rc=h1.find(key:_i);
21 geodist=geodist(lat, lon, hlat, hlon, 'm');
22 output;
23 end;
24 run;
NOTE: Variable recid is uninitialized.
ERROR: The variable recid in the DROP, KEEP, or RENAME list has never been referenced.
ERROR: Hash data set load failed at line 17 column 11.
ERROR: DATA STEP Component Object failure. Aborted during the EXECUTION phase.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 1 observations read from the data set WORK.HAVE.
WARNING: The data set WORK.WANT2 may be incomplete. When this step was stopped there were 0
observations and 6 variables.
NOTE: DATA statement used (Total process time):
real time 0.09 seconds
cpu time 0.01 second
EDIT:
It worked after I added the "recid" variable:
data have;
input recid lon lat;
cards;
1 -75.547513 39.757077
2 -75.554342 39.749864
3 -75.555394 39.730672
4 -75.556227 39.737546
;run;
Thank you very much!
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.