Driving Distances and Drive Times using SAS and Google Maps

Started ‎07-05-2018 by
Modified ‎07-05-2018 by
Views 11,280

STRAIGHT LINE DISTANCE

Version 9.2 of SAS contains new functions that allow a user to compute geodesic distances. The ZIPCITYDISTANCE function uses information from the SASHELP.ZIPCODE data set and computes the distance between zip code centroids for any two zip codes specified by the user. The GEODIST function allows a user to specify two coordinates in terms of latitude and longitude and computes the distance between those coordinates. Both functions use the Vincenty distance formula. ZIPCITYDISTANCE produces distances in miles to one decimal place, while GEODIST will produce distances in either miles or kilometers and is not restricted to only one decimal place.

Prior to Version 9.2, a user had to use a data step and write an equation to compute such distances. The most common method was to use the Haversine formula. The Haversine formula distances are not as accurate as the new Vincenty-based computations. The following is an example that computes the distance between the centroids of two zip codes: 12203, my residence; 27513, SAS in Cary NC.

```data _null_;
* ZIPCITYDISTANCE function;
d1 = zipcitydistance(12203,27513);
* find lat/long of center of 12203;
zip = 12203;
set sashelp.zipcode (rename=(x=long1 y=lat1)) key=zip/unique;
* find lat/long of center of 20500;
zip = 20500;
set sashelp.zipcode (rename=(x=long2 y=lat2)) key=zip/unique;
* GEODIST function;
d2 = geodist(lat1, long1, lat2, long2, 'M');

*
convert latitude and longitude from degrees
to radians for use in Haversine formula
;
d_to_r = constant('pi')/180;
lat1  = lat1  * d_to_r;
long1 = long1 * d_to_r;
lat2  = lat2  * d_to_r;
long2 = long2 * d_to_r;

*
HAVERSINE formula (earth radius of 3949.99 miles
used to produce distance in miles)
;
d3 = 3949.99 * arcos(sin(lat1) * sin(lat2) + cos(lat1) *
cos(lat2) * cos(long2 - long1));

put
"ZIPCITYDISTANCE: " d1 /
"GEODIST        : " d2 /
"HAVERSINE      : " d3;

stop;
run;```

The SAS functions produced the same results, with more "accuracy" in the GEODIST result. The Haversine formula produced a value close to that of the SAS functions.

```ZIPCITYDISTANCE: 311.1
GEODIST        : 309.40076853
HAVERSINE      : 308.73423827

```

Before proceeding to the next section, please note the following. In the data set SASHELP.ZIPCODE, the variable ZIP is numeric and the ZIPCITYDISTANCE function that uses the that data set set requires either numeric constants or numeric variables as arguments. Also, in the above example, two observations were read from SASHELP.ZIPCODE using KEY= to find an observation that contains data for the specified zip code (a numeric variable). That type of access requires that the data set SASHELP.ZIPCODE be indexed. If it is not you can add an index to that data set using PROC DATASETS.

```proc sort data=sashelp.zipcode;
by zip;
run;

proc datasets lib=sashelp nolist;
modify zipcode;
index create zip;
quit;```

Finally, SASHELP.ZIPCODE is updated quarterly and the latest copy can be found at SAS Maps Online.

DRIVING DISTANCE AND DRIVE TIME

The three distance estimates computed in the previous section with SAS functions and the Haversine formula are all straight line distances. There are occasions where that type of estimate is what you desire (for example, how far away is my house from a pollution source). There are other occasions where what you want is not the straight line distance but a driving distance. Again, there are instances where the straight line and driving distances will be close, but that is not the usual case. If Google Maps is used to find the distance between zips 12203 and 20050, the value is 376 miles (with an estimated driving time of about 6 hours and 20 minutes).

Given only one combination of coordinates (or in this case, zip codes), entering the values in Google Maps to get the driving distance and drive time is no problem. If one has a large number of coordinates, manual entering of the values on the Google Maps web site might be very time consuming. In that situation, URL Access Method within SAS can be used to access the Google Maps web site multiple times and extract both the driving distance and drive time each time the site is accessed.

The following shows how to access the Google Maps web site once

```* enter two zip codes;
%let z1=12203;
%let z2=27513;

* no changes required below this line;
filename z temp;

data _null_;
infile x recfm=f lrecl=1 end=eof;
file z recfm=f lrecl=1;
input @1 x \$char1.;
put @1 x \$char1.;
if eof;
call symputx('filesize',_n_);
run;

data _null_;
infile z recfm=f lrecl=&filesize. eof=done;
input @ 'miles' +(-15) @ '"' distance :comma12. text \$30.;
units    = scan(text,1,'"');
time     = scan(text,3,'"');
file print;
put "DRIVING DISTANCE BETWEEN &z1 AND &z2 : "
distance comma6. +1 units" (TIME: " time ")";
stop;
done:
file print;
put "CANNOT FIND THE DISTANCE OR TIME BETWEEN &z1 AND &z2, TRY ANOTHER COMBINATION";
run;

filename x clear;
filename z clear;```

The first FILENAME statement shows the minimum amount of information Google Maps requires to generate a map showing driving directions between two zip codes. The two macro variables &z1 and &z2 are substituted in the web address. The next FILENAME statement creates a FILEREF Z that is used in the next data step (the destination TEMP puts anything written to FILEREF Z into SAS WORK space and the file is deleted when the WORK library is cleared at the end of the SAS session).

The two data steps used to read Google Maps output and then to find the driving distance and time are based on the papers by Rick Langston cited at the end of this posting. The first data step reads the entire contents of the web page that Google Maps produces when it "sees" the URL in the first FILENAME statement. The file is read one byte at a time and each byte is written to the temporary output file defined in the second FILENAME statement. After all the bytes are read, the total number of bytes read is stored in the macro variable &filesize.

The second data step reads the file created by the first data step and treats the entire string of bytes as one (long) record. The "@ string" feature of the INPUT statement is used to search for the word "miles". When that word is found, the line pointer is moved backward 15 spaces and the distance is read following the next occurrence of a double-quote character. The next 30 characters are read as the variable TEXT and the contents of TEXT are parsed using the SCAN function to find the units of distance (since Google Maps can produce either miles or kilometers) and the driving time (for now, as a character variable ... more on that later).

The above code produced the following in the output window of an interactive SAS session.

```DRIVING DISTANCE BETWEEN 12203 AND 27513 :    649 miles  (TIME: 9 h 56 min )
```

Now that it has been demonstrated how to compute one distance and time, let's move to the situation of multiple zip codes and multiple uses of Google Maps using SAS. The following is an example that shows how to find the driving distance and drive time from zip 12203 to five zips that all are related in some way to user MSZ03.

```* specify start point (zip1);
%let z1=12203;

* data set with zip codes (98765 is an invalid zip code);
data zip_info;
input zip @@;
datalines;
02138 13502 98765 20037 94117 12144
;

* place number of zip in a macro variable;
data _null_;
call symputx('nzips',obs);
stop;
set zip_info nobs=obs;
run;

* delete any data set named DISTANCE_TIME that might exist in the WORK library;
proc datasets lib=work nolist;
delete distance_time;
quit;

* create a macro that contains a loop to access Google Maps multiple time;
%macro distance_time;
%do j=1 %to &nzips;
data _null_;
nrec = &j;
set zip_info point=nrec;
call symputx('z2',put(zip,z5.));
stop;
run;

filename z temp;

* same technique used in the example with two zip codes;
data _null_;
infile x recfm=f lrecl=1 end=eof;
file z recfm=f lrecl=1;
input @1 x \$char1.;
put @1 x \$char1.;
if eof;
call symputx('filesize',_n_);
run;

* drive time as a numeric variable;
data temp;
retain zip &z2;
infile z recfm=f lrecl=&filesize. eof=done;
input @ 'miles' +(-15) @ '"' distance :comma12. text \$30.;
units    = scan(text,1,'"');
text     = scan(text,3,'"');

* convert times to seconds;
select;
* combine days and hours;
when (find(text,'d') ne 0)  time = sum(86400*input(scan(text,1,' '),best.),
3600*input(scan(text,3,' '),best.));
* combine hours and minutes;
when (find(text,'h') ne 0)  time = sum(3600*input(scan(text,1,' '),best.),
60*input(scan(text,3,' '),best.));
* just minutes;
otherwise                   time = 60*input(scan(text,1,' '),best.);
end;

output;
keep zip distance units time;
stop;
done:
output;
run;

filename x clear;
filename z clear;

* add an observation to the data set DISTANCE_TIME;
proc append base=distance_time data=temp;
run;
%end;
%mend;

* use the macro;
%distance_time;

* add the straight line distance to the data set;
data distance_time;
set distance_time;
zip_city = zipcitydistance(12203,zip);
run;

proc print data=distance_time;
format zip z5. distance comma6. time time6.;
run;```

In the above SAS code, the data step that reads the finds the driving distance and time from the Google Map web page is modified to produce the variable TIME as a numeric variable with a value in seconds. This would allow a SAS-based comparison of drive times between zips. The results of the SAS code are as follows.

```Obs      zip    distance    units      time    zip_city
1     02138        173     miles      2:46      139.4
2     13502         91     miles      1:26       76.5
3     98765          .                   .         .
4     20037        372     miles      6:00      311.2
5     94117      2,928     miles     42:00     2558.0
6     12144         12     miles      0:17        7.6
```

Notice the difference between the driving distances and the distances produced using ZIPCITYDISTANCE. The values of TIME are hours and minutes.

Google Map access is not limited to just zip codes. One could also find distances between locations defined by address.

```* my home;
%let addr1=59 Lenox Ave, Albany, NY 12203;
* someplace in North Carolina;
%let addr2=SAS Campus Drive, Cary, NC 27513;

data _null_;
run;

filename z 'z:\temp.txt';

data _null_;
infile x recfm=f lrecl=1 end=eof;
file z recfm=f lrecl=1;
input @1 x \$char1.;
put @1 x \$char1.;
if eof;
call symputx('filesize',_n_);
run;

data _null_;
infile z recfm=f lrecl=&filesize. eof=done;
input @ 'miles' +(-15) @ '"' distance :comma12. text \$30.;
units    = scan(text,1,'"');
time     = scan(text,3,'"');
file print;
put "DISTANCE & TIME BETWEEN" / "&addr1" / "&addr2" / distance units / time;
stop;
done:
file print;
run;

filename x clear;
filename z clear;```

The above SAS code is similar to that used with zip codes and it produced the following. It is important to structure the addresses as shown in the SAS code (statements that create the macro variables &addr1 and &addr2).

```DISTANCE & TIME BETWEEN
59 Lenox Ave, Albany, NY 12203
SAS Campus Drive, Cary, NC 27513
650 miles
9 h 57 min

```

If the SAS and Google Maps combination cannot find a distance, you should see the following (notice the zip code used on the second address).

```CANNOT FIND THE DISTANCE BETWEEN
59 Lenox Ave 12203
SAS Campus Drive 99999
```

The following is a suggestion for finding driving distances and times for a number of different pairs of addresses. The ZIPCITYDISTANCE function is also used to show a comparison between driving and straight line distances. Notice that the address in the DATALINES file look a bit different than in the previous example, no city and sate are specified. there is code in the data step to use the SAS-supplied data set SASHELP.ZIPCODE to add the city and state to all the addresses.

```* data set with IDs and ADDRESSES;
input
@01 id     \$5.
;
set sashelp.zipcode (keep=zip poname statecode) key=zip / unique;
set sashelp.zipcode (keep=zip poname statecode) key=zip / unique;
datalines;
12345    59 LENOX AVE, 12203            1 UNIVERSITY PL, 12144
98989    616 COSBY ROAD, 13502          59 LENOX AVE, 12203
99999    59 Lenox Ave, 12203            SAS Campus Drive, 27513
87878    370 Frederick St, 94117        2129 N St NW, 20037
;

* place number of addresses in a macro variable;
data _null_;
stop;
run;

* delete any data set named DISTANCE_TIME that might exist in the WORK library;
proc datasets lib=work nolist;
delete distance_time;
quit;

* use a loop within a macro to access Google Maps multiple time;
%macro distance_time;
data _null_;
nrec = &j;
call symputx('id',id);
stop;
stop;
run;

filename z temp;

data _null_;
infile x recfm=f lrecl=1 end=eof;
file z recfm=f lrecl=1;
input @1 x \$char1.;
put @1 x \$char1.;
if eof;
call symputx('filesize',_n_);
run;

data temp;
id = "&id";

infile z recfm=f lrecl=&filesize. eof=done;
input @ 'miles' +(-15) @ '"' distance :comma12. text \$30.;
units    = scan(text,1,'"');
text     = scan(text,3,'"');

* convert times to seconds;
select;
* combine days and hours;
when (find(text,'d') ne 0) time = sum(86400*input(scan(text,1,' '),best.),
3600*input(scan(text,3,' '),best.));
* combine hours and minutes;
when (find(text,'h') ne 0) time = sum(3600*input(scan(text,1,' '),best.),
60*input(scan(text,3,' '),best.));
* just minutes;
otherwise                  time = 60*input(scan(text,1,' '),best.);
end;
output;
label
id    = 'ID #'
distance = 'DISTANCE (MILES)'
time     = 'TIME (HR:MIN)'
;
stop;
done:
output;
run;

filename x clear;
filename z clear;

proc append base=distance_time data=temp;
run;
%end;
%mend;

* use the macro;
%distance_time;

*
add the distance between ZIP centroids
(should be "in same ballpark" as driving distance)
;
data distance_time;
set distance_time;
* ZIP should be LAST entry in the addresses;
label zip_city = 'ZIPCITYDISTANCE FUNCTION';
run;

title "DRIVING DISTANCE AND TIMES";
proc print data=distance_time label noobs;
format time time6.;
run;```

The above macro produced the following results (scroll to right to see ZIPCITYDISTANCE results).

```DRIVING DISTANCE AND TIMES
ZIPCIYDISTANCE
12345    59 LENOX AVE, ALBANY, NY 12203               1 UNIVERSITY PL, RENSSELAER, NY 12144        5.4        0:16             7.6
98989    616 COSBY ROAD, UTICA, NY 13502              59 LENOX AVE, ALBANY, NY 12203              91.0        1:25            76.5
99999    59 LENOX AVE, ALBANY, NY 12203               SAS CAMPUS DRIVE, CARY, NC 27513           650.0        9:57           544.6
87878    370 FREDERICK ST, SAN FRANCISCO, CA 94117    2129 N ST NW, WASHINGTON, DC 20037        2818.0       41:00          2441.8
```

Though the variable UNITS was kept in the data set DISTANCE_TIME, it is not shown in the PROC PRINT output given that all addresses are within the US and Google Maps produces all such distances in miles.

If you have locations expressed in terms of latitude and longitude, the following will calculate driving distance and time.

```%let ll1=%str(42.691560,-73.827840);
%let ll2=%str(35.805410,-78.797679);

* no changes required below this line;

filename z temp;

data _null_;
infile x recfm=f lrecl=1 end=eof;
file z recfm=f lrecl=1;
input @1 x \$char1.;
put @1 x \$char1.;
if eof;
call symputx('filesize',_n_);
run;

data _null_;
infile z recfm=f lrecl=&filesize. eof=done;
input @ 'miles' +(-15) @ '"' distance :comma12. text \$30.;
units    = scan(text,1,'"');
time     = scan(text,3,'"');
file print;
put "DRIVING DISTANCE BETWEEN &ll1 AND &ll2 : "
distance units" (TIME: " time ")";
stop;
done:
file print;
put "CANNOT FIND THE DRIVING DISTANCE BETWEEN &ll1 AND &ll2 : " /
"TRY ANOTHER PAIR OF COORDINATES";
stop;
run;

filename x clear;
filename z clear;```

The above code produced the following in the output window of an interactive SAS session (if you look at the first example that used two zip codes and look at the calculated driving distance, you will see that the latitude and longitude just could be Albany, NY and Cary, NC).

```DRIVING DISTANCE BETWEEN 42.691560,-73.827840 AND 35.805410,-78.797679 : 651 miles  (TIME: 10 h 2 min )
```

A macro version of the above that would find the distance from a specified location to a series of other locations with all locations in terms of latitude and longitude might look as follows. The example uses a random sample of locations from the data set SASHELP.ZIPCODE and finds distances and times to each sample observation from the centroid of zip 12203.

```* lat/long of first location;
%let ll1=%str(42.691560,-73.827840);

*
create a data set with locations specified in latitude and longitude
a random sample of 5 observations from SASHELP.ZIPCODE
use SEED=0 to get a new sample each time program is run
;

proc surveyselect data=sashelp.zipcode (keep=zip city statecode x y)
out=lat_long sampsize=5 seed=0;
run;

*
place number of zip in a macro variable
in this example you know it is 5
but you might not know in another use of the SAS code
;
data _null_;
call symputx('nlls',obs);
stop;
set lat_long nobs=obs;
run;

* create a macro that contains a loop to access Google Maps multiple time;
%macro distance_time;

* delete any data set named DISTANCE_TIME that might exist in the WORK library;
proc datasets lib=work nolist;
delete distance_time;
quit;

%do j=1 %to &nlls;
data _null_;
nrec = &j;
set lat_long point=nrec;
call symputx('ll2',catx(',',y,x));
stop;
run;

* lat/long of centroid of zip 12203 hard-coded as part of the URL;
filename z temp;

* same technique used in the example with a pair of lat/long coodinates;
data _null_;
infile x recfm=f lrecl=1 end=eof;
file z recfm=f lrecl=1;
input @1 x \$char1.;
put @1 x \$char1.;
if eof;
call symputx('filesize',_n_);
run;

* drive time as a numeric variable;
data temp;
infile z recfm=f lrecl=&filesize. eof=done;
input @ 'miles' +(-15) @ '"' distance :comma12. text \$30.;
units    = scan(text,1,'"');
text     = scan(text,3,'"');
* convert times to seconds;
select;
* combine days and hours;
when (find(text,'d') ne 0) time = sum(86400*input(scan(text,1,' '),best.),
3600*input(scan(text,3,' '),best.));
* combine hours and minutes;
when (find(text,'h') ne 0) time = sum(3600*input(scan(text,1,' '),best.),
60*input(scan(text,3,' '),best.));
* just minutes;
otherwise                  time = 60*input(scan(text,1,' '),best.);
end;
output;
keep distance time;
stop;
done:
output;
run;

filename x clear;
filename z clear;

* add an observation to the data set DISTANCE_TIME;
proc append base=distance_time data=temp;
run;
%end;
%mend;

* use the macro;
%distance_time;

*
add variables from original data set to new data set distance_time
use geodist function to calculate straight line distance
;
data distance_time;
set distance_time;
set lat_long point=_n_;
straight_line = round(geodist(&ll1,-73.827840,y,x,'DM'), 0.01);
run;

proc print data=distance_time noobs label;
var x y time distance straight_line zip city statecode;
format zip z5. time time6. ;
run;```

The variable labels shown in the output have been propagated to the data set from SASHELP.ZIPCODE. Since a SEED of 0 is used in PROC SURVEYSELECT, if you cut/paste/run the above code you will get a different output.

```DRIVING DISTANCE AND TIMES
Longitude        Latitude
(degrees) of    (degrees) of
the center      the center                                          The                           Two-letter
(centroid)      (centroid)                           straight_    5-digit                         abbrev. for
of ZIP Code.    of ZIP Code.      time    distance       line      ZIP Code    Name of city/org    state name.
-75.683828       37.922079       6:49       423        343.42      23426      Sanford                 VA
-79.290511       34.536371      11:00       740        635.25      28383      Rowland                 NC
-88.065729       31.232711      19:18      1287       1113.06      36553      McIntosh                AL
-89.823776       34.633091      18:09      1219       1025.82      38638      Independence            MS
-117.519639       34.124307      40:00      2766       2415.80      91739      Rancho Cucamonga        CA
-122.146785       44.731480      47:00      2885       2388.54      97342      Detroit                 OR
```

If you do not already know about the data set SASHELP.ZIPCODE, you can read about it in the paper ZIP Code 411: A Well-Kept SAS Secret and ZIP Code 411: ZIP Code 411: Decoding SASHELP.ZIPCODE and Other SAS® Maps Online Mysteries.

WALKING, BIKING, OR MASS TRANSIT

There is an additional argument you can supply in any of the FILENAME statments that produce the URLs used to access Google Maps. That argument can be used to switch from driving distance/time to either walking, biking, or mass transit. In the example that specified two addresses (in macro variables), these were made to find walking time ...

```NEW SECOND ADDRESS ...
* my home;
%let addr1=59 Lenox Ave, Albany, NY 12203;
* Albany Medical Center;
%let addr2=47 New Scotland Ave, Albany, NY 12208;

NEW ITEM ADDED TO THE URL FILENAME ...

NOTE: various codes are ...
Bicycling: /data=!4m2!4m1!3e1
Walking:   /data=!4m2!4m1!3e2
Transit:   /data=!4m2!4m1!3e3
```

These distances/times would probably be most helpful when using locations in an urban area. NOTE, the codes shown above were found at "DdDave's Mapping Stuff". The above changes resulted in this output.

```DISTANCE & TIME BETWEEN
59 Lenox Ave, Albany, NY 12203
47 New Scotland Ave, Albany, NY 12208
3.9 miles
35 min
```

There are potential problems with all of the above SAS code. Finding the desired values of distance and time within the web page produced by Google Maps is based on the current structure of those web pages. This page has been edited several times to take into account changes made in the web page structure that occurred since the various examples were first posted.

NOTE ... The SGF paper that accompanies the methods shown on this page ("Driving Distances and Times Using SAS® and Google Maps") contains SAS code based on an earlier web page structure and that code no longer works.

The URL used in the FILENAME statement is also the current way of doing things with Google Map. If either or both (i.e. web page content, URL construction) change, one would have to modify the SAS code.

For more about using URL access in SAS, you can read Rick Langston's papers: "Handling Large Stream Files with the @'string' Feature" and "Creating SAS® Data Sets from HTML Table Definitions"

NOTE ... There is a SAS problem note "Problem Note 34098: SAS® is unable to connect to the host when specifying the host name while using... related to using the URL access method. If the code posted here produces an error message about not being able to connect to Google Maps, you may have to apply a HOTFIX to your SAS installation.

If you have any questions about this posting, you are welcome to send me a note by clicking here ... email Mike

This work was funded in part by NIH grant HHSN267200700019C from the Eunice Kennedy Shriver National Institute of Child Health and Human Development.

ALTERNATIVE METHOD

Another paper offers an alternative to the above methods. It shows how to use SAS with a MapQuest API (Application Program Interface) to get driving times and distances. Take a look at Batch Production of Driving Distances and Times Using SAS® and Web Map APIs by Ash Roy and Yingbo Na from theCanadian Institute for Health Information (a paper presented at the 2012 SAS Global Forum).

This is fantastic! Thank you so much.

One minor correction (I believe)...

The line of code that reads

`straight_line = round(geodist(&ll1,-73.827840,y,x,'DM'), 0.01); `

should be

`straight_line = round(geodist(&ll1,y,x,'DM'), 0.01);`

since the longitude value of -73.827840 is part of the ll1 macro variable. Otherwise, there are too many arguments for the GEODIST function.

This is incredibly helpful.  Does the Google Maps database take into account traffic?  How do I get a distribution of travel times from point A to point B for multiple pairs of locations?

Version history
Last update:
‎07-05-2018 05:34 PM
Updated by:
Contributors
Article Labels
Article Tags