Version 9.2 of SAS contains new functions that allow a user to compute geodesic distances. The ZIPCITYDISTANCE function uses information from the SASHELP.ZIPCODE data set and computes the distance between zip code centroids for any two zip codes specified by the user. The GEODIST function allows a user to specify two coordinates in terms of latitude and longitude and computes the distance between those coordinates. Both functions use the Vincenty distance formula. ZIPCITYDISTANCE produces distances in miles to one decimal place, while GEODIST will produce distances in either miles or kilometers and is not restricted to only one decimal place.
Prior to Version 9.2, a user had to use a data step and write an equation to compute such distances. The most common method was to use the Haversine formula. The Haversine formula distances are not as accurate as the new Vincenty-based computations. The following is an example that computes the distance between the centroids of two zip codes: 12203, my residence; 27513, SAS in Cary NC.
data _null_; * ZIPCITYDISTANCE function; d1 = zipcitydistance(12203,27513); * find lat/long of center of 12203; zip = 12203; set sashelp.zipcode (rename=(x=long1 y=lat1)) key=zip/unique; * find lat/long of center of 20500; zip = 20500; set sashelp.zipcode (rename=(x=long2 y=lat2)) key=zip/unique; * GEODIST function; d2 = geodist(lat1, long1, lat2, long2, 'M'); * convert latitude and longitude from degrees to radians for use in Haversine formula ; d_to_r = constant('pi')/180; lat1 = lat1 * d_to_r; long1 = long1 * d_to_r; lat2 = lat2 * d_to_r; long2 = long2 * d_to_r; * HAVERSINE formula (earth radius of 3949.99 miles used to produce distance in miles) ; d3 = 3949.99 * arcos(sin(lat1) * sin(lat2) + cos(lat1) * cos(lat2) * cos(long2 - long1)); put "ZIPCITYDISTANCE: " d1 / "GEODIST : " d2 / "HAVERSINE : " d3; stop; run;
The SAS functions produced the same results, with more "accuracy" in the GEODIST result. The Haversine formula produced a value close to that of the SAS functions.
ZIPCITYDISTANCE: 311.1 GEODIST : 309.40076853 HAVERSINE : 308.73423827
Before proceeding to the next section, please note the following. In the data set SASHELP.ZIPCODE, the variable ZIP is numeric and the ZIPCITYDISTANCE function that uses the that data set set requires either numeric constants or numeric variables as arguments. Also, in the above example, two observations were read from SASHELP.ZIPCODE using KEY= to find an observation that contains data for the specified zip code (a numeric variable). That type of access requires that the data set SASHELP.ZIPCODE be indexed. If it is not you can add an index to that data set using PROC DATASETS.
proc sort data=sashelp.zipcode; by zip; run; proc datasets lib=sashelp nolist; modify zipcode; index create zip; quit;
Finally, SASHELP.ZIPCODE is updated quarterly and the latest copy can be found at SAS Maps Online.
The three distance estimates computed in the previous section with SAS functions and the Haversine formula are all straight line distances. There are occasions where that type of estimate is what you desire (for example, how far away is my house from a pollution source). There are other occasions where what you want is not the straight line distance but a driving distance. Again, there are instances where the straight line and driving distances will be close, but that is not the usual case. If Google Maps is used to find the distance between zips 12203 and 20050, the value is 376 miles (with an estimated driving time of about 6 hours and 20 minutes).
Given only one combination of coordinates (or in this case, zip codes), entering the values in Google Maps to get the driving distance and drive time is no problem. If one has a large number of coordinates, manual entering of the values on the Google Maps web site might be very time consuming. In that situation, URL Access Method within SAS can be used to access the Google Maps web site multiple times and extract both the driving distance and drive time each time the site is accessed.
The following shows how to access the Google Maps web site once
* enter two zip codes; %let z1=12203; %let z2=27513; * no changes required below this line; filename x url "https://www.google.com/maps/dir/&z1/&z2/?force=lite"; filename z temp; data _null_; infile x recfm=f lrecl=1 end=eof; file z recfm=f lrecl=1; input @1 x $char1.; put @1 x $char1.; if eof; call symputx('filesize',_n_); run; data _null_; infile z recfm=f lrecl=&filesize. eof=done; input @ 'miles' +(-15) @ '"' distance :comma12. text $30.; units = scan(text,1,'"'); time = scan(text,3,'"'); file print; put "DRIVING DISTANCE BETWEEN &z1 AND &z2 : " distance comma6. +1 units" (TIME: " time ")"; stop; done: file print; put "CANNOT FIND THE DISTANCE OR TIME BETWEEN &z1 AND &z2, TRY ANOTHER COMBINATION"; run; filename x clear; filename z clear;
The first FILENAME statement shows the minimum amount of information Google Maps requires to generate a map showing driving directions between two zip codes. The two macro variables &z1 and &z2 are substituted in the web address. The next FILENAME statement creates a FILEREF Z that is used in the next data step (the destination TEMP puts anything written to FILEREF Z into SAS WORK space and the file is deleted when the WORK library is cleared at the end of the SAS session).
The two data steps used to read Google Maps output and then to find the driving distance and time are based on the papers by Rick Langston cited at the end of this posting. The first data step reads the entire contents of the web page that Google Maps produces when it "sees" the URL in the first FILENAME statement. The file is read one byte at a time and each byte is written to the temporary output file defined in the second FILENAME statement. After all the bytes are read, the total number of bytes read is stored in the macro variable &filesize.
The second data step reads the file created by the first data step and treats the entire string of bytes as one (long) record. The "@ string" feature of the INPUT statement is used to search for the word "miles". When that word is found, the line pointer is moved backward 15 spaces and the distance is read following the next occurrence of a double-quote character. The next 30 characters are read as the variable TEXT and the contents of TEXT are parsed using the SCAN function to find the units of distance (since Google Maps can produce either miles or kilometers) and the driving time (for now, as a character variable ... more on that later).
The above code produced the following in the output window of an interactive SAS session.
DRIVING DISTANCE BETWEEN 12203 AND 27513 : 649 miles (TIME: 9 h 56 min )
Now that it has been demonstrated how to compute one distance and time, let's move to the situation of multiple zip codes and multiple uses of Google Maps using SAS. The following is an example that shows how to find the driving distance and drive time from zip 12203 to five zips that all are related in some way to user MSZ03.
* specify start point (zip1); %let z1=12203; * data set with zip codes (98765 is an invalid zip code); data zip_info; input zip @@; datalines; 02138 13502 98765 20037 94117 12144 ; * place number of zip in a macro variable; data _null_; call symputx('nzips',obs); stop; set zip_info nobs=obs; run; * delete any data set named DISTANCE_TIME that might exist in the WORK library; proc datasets lib=work nolist; delete distance_time; quit; * create a macro that contains a loop to access Google Maps multiple time; %macro distance_time; %do j=1 %to &nzips; data _null_; nrec = &j; set zip_info point=nrec; call symputx('z2',put(zip,z5.)); stop; run; filename x url "https://www.google.com/maps/dir/&z1/&z2/?force=lite"; filename z temp; * same technique used in the example with two zip codes; data _null_; infile x recfm=f lrecl=1 end=eof; file z recfm=f lrecl=1; input @1 x $char1.; put @1 x $char1.; if eof; call symputx('filesize',_n_); run; * drive time as a numeric variable; data temp; retain zip &z2; infile z recfm=f lrecl=&filesize. eof=done; input @ 'miles' +(-15) @ '"' distance :comma12. text $30.; units = scan(text,1,'"'); text = scan(text,3,'"'); * convert times to seconds; select; * combine days and hours; when (find(text,'d') ne 0) time = sum(86400*input(scan(text,1,' '),best.), 3600*input(scan(text,3,' '),best.)); * combine hours and minutes; when (find(text,'h') ne 0) time = sum(3600*input(scan(text,1,' '),best.), 60*input(scan(text,3,' '),best.)); * just minutes; otherwise time = 60*input(scan(text,1,' '),best.); end; output; keep zip distance units time; stop; done: output; run; filename x clear; filename z clear; * add an observation to the data set DISTANCE_TIME; proc append base=distance_time data=temp; run; %end; %mend; * use the macro; %distance_time; * add the straight line distance to the data set; data distance_time; set distance_time; zip_city = zipcitydistance(12203,zip); run; proc print data=distance_time; format zip z5. distance comma6. time time6.; run;
In the above SAS code, the data step that reads the finds the driving distance and time from the Google Map web page is modified to produce the variable TIME as a numeric variable with a value in seconds. This would allow a SAS-based comparison of drive times between zips. The results of the SAS code are as follows.
Obs zip distance units time zip_city 1 02138 173 miles 2:46 139.4 2 13502 91 miles 1:26 76.5 3 98765 . . . 4 20037 372 miles 6:00 311.2 5 94117 2,928 miles 42:00 2558.0 6 12144 12 miles 0:17 7.6
Notice the difference between the driving distances and the distances produced using ZIPCITYDISTANCE. The values of TIME are hours and minutes.
Google Map access is not limited to just zip codes. One could also find distances between locations defined by address.
* my home; %let addr1=59 Lenox Ave, Albany, NY 12203; * someplace in North Carolina; %let addr2=SAS Campus Drive, Cary, NC 27513; data _null_; call symput('a1',translate("&addr1",'+',' ')); call symput('a2',translate("&addr2",'+',' ')); run; filename x url "https://www.google.com/maps/dir/&a1/&a2/?force=lite"; filename z 'z:\temp.txt'; data _null_; infile x recfm=f lrecl=1 end=eof; file z recfm=f lrecl=1; input @1 x $char1.; put @1 x $char1.; if eof; call symputx('filesize',_n_); run; data _null_; infile z recfm=f lrecl=&filesize. eof=done; input @ 'miles' +(-15) @ '"' distance :comma12. text $30.; units = scan(text,1,'"'); time = scan(text,3,'"'); file print; put "DISTANCE & TIME BETWEEN" / "&addr1" / "&addr2" / distance units / time; stop; done: file print; put "CANNOT FIND THE DISTANCE BETWEEN" / "&addr1" / "&addr2" / "TRY ANOTHER PAIR OF ADDRESSES"; run; filename x clear; filename z clear;
The above SAS code is similar to that used with zip codes and it produced the following. It is important to structure the addresses as shown in the SAS code (statements that create the macro variables &addr1 and &addr2).
DISTANCE & TIME BETWEEN 59 Lenox Ave, Albany, NY 12203 SAS Campus Drive, Cary, NC 27513 650 miles 9 h 57 min
If the SAS and Google Maps combination cannot find a distance, you should see the following (notice the zip code used on the second address).
CANNOT FIND THE DISTANCE BETWEEN 59 Lenox Ave 12203 SAS Campus Drive 99999 TRY ANOTHER PAIR OF ADDRESSES
The following is a suggestion for finding driving distances and times for a number of different pairs of addresses. The ZIPCITYDISTANCE function is also used to show a comparison between driving and straight line distances. Notice that the address in the DATALINES file look a bit different than in the previous example, no city and sate are specified. there is code in the data step to use the SAS-supplied data set SASHELP.ZIPCODE to add the city and state to all the addresses.
* data set with IDs and ADDRESSES; data addresses; length addr1 addr2 $50; input @01 id $5. @10 addr1 $30. @40 addr2 $30. ; zip = input(scan(addr1,-1),5.); set sashelp.zipcode (keep=zip poname statecode) key=zip / unique; addr1 = catx(' ',catx(',+',scan(addr1,1,','),poname,statecode),zip); zip = input(scan(addr2,-1),5.); set sashelp.zipcode (keep=zip poname statecode) key=zip / unique; addr2 = catx(' ',catx(',+',scan(addr2,1,','),poname,statecode),zip); keep id addr: ; datalines; 12345 59 LENOX AVE, 12203 1 UNIVERSITY PL, 12144 98989 616 COSBY ROAD, 13502 59 LENOX AVE, 12203 99999 59 Lenox Ave, 12203 SAS Campus Drive, 27513 87878 370 Frederick St, 94117 2129 N St NW, 20037 ; * place number of addresses in a macro variable; data _null_; call symputx('naddr',obs); stop; set addresses nobs=obs; run; * delete any data set named DISTANCE_TIME that might exist in the WORK library; proc datasets lib=work nolist; delete distance_time; quit; * use a loop within a macro to access Google Maps multiple time; %macro distance_time; %do j=1 %to &naddr; data _null_; nrec = &j; set addresses point=nrec; call symputx('id',id); call symputx('a1',translate(trim(addr1),'+',' ')); call symputx('a2',translate(trim(addr2),'+',' ')); stop; stop; run; filename x url "https://www.google.com/maps/dir/&a1/&a2/?force=lite"; filename z temp; data _null_; infile x recfm=f lrecl=1 end=eof; file z recfm=f lrecl=1; input @1 x $char1.; put @1 x $char1.; if eof; call symputx('filesize',_n_); run; data temp; length addr1 addr2 $50; id = "&id"; addr1 = upcase(translate("&a1",' ','+')); addr2 = upcase(translate("&a2",' ','+')); infile z recfm=f lrecl=&filesize. eof=done; input @ 'miles' +(-15) @ '"' distance :comma12. text $30.; units = scan(text,1,'"'); text = scan(text,3,'"'); * convert times to seconds; select; * combine days and hours; when (find(text,'d') ne 0) time = sum(86400*input(scan(text,1,' '),best.), 3600*input(scan(text,3,' '),best.)); * combine hours and minutes; when (find(text,'h') ne 0) time = sum(3600*input(scan(text,1,' '),best.), 60*input(scan(text,3,' '),best.)); * just minutes; otherwise time = 60*input(scan(text,1,' '),best.); end; output; keep id addr1 addr2 distance units time; label addr1 = 'ADDRESS #1' addr2 = 'ADDRESS #2' id = 'ID #' distance = 'DISTANCE (MILES)' time = 'TIME (HR:MIN)' ; stop; done: output; run; filename x clear; filename z clear; proc append base=distance_time data=temp; run; %end; %mend; * use the macro; %distance_time; * add the distance between ZIP centroids (should be "in same ballpark" as driving distance) ; data distance_time; set distance_time; * ZIP should be LAST entry in the addresses; zip_city = zipcitydistance(scan(addr1,-1),scan(addr2,-1)); label zip_city = 'ZIPCITYDISTANCE FUNCTION'; run; title "DRIVING DISTANCE AND TIMES"; proc print data=distance_time label noobs; var id addr1 addr2 distance time zip_city; format time time6.; run;
The above macro produced the following results (scroll to right to see ZIPCITYDISTANCE results).
DRIVING DISTANCE AND TIMES ZIPCIYDISTANCE ID # ADDRESS #1 ADDRESS #2 (MILES) (HR:MIN) FUNCTION 12345 59 LENOX AVE, ALBANY, NY 12203 1 UNIVERSITY PL, RENSSELAER, NY 12144 5.4 0:16 7.6 98989 616 COSBY ROAD, UTICA, NY 13502 59 LENOX AVE, ALBANY, NY 12203 91.0 1:25 76.5 99999 59 LENOX AVE, ALBANY, NY 12203 SAS CAMPUS DRIVE, CARY, NC 27513 650.0 9:57 544.6 87878 370 FREDERICK ST, SAN FRANCISCO, CA 94117 2129 N ST NW, WASHINGTON, DC 20037 2818.0 41:00 2441.8
Though the variable UNITS was kept in the data set DISTANCE_TIME, it is not shown in the PROC PRINT output given that all addresses are within the US and Google Maps produces all such distances in miles.
If you have locations expressed in terms of latitude and longitude, the following will calculate driving distance and time.
%let ll1=%str(42.691560,-73.827840); %let ll2=%str(35.805410,-78.797679); * no changes required below this line; filename x url "https://www.google.com/maps/dir/&ll1/&ll2/?force=lite"; filename z temp; data _null_; infile x recfm=f lrecl=1 end=eof; file z recfm=f lrecl=1; input @1 x $char1.; put @1 x $char1.; if eof; call symputx('filesize',_n_); run; data _null_; infile z recfm=f lrecl=&filesize. eof=done; input @ 'miles' +(-15) @ '"' distance :comma12. text $30.; units = scan(text,1,'"'); time = scan(text,3,'"'); file print; put "DRIVING DISTANCE BETWEEN &ll1 AND &ll2 : " distance units" (TIME: " time ")"; stop; done: file print; put "CANNOT FIND THE DRIVING DISTANCE BETWEEN &ll1 AND &ll2 : " / "TRY ANOTHER PAIR OF COORDINATES"; stop; run; filename x clear; filename z clear;
The above code produced the following in the output window of an interactive SAS session (if you look at the first example that used two zip codes and look at the calculated driving distance, you will see that the latitude and longitude just could be Albany, NY and Cary, NC).
DRIVING DISTANCE BETWEEN 42.691560,-73.827840 AND 35.805410,-78.797679 : 651 miles (TIME: 10 h 2 min )
A macro version of the above that would find the distance from a specified location to a series of other locations with all locations in terms of latitude and longitude might look as follows. The example uses a random sample of locations from the data set SASHELP.ZIPCODE and finds distances and times to each sample observation from the centroid of zip 12203.
* lat/long of first location; %let ll1=%str(42.691560,-73.827840); * create a data set with locations specified in latitude and longitude a random sample of 5 observations from SASHELP.ZIPCODE use SEED=0 to get a new sample each time program is run ; proc surveyselect data=sashelp.zipcode (keep=zip city statecode x y) out=lat_long sampsize=5 seed=0; run; * place number of zip in a macro variable in this example you know it is 5 but you might not know in another use of the SAS code ; data _null_; call symputx('nlls',obs); stop; set lat_long nobs=obs; run; * create a macro that contains a loop to access Google Maps multiple time; %macro distance_time; * delete any data set named DISTANCE_TIME that might exist in the WORK library; proc datasets lib=work nolist; delete distance_time; quit; %do j=1 %to &nlls; data _null_; nrec = &j; set lat_long point=nrec; call symputx('ll2',catx(',',y,x)); stop; run; * lat/long of centroid of zip 12203 hard-coded as part of the URL; filename x url "https://www.google.com/maps/dir/&ll1/&ll2/?force=lite"; filename z temp; * same technique used in the example with a pair of lat/long coodinates; data _null_; infile x recfm=f lrecl=1 end=eof; file z recfm=f lrecl=1; input @1 x $char1.; put @1 x $char1.; if eof; call symputx('filesize',_n_); run; * drive time as a numeric variable; data temp; infile z recfm=f lrecl=&filesize. eof=done; input @ 'miles' +(-15) @ '"' distance :comma12. text $30.; units = scan(text,1,'"'); text = scan(text,3,'"'); * convert times to seconds; select; * combine days and hours; when (find(text,'d') ne 0) time = sum(86400*input(scan(text,1,' '),best.), 3600*input(scan(text,3,' '),best.)); * combine hours and minutes; when (find(text,'h') ne 0) time = sum(3600*input(scan(text,1,' '),best.), 60*input(scan(text,3,' '),best.)); * just minutes; otherwise time = 60*input(scan(text,1,' '),best.); end; output; keep distance time; stop; done: output; run; filename x clear; filename z clear; * add an observation to the data set DISTANCE_TIME; proc append base=distance_time data=temp; run; %end; %mend; * use the macro; %distance_time; * add variables from original data set to new data set distance_time use geodist function to calculate straight line distance ; data distance_time; set distance_time; set lat_long point=_n_; straight_line = round(geodist(&ll1,-73.827840,y,x,'DM'), 0.01); run; proc print data=distance_time noobs label; var x y time distance straight_line zip city statecode; format zip z5. time time6. ; run;
The variable labels shown in the output have been propagated to the data set from SASHELP.ZIPCODE. Since a SEED of 0 is used in PROC SURVEYSELECT, if you cut/paste/run the above code you will get a different output.
DRIVING DISTANCE AND TIMES Longitude Latitude (degrees) of (degrees) of the center the center The Two-letter (centroid) (centroid) straight_ 5-digit abbrev. for of ZIP Code. of ZIP Code. time distance line ZIP Code Name of city/org state name. -75.683828 37.922079 6:49 423 343.42 23426 Sanford VA -79.290511 34.536371 11:00 740 635.25 28383 Rowland NC -88.065729 31.232711 19:18 1287 1113.06 36553 McIntosh AL -89.823776 34.633091 18:09 1219 1025.82 38638 Independence MS -117.519639 34.124307 40:00 2766 2415.80 91739 Rancho Cucamonga CA -122.146785 44.731480 47:00 2885 2388.54 97342 Detroit OR
If you do not already know about the data set SASHELP.ZIPCODE, you can read about it in the paper ZIP Code 411: A Well-Kept SAS Secret and ZIP Code 411: ZIP Code 411: Decoding SASHELP.ZIPCODE and Other SAS® Maps Online Mysteries.
There is an additional argument you can supply in any of the FILENAME statments that produce the URLs used to access Google Maps. That argument can be used to switch from driving distance/time to either walking, biking, or mass transit. In the example that specified two addresses (in macro variables), these were made to find walking time ...
NEW SECOND ADDRESS ... * my home; %let addr1=59 Lenox Ave, Albany, NY 12203; * Albany Medical Center; %let addr2=47 New Scotland Ave, Albany, NY 12208; NEW ITEM ADDED TO THE URL FILENAME ... filename x url "https://www.google.com/maps/dir/&z1/&z2/data=!4m2!4m1!3e2/?force=lite"; NOTE: various codes are ... Bicycling: /data=!4m2!4m1!3e1 Walking: /data=!4m2!4m1!3e2 Transit: /data=!4m2!4m1!3e3
These distances/times would probably be most helpful when using locations in an urban area. NOTE, the codes shown above were found at "DdDave's Mapping Stuff". The above changes resulted in this output.
DISTANCE & TIME BETWEEN 59 Lenox Ave, Albany, NY 12203 47 New Scotland Ave, Albany, NY 12208 3.9 miles 35 min
There are potential problems with all of the above SAS code. Finding the desired values of distance and time within the web page produced by Google Maps is based on the current structure of those web pages. This page has been edited several times to take into account changes made in the web page structure that occurred since the various examples were first posted.
NOTE ... The SGF paper that accompanies the methods shown on this page ("Driving Distances and Times Using SAS® and Google Maps") contains SAS code based on an earlier web page structure and that code no longer works.
The URL used in the FILENAME statement is also the current way of doing things with Google Map. If either or both (i.e. web page content, URL construction) change, one would have to modify the SAS code.
For more about using URL access in SAS, you can read Rick Langston's papers: "Handling Large Stream Files with the @'string' Feature" and "Creating SAS® Data Sets from HTML Table Definitions"
NOTE ... There is a SAS problem note "Problem Note 34098: SAS® is unable to connect to the host when specifying the host name while using... related to using the URL access method. If the code posted here produces an error message about not being able to connect to Google Maps, you may have to apply a HOTFIX to your SAS installation.
If you have any questions about this posting, you are welcome to send me a note by clicking here ... email Mike.
This work was funded in part by NIH grant HHSN267200700019C from the Eunice Kennedy Shriver National Institute of Child Health and Human Development.
When using SAS to access Google Maps, you should familiarize yourself with the the Google Maps Terms of Service.
Another paper offers an alternative to the above methods. It shows how to use SAS with a MapQuest API (Application Program Interface) to get driving times and distances. Take a look at Batch Production of Driving Distances and Times Using SAS® and Web Map APIs by Ash Roy and Yingbo Na from theCanadian Institute for Health Information (a paper presented at the 2012 SAS Global Forum).
Article originally published by Mike Zdeb on sasCommunity.org.
This is fantastic! Thank you so much.
One minor correction (I believe)...
The line of code that reads
straight_line = round(geodist(&ll1,-73.827840,y,x,'DM'), 0.01);
should be
straight_line = round(geodist(&ll1,y,x,'DM'), 0.01);
since the longitude value of -73.827840 is part of the ll1 macro variable. Otherwise, there are too many arguments for the GEODIST function.
This is incredibly helpful. Does the Google Maps database take into account traffic? How do I get a distribution of travel times from point A to point B for multiple pairs of locations?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.