Mike already provided the code in his paper. I would still start with proc sql to create a file with all combinations of the zip codes, then wrap Mike's code in a Macro that loops through the file you created with proc sql. All you have to pass into the macro are the zip code pairs and run it for each pair.
That would simply replace the suggested datastep that is using the SAS zip/distance function.
The rest of the code, whichever method you choose, would remain the same.
You should carefully read the Google Maps terms of use documents before using the drive time code to make sure it isn't in violations of their terms. At one time, they specified that 'services' could only be used if the information was being put on a Google map.
hi ... some other ideas ...
#1 this also uses the ZIPCITYDISTANCE function ... basically Robert's ideas in one use of PROC SQL, though I took out the condition
where usa1.zip^=metro_rank.metro_zip;
since I think it'd be OK if the closest USA zip to a METRO zip was the same zip, yes/no?
the first part cleans up USA1
it replaces spaces in ADDZIP with asterisks then creates a numeric variable UZIP using the 1st five characters of ADDZIP
(you get some LOG messages about INVALID strings)
it also checks to see if UZIP is in SASHELP.ZIPCODE since ZIPCITYDISTANCE uses that data set
the second part grabs the top 50 observations in METRO_RANK
the third part computes the distance and grabs the minimum distance between a TIC and a top 50 CITY
you get some LOG messages about INVALID strings
proc sql;
create view usa as
select unique tic, input(translate(addzip,'*',' '), 5.) as uzip format=z5.
from z.usa1, sashelp.zipcode (keep=zip) as zc
having uzip eq zc.zip;
create view metro as
select city, sum_rank, input(zipcode, 5.) as mzip format=z5.
from z.metro_rank (where=(sum_rank between 1 and 50));
create table pairs as
select *, zipcitydistance(uzip, mzip) as dist
from usa, metro
group tic
having dist eq min(dist);
quit;
#2 this would be too big a problem to for the SAS and Google Maps drive distance/time (not really intended for 500K+ calls to Google Maps)
#3 the code in the original SGF paper no longer works ... a link to the most recent code canbe found at ...
http://www.sascommunity.org/wiki/User:Msz03
(but remember, your problem is TOO BIG)
#4 Darrell's comment of "... Google Maps terms of use ..." ... I looked (again) at ...
http://www.google.com/help/terms_maps.html
and see the two possible sticking points as how one would interpret ...
(a) access or use the Products or any Content through any technology or means other than those provided in the Products, or through other explicitly authorized means Google may designate (such as through the Google Maps/Google Earth APIs);
and
(e) use the Products in a manner that gives you or any other person access to mass downloads or bulk feeds of any Content, including but not limited to numerical latitude or longitude coordinates, imagery, and visible map data;
I see at ... http://blog.programmableweb.com/2011/10/27/google-maps-usage-fees-how-many-developers-will-have-to-p...http://code.google.com/apis/maps/faq.html#usagelimits
that folks using Google APIs for map access have a 25,000 free maps/day, after that there's a charge
#5 take a look at ... http://www.tandfonline.com/doi/abs/10.1080/00330124.2011.583586#preview
Boscoe FP, Henry KA, Zdeb MS.
A Nationwide Comparison of Driving Distance Versus Straight Line Distance to Hospitals.
Professional Geographer 2012
the conclusion in that paper is that in most instances, the added precision of driving distance over straight line distance isn't worth the effort
Thanks for suggestions,
I think I need time to figure out the correct ways to calculate the drive time efficiently. Some locations may not be accessible by roads. This task is much more complicated as compared to merely calculating the straight line distance between locations as it involves the use of google maps and a big country - US.
There might be many different possible routes on road from one location to another. I will have to choose, the lowest drive time among the possible routes.
As requested by art297, I will share some useful tips if I found any good way to handle this complicated calculations efficiently.
Thank you.
One performance suggestion to Mike's program. Use the 'geodist' function instead of 'zipcitydistance'. ZipCityDistance is being passed the ZIP Code of every city times every tic ZIP Code. It must find the location of the ZIP Code for each ZIP Code every time it is called. It would be faster to just lookup each location once.
The first view in Mike's code can be tweaked to save the Y, X (Lat, Long) values from the SASHELP.ZIPCODE data set.
This change would replace his code:
select unique tic, zc.x, zc.y, input(translate(addzip,'*',' '), 5.) as uzip format=z5.
from z.usa1, sashelp.zipcode (keep=zip x y) as zc
The metro_rank data set already has the Latitude and Longitude for each city (I'm assuming it is correct). This just needs to be kept in the data with the following change:
select city, sum_rank, latitude, longitude, input(zipcode, 5.) as mzip format=z5.
Finally, the line with zipcitydistance should be changed to use geodist:
select *, geodist(y, x, latitude, longitude, "M") as dist
A couple of notes:
- The metro_rank data set has a location in Puerto Rico in the top 50. Not sure if that matters.
- The code allows for a tie - 2 or more cities the same distance away.
Hi again,
The metro_rank data is extracted from a US Zip code database and US 2010 Census data. I assume that it is correct. My longitude and latitude data, are in decimals. Therefore, I think they must be converted into radians before calculations.
I notice that zipcitydistance (1 decimal point) output is very close to the geodist. I think, zipcitydistance is more than enough to fit my purpose. If not mistaken, the zipcitydistance function is not applicable for US but not for other countries. If I were to calculate the distances in other countries, I must noit use the zipcitydistance function.
Thank you anyway - it is useful for my other projects.
While we all obviously have to respect terms of agreement, and I'm not familiar with what the following authors did, I just noticed an interesting paper title scheduled to be presented at this year's SGF:
Batch Production of Driving Distances and Times Using SAS® and Web Map APIs
Ash Roy, Canadian Institute for Health Information
Yingbo Na, Canadian Institute for Health Information
http://sgf2012.confnav.com/sgf2012/web/sessions/date/view.html#20111103194323199460000000
I can't attend their presentation, as I'm presenting a paper at the same time that those authors are presenting, but you can probably contact them now and get some additional answers to your question.
If you discover anything useful I, for one, would appreciate your posting a summary here.
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.