After talking with a colleague about how slow geocoding is, he suggested a bounding box. He had some thoughts on it, but I'm not sure how I would code this. If anyone has experience with this and would like to share, I would appreciate the advice!

His idea is that a bounding box algorithm creates a square that encompasses the circle you are searching within; anything inside that square is included in the search, and anything outside it is skipped. This should speed up the geocoding process; I'm just not sure how to do it!

This is my code for finding the coordinates per person (proc geocode) and then calculating the distance from a single point of interest using geodist. The proc geocode step currently takes about 15 minutes when I run approximately 45k records through it. The point of interest for geodist has a fictitious lat/lon for this example.

/*use proc geocode to set lat and lon per person*/
options msglevel=N;
proc geocode
method=street /* Geocoding method */
addressvar=add_line_1 /*should be address field, could also use ADDRESSSTATEVAR, ADDRESSZIPVAR, ADDRESSCITYVAR, etc to define other fields if not named correctly on source dataset*/
data=work.groomed_population /* Input address data */
lookupstreet=geo.usm /*needs to point at USM dataset downloaded from SAS*/
out=work.geocoded; /* Output data set */
run;
quit;
options msglevel=I;
/*use geodist to find exact mileage between the point of interest and each person's lat/lon*/
data distance;
retain x y;
set geocoded;
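/* geodist takes latitude first: proc geocode writes latitude to Y and longitude to X; 'M' returns miles */
dist = geodist( 12.123456, -12.123456, y, x, 'M' );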
run;
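
For reference, here is a minimal sketch of the bounding-box idea described above, written against the same work.geocoded output. The macro variables radius_mi, poi_lat and poi_lon and the data set name work.in_box are hypothetical names chosen for illustration, and the 25-mile radius is an assumed value. Note that the box test needs a latitude and longitude for each record, so in this form it can only run after proc geocode and trims the geodist step rather than the geocoding step itself. The degree-to-mile conversion is a rough approximation (68 miles per degree, deliberately a little low so the box errs on the large side) and is not meant for searches near the poles or across the 180-degree meridian.

```sas
/* Hypothetical search radius (miles) and point of interest; values are illustrative only */
%let radius_mi = 25;
%let poi_lat   = 12.123456;
%let poi_lon   = -12.123456;

data work.in_box;
  set work.geocoded;

  /* Half-height of the box in degrees: roughly 68 miles per degree of latitude (assumed, slightly conservative) */
  lat_delta = &radius_mi / 68;

  /* Half-width of the box in degrees: a degree of longitude shrinks by cos(latitude) */
  lon_delta = &radius_mi / (68 * cos(&poi_lat * constant('pi') / 180));

  /* Cheap box test first: keep only points inside the square (Y = latitude, X = longitude) */
  if  (&poi_lat - lat_delta) <= y <= (&poi_lat + lat_delta)
  and (&poi_lon - lon_delta) <= x <= (&poi_lon + lon_delta);

  /* Exact distance only for the survivors, then trim the corners of the square */
  dist = geodist(&poi_lat, &poi_lon, y, x, 'M');
  if dist <= &radius_mi;

  drop lat_delta lon_delta;
run;
```

Records that failed to geocode (missing X or Y) fail the box test and drop out, so they would need to be handled separately if they matter.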