Hi,
I have a SAS dataset with about 200K records containing TranID, Lats and Longs. Anyone know how can I do reverse geocoding to get the Zip Codes? I found something about a dataset in SAS Help library but it returned only the Counties and the lats and longs need to be centoids.
Thanks,
Will
WPS/SAS/R: Given Latitude and Longitude get postal_code and other geography
If you have IML interface with R you can just paste the code below
The free express edition of WPS does not limit the size of the SAS dataset
produced by R
Package also does geocoding
HAVE
====
Up to 40 obs from sd1.have total obs=1
Obs LON LAT
1 -73.9738 40.7625
WANT
===
Up to 40 obs WORK.RESWPSXPO total obs=11
Obs NAME VALUE
11 POSTAL_CODE 10022
1 ADDRESS Trump, 725 5th Ave, New York, NY 10022, USA
2 PREMISE Trump
3 STREET_NUMBER 725
4 ROUTE 5th Avenue
5 NEIGHBORHOOD Midtown
6 POLITICAL Manhattan
7 LOCALITY New York
8 ADMINISTRATIVE_AREA_LEVEL_2 New York County
9 ADMINISTRATIVE_AREA_LEVEL_1 New York
10 COUNTRY United States
WORKING CODE
============
WPS/R
res <- revgeocode(lonlat_sample, output="more");
You can also geocode
geo <- geocode("725 fifth avenue, New York NY")
yeild see code at end
[1] -73.97378 40.76245
[1] 10022
[1] Midtown
FULL SOLUTION
=============
%utl_submit_wps64('
libname sd1 "d:/sd1";
options set=R_HOME "C:/Program Files/R/R-3.3.2";
libname wrk "%sysfunc(pathname(work))";
proc r;
submit;
source("C:/Program Files/R/R-3.3.2/etc/Rprofile.site", echo=T);
library(haven);
library(ggmap);
lonlat_sample<-as.numeric(read_sas("d:/sd1/have.sas7bdat"));
res <- revgeocode(lonlat_sample, output="more");
endsubmit;
import r=res data=wrk.reswps;
run;quit;
');
proc transpose data=reswps out=reswpsxpo;
var _all_;
run;quit;
p to 40 obs from reswps total obs=1
STREET_
bs ADDRESS PREMISE NUMBER ROUTE
1 Trump, 725 5th Ave, New York, NY 10022, USA Trump 725 5th Avenue
ADMINISTRATIVE_ ADMINISTRATIVE_ POSTAL
NEIGHBORHOOD POLITICAL LOCALITY AREA_LEVEL_2 AREA_LEVEL_1 COUNTRY CODE
Midtown Manhattan New York New York County New York United States 10022
%utl_submit_r64('
source("c:/Program Files/R/R-3.3.2/etc/Rprofile.site",echo=T);
library("ggmap");
lonlat_sample <- as.numeric(geocode("725 fifth avenue, New York NY"));
class(lonlat_sample);
lonlat_sample;
res <- revgeocode(lonlat_sample, output="more");
res$postal_code;
res$neighborhood;
');
Look at PROC GINSIDE.
You'll need to get hold of a shape file, then run proc ginside to do the point-in-polygon processing. I did this successfully a few years ago at NZ Police to get meshblocks, which could be rolled up to larger areas groups.
These shape files contain all the points bounding the area, which can in fact be multiple shapes. Read in the shape file via proc mapimport, then run over the generated dataset with your 200k rows with ginside, and the counties will automagically be included.
Often the shape files can be very wide, with all the higher grouping codes and texts, but you can drop those out with a datastep after the importing. It's vital to retain the internal order of the shape file - don't drop observations, just variables.
As an indication, a shape file with forty-odd million rows, against a dataset with 1.6 million locations, took around 3-4 hours to run initially. Because the location dataset was being incrementally updated, the following runs on unique locations could zip through (puns always intended) in 3-4 minutes.
I only have access to SAS UE, so can't test any of this.
However, I think that the following articles are what you're looking for:
The first one shows you how to create a zipcode map
http://support.sas.com/resources/papers/proceedings13/214-2013.pdf
The second one shows you how to obtain the zipcodes from that map, based on only having a pair of grid coordinates, using proc ginside: http://support.sas.com/documentation/cdl/en/graphref/65389/HTML/default/viewer.htm#p0qjcc8hugcjb2n1x...
Art, CEO, AnalystFinder.com
WPS/SAS/R: Given Latitude and Longitude get postal_code and other geography
If you have IML interface with R you can just paste the code below
The free express edition of WPS does not limit the size of the SAS dataset
produced by R
Package also does geocoding
HAVE
====
Up to 40 obs from sd1.have total obs=1
Obs LON LAT
1 -73.9738 40.7625
WANT
===
Up to 40 obs WORK.RESWPSXPO total obs=11
Obs NAME VALUE
11 POSTAL_CODE 10022
1 ADDRESS Trump, 725 5th Ave, New York, NY 10022, USA
2 PREMISE Trump
3 STREET_NUMBER 725
4 ROUTE 5th Avenue
5 NEIGHBORHOOD Midtown
6 POLITICAL Manhattan
7 LOCALITY New York
8 ADMINISTRATIVE_AREA_LEVEL_2 New York County
9 ADMINISTRATIVE_AREA_LEVEL_1 New York
10 COUNTRY United States
WORKING CODE
============
WPS/R
res <- revgeocode(lonlat_sample, output="more");
You can also geocode
geo <- geocode("725 fifth avenue, New York NY")
yeild see code at end
[1] -73.97378 40.76245
[1] 10022
[1] Midtown
FULL SOLUTION
=============
%utl_submit_wps64('
libname sd1 "d:/sd1";
options set=R_HOME "C:/Program Files/R/R-3.3.2";
libname wrk "%sysfunc(pathname(work))";
proc r;
submit;
source("C:/Program Files/R/R-3.3.2/etc/Rprofile.site", echo=T);
library(haven);
library(ggmap);
lonlat_sample<-as.numeric(read_sas("d:/sd1/have.sas7bdat"));
res <- revgeocode(lonlat_sample, output="more");
endsubmit;
import r=res data=wrk.reswps;
run;quit;
');
proc transpose data=reswps out=reswpsxpo;
var _all_;
run;quit;
p to 40 obs from reswps total obs=1
STREET_
bs ADDRESS PREMISE NUMBER ROUTE
1 Trump, 725 5th Ave, New York, NY 10022, USA Trump 725 5th Avenue
ADMINISTRATIVE_ ADMINISTRATIVE_ POSTAL
NEIGHBORHOOD POLITICAL LOCALITY AREA_LEVEL_2 AREA_LEVEL_1 COUNTRY CODE
Midtown Manhattan New York New York County New York United States 10022
%utl_submit_r64('
source("c:/Program Files/R/R-3.3.2/etc/Rprofile.site",echo=T);
library("ggmap");
lonlat_sample <- as.numeric(geocode("725 fifth avenue, New York NY"));
class(lonlat_sample);
lonlat_sample;
res <- revgeocode(lonlat_sample, output="more");
res$postal_code;
res$neighborhood;
');
Thank you all and especially to @rogerjdeangelis, I happened to know how to use R as well so will try the SAS & R solution. thanks again.
Hi @rogerjdeangelis,
I am getting the following error when trying out your solution in WPS. Any clues?
Thanks,
Will
Check out
http://blogs.sas.com/content/iml/2011/05/13/calling-r-from-sasiml-software.html
or
use the excellect SAS help facilities.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.