
Reverse Geocoding Using PROC GINSIDE

Started 09-21-2023
Modified 09-21-2023

Geocoding is the process of converting addresses into latitude and longitude coordinates. Doing so allows you to take address data and then easily visualize it on a map. If you want to geocode address data using SAS, you can use the GEOCODE procedure (see this article for more information).

 

However, sometimes you may need to do the reverse: you may have latitude and longitude data and need to figure out the address, or components of the address. This process is known as reverse geocoding, and it consists of converting geographic coordinates into a human-readable address or place name. While there is no REVERSE GEOCODE procedure, SAS still makes it easy to derive place names and address components from latitude and longitude values. In this article, I discuss how to use SAS procedures and readily available lookup data to do so.

 

How (Reverse) Geocoding Works

 

Both geocoding and reverse geocoding work in basically the same way. At a high level, both processes require input data and lookup data. For reverse geocoding, the input data contains latitude and longitude coordinates, and the lookup data contains some kind of reference range of latitude and longitude values that correspond to certain address values or place names.

 

One common source of coordinate range lookup data is polygon data. For example, if you have coordinate location information and you need to determine which US state a particular location is in, you can use a data set containing US state polygons. Then you simply need to determine whether a coordinate point is within a particular polygon or not. The image below demonstrates this: you can see that one coordinate point is located in the state of North Carolina, and the other is within Virginia.

 

01_GT_4.png


 

In SAS, you can do this with the GINSIDE procedure.  

 

PROC GINSIDE Basics

 

The GINSIDE procedure compares a data set of X and Y coordinates to a map data set containing polygons and determines if the X and Y coordinates are inside or outside of the polygon. It requires two input data sets. The syntax of PROC GINSIDE is as follows:

 

PROC GINSIDE DATA=points-data-set
             MAP=map-data-set
             <OUT=output-data-set>
             <DROPMAPVARS>
             <INCLUDEBORDER>;
    ID id-variable(s);
RUN;

 

The DATA= option is where you can specify the X-Y coordinate data set being reverse geocoded, and the MAP= option is where you specify the reference polygon data set. In SAS, a polygon data set is a normal SAS table where each row represents a single point. To create a polygon, the points are connected together in order.

 

The ID statement is where you specify the ID variable in the map data set that should be checked against the points from the input data set – for example, the column containing the US state name. Additional options include INCLUDEBORDER, which includes points that are on the border of a polygon, rather than only those fully inside it, and DROPMAPVARS, which only keeps whatever the ID variable is from the map data set in the output data set and drops everything else.

 

In both input data sets, the longitude and latitude values must be in columns named X and Y, respectively.
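As a minimal sketch of these pieces working together, the following code builds a tiny, made-up map data set (a single unit square identified by a hypothetical REGION column) and two made-up points, then runs PROC GINSIDE. None of these data sets or column names come from the example later in this article; they are assumptions for illustration only.

```sas
/* Hypothetical map data set: one unit square, one row per vertex */
data square_map;
    input region $ x y;
    datalines;
A 0 0
A 0 1
A 1 1
A 1 0
;
run;

/* Hypothetical points to test: p1 is inside the square, p2 is outside */
data points;
    input pt $ x y;
    datalines;
p1 0.5 0.5
p2 2 2
;
run;

/* Compare each point against the polygon identified by REGION */
proc ginside data=points map=square_map out=points_out;
    id region;
run;

proc print data=points_out;
run;
```

With the default options, P1 should pick up REGION=A, while P2, which lies outside the square, should be written with a missing REGION value.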

 

Let’s look at an example.

 

Deriving Census Tract Names

 

In this example, I have latitude and longitude coordinates that represent traffic violations in Montgomery County, Maryland, recorded using a GPS device. While I can easily plot those coordinates on a map, I also want to aggregate them by Census tract to see which areas are most incident-prone, and to compare the results against tract-level demographic statistics. This means that I need to use reverse geocoding techniques to figure out which Census tract each set of geographic coordinates falls within.

 

02_GT_3.png

 

Since I want to determine the Census tract of each point, I need to use a lookup data set that contains US Census tract polygons. Luckily, the US Census Bureau publishes a comprehensive set of cartographic boundary files that can be used to reverse geocode states, counties, Census tracts, and other administrative boundaries in the US. I will download the Census tract boundary file for the state of Maryland and import it into SAS as a SAS table.

 

03_GT_1.png

 

Note that the boundary files provided by the Census Bureau (like many polygon lookup data sets more generally) come in the shapefile format, a common file structure used to store geographic data. To import a shapefile into SAS, use the MAPIMPORT procedure.

 

To do so, I’ll run the following code:

 

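/* Path to the Census tract cartographic boundary shapefile for Maryland */
%let tract_lookup=C:\Documents\cb_2022_24_tract_500k.shp;

/* Import the shapefile as a SAS map data set; NAME identifies each tract polygon */
proc mapimport out=lookup datafile="&tract_lookup";
     id name;
run;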

 

Note that the ID statement in the MAPIMPORT procedure must identify the column that uniquely identifies each polygon in the shapefile. For example, in this shapefile containing US Census tract boundaries, the ID statement must specify the column containing the tract name.
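After the import, it can be worth confirming that the map data set has the columns PROC GINSIDE will need. The following is a quick check, assuming the LOOKUP table created by the PROC MAPIMPORT step and that the shapefile's NAME field was imported under that name:

```sas
/* List the columns in the imported map data set; X, Y, and NAME should be among them */
proc contents data=lookup;
run;

/* Preview the first few polygon vertices */
proc print data=lookup(obs=10);
    var name x y;
run;
```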

 

Projections and Coordinate Systems

 

One important requirement is that in order for PROC GINSIDE to work accurately, both input data sets must be in the same projection.

 

While projections and coordinate systems can seem overwhelming, a projection is simply a mathematical transformation that takes the curved surface of the earth and represents it on something flat, like a paper map or a computer screen. Because there are many possible projections that rely on different underlying mathematical models, the same coordinates can be displayed in wildly different locations if the wrong projection or coordinate system is specified.

 

04_GT_5.png

 

To use PROC GINSIDE, both input data sets must use the same projection so that the two data sets “line up” with each other spatially. Luckily, it’s easy to convert coordinates from one projection to another. In SAS, you can accomplish this with the GPROJECT procedure.

 

For the purposes of the GINSIDE procedure, it doesn’t actually matter which projection is used; all that matters is that it is the same one in both data sets. In many cases this step will not be necessary, as the lookup data and the data being reverse geocoded may already be in the same projection. In this example, the traffic violation data and the lookup data set are both unprojected, so nothing else needs to be done. The projection/coordinate space of shapefile data can be verified by examining the .PRJ file.
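If the two data sets did not match, a reprojection step would be needed first. The sketch below assumes a hypothetical map data set, LOOKUP_STATEPLANE, stored in the NAD83 / Maryland state plane system (EPSG:2248), and converts it to unprojected WGS 84 longitude and latitude (EPSG:4326). The data set names and the source coordinate system are assumptions for illustration and are not part of this example. Check the PROC GPROJECT documentation for your SAS release to confirm that the FROM= and TO= options are available.

```sas
/* Hypothetical: LOOKUP_STATEPLANE holds tract polygons in EPSG:2248 (assumed) */
proc gproject data=lookup_stateplane out=lookup_latlong
              project=proj4
              from="EPSG:2248"   /* assumed source coordinate system */
              to="EPSG:4326";    /* unprojected WGS 84 longitude/latitude */
    id name;
run;
```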

 

Putting It All Together

 

Once I’ve loaded the lookup data set containing Census tract polygons and confirmed that both the lookup data set and the point data set to be reverse geocoded are using the same projection, I need to run the PROC GINSIDE step. I’ll submit the following code:

 

proc ginside data=violations map=lookup
     out=violations_tract includeborder dropmapvars;
     id name;
run;

 

In the ID statement, I’ve specified which column identifies the Census tract polygons. For each X and Y point in the input data set, the procedure will determine if the point is within a polygon in the reference data set. If it is, the ID value of that polygon will be written to the output data set. I’m using the INCLUDEBORDER option to keep any points that may fall exactly on a polygon border. I’ve also added the DROPMAPVARS option because the only column I want to keep from the reference data is NAME, which contains the Census tract name.

 

Then, I can examine the results. The output table, VIOLATIONS_TRACT, has two new columns: _ONBORDER_ and NAME. The _ONBORDER_ column comes from adding the INCLUDEBORDER option, and it contains either a 1 or a 0. A 0 means that a particular point is not on a polygon border, whereas a 1 means that it is. The NAME column indicates which Census tract each point is located within; this column is added from the reference data.

 

05_GT_2.png

 

Now that I have this output data set, I could aggregate traffic violations by Census tract to see which areas have the highest number of violations, or I could join the results with other demographic information to add additional context – neither of which would be possible without first reverse geocoding the coordinate data.
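As a rough sketch of the aggregation step, the following query counts violations per tract from the VIOLATIONS_TRACT output table. The output table name VIOLATIONS_BY_TRACT and the column name N_VIOLATIONS are made up for illustration.

```sas
/* Count violations per Census tract, skipping points that matched no polygon */
proc sql;
    create table violations_by_tract as
    select name,
           count(*) as n_violations
    from violations_tract
    where name is not missing
    group by name
    order by n_violations desc;
quit;
```

The resulting table could then be joined to tract-level demographic data on the tract identifier.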

 

Conclusion

 

While SAS doesn’t have an explicit reverse geocoding procedure, it is easy to use the GINSIDE procedure to add address components or place names to geographic coordinate data. By simply changing the reference data set, you can easily reverse geocode coordinate data at many different levels, from Census tract to ZIP code to county and more.

 

Tips and Tricks

 

  • Boundaries for many administrative units change over time. The US Census Bureau provides historical boundary information for many geographies, including Census tracts and ZIP code tabulation areas.
  • If your data needs to be reprojected, you can use the resource SpatialReference.org to look up projection codes that PROC GPROJECT understands.
  • Filtering your reference data can improve performance and save space. For example, if you are reverse geocoding counties in Colorado but your reference data set is all counties in the US, you can subset it to include only Colorado counties.
  • Your results will only be as good as your reference data. Be sure to verify that your reference data is from a reliable, up-to-date source.
  • Be careful with column names. Longitude and latitude values must be in columns named X and Y.
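Two of these tips can be sketched in code. The example below assumes that the imported LOOKUP table kept the shapefile's COUNTYFP field as a character column, and that the raw coordinate data sits in a hypothetical table named VIOLATIONS_RAW with columns LONGITUDE and LATITUDE. All of these names are assumptions for illustration.

```sas
/* Filter the reference data: keep only Montgomery County, MD tracts */
/* (assumes a character COUNTYFP column; '031' is the county FIPS code) */
data lookup_montgomery;
    set lookup;
    where countyfp = '031';
run;

/* Rename the hypothetical LONGITUDE and LATITUDE columns to the */
/* X and Y names that PROC GINSIDE expects */
data violations;
    set violations_raw(rename=(longitude=x latitude=y));
run;
```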

 

 
