Dissolving Boundaries: How to Aggregate Spatial Data using PROC GREMOVE

If you work with geospatial data, then you know that there is a wide variety of data sources available: shapefiles from the US Census Bureau, built-in map data sets included with SAS, and any number of options from third-party providers. However, you may find that the available spatial data sources don’t quite match what you need to create a particular visualization or perform a specific analysis. In this article, I explain how to dissolve internal boundaries in polygon data to construct map data sets that meet your needs exactly.

Overview

Imagine that you have a map data set of US states, but your data is at the region level. For example, you might want to map population by region when all you have is a map data set of states. To make this work, you need to remove the boundaries between the states within each region so that only the region boundaries remain.


Using the GREMOVE Procedure

In SAS, you can do this with the GREMOVE procedure. The GREMOVE procedure combines unit areas in a map data set into larger unit areas, removing the internal boundaries between them. Unit areas that share a common grouping value are combined into one. For example, in the image below, there are four square polygons. Three of them are gray, and the fourth is blue.

By using color as the grouping value, I could dissolve the boundaries between the three gray polygons, creating a map data set with just two polygons: a large gray L-shape and a blue square.

PROC GREMOVE Syntax

The basic syntax of the GREMOVE procedure is:

PROC GREMOVE DATA=input-map-data-set <OUT=output-map-data-set>;
   BY variable(s);
   ID id-variable(s);
RUN;

The DATA= option specifies the input map data set, which must contain X and Y variables holding the coordinates of the polygon points (for unprojected data, longitude in X and latitude in Y). The OUT= option names the new map data set to create. You also need to identify both the current unit areas and the desired unit areas: the ID statement specifies the variable that identifies the current units, and the BY statement specifies the variable that identifies the new units. As with other BY-group processing in SAS, the input data must be sorted by the BY variables.
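To tie the syntax back to the four-square illustration, here is a minimal, self-contained sketch. It assumes the squares sit in a 2x2 grid with the blue square in the top-right cell, which is a guess at the layout in the image; the data set names SQUARES, SQUARES_SORTED, and MERGED_SQUARES and the variables COLOR and SQUARE are hypothetical. COLOR plays the role of the BY variable (the new units), and SQUARE plays the role of the ID variable (the current units).

```sas
/* Hypothetical data: four unit squares in a 2x2 grid.        */
/* Squares 1-3 are gray (an L-shape); square 4 is blue.        */
data squares;
	input color $ square x y;
	datalines;
gray 1 0 0
gray 1 1 0
gray 1 1 1
gray 1 0 1
gray 2 1 0
gray 2 2 0
gray 2 2 1
gray 2 1 1
gray 3 0 1
gray 3 1 1
gray 3 1 2
gray 3 0 2
blue 4 1 1
blue 4 2 1
blue 4 2 2
blue 4 1 2
;
run;

/* Sort by the BY variable. Because EQUALS is the PROC SORT   */
/* default, each square keeps its original point order.       */
proc sort data=squares out=squares_sorted;
	by color;
run;

/* COLOR identifies the new unit areas; SQUARE the current ones */
proc gremove data=squares_sorted out=merged_squares;
	by color;
	id square;
run;

proc print data=merged_squares;
run;
```

The printed output should contain points for just two polygons, one per color, and fewer rows than the 16 input points, because the edges shared by the gray squares are no longer needed.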

Putting It All Together

Let’s look at an example of how this might work in practice. Imagine I have a map data set of the continental United States. It has X and Y coordinates, as well as columns that identify each state’s name (NAME) and postal code (STUSPS).

Eventually, I want to use this map data set to map data about the US Courts of Appeals circuits. There are 11 numbered circuits, each covering a different geographic area of the United States. However, this map data set contains only state boundaries.

To use PROC GREMOVE to dissolve the borders between states and create a map of circuit court boundaries, I’ll need to modify my data. PROC GREMOVE requires the input data to contain two identifying variables: one that identifies the current unit areas (the ID variable) and one that identifies the desired unit areas (the BY variable). This data already has a variable that identifies the current units (states), but I need to add a second variable that indicates which circuit each state belongs to. To do this, I can join the map data with a lookup data set containing the state/circuit information.
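The lookup table itself is not shown here. As a minimal sketch, assuming it is named APPEALS_COURT with a character STATE_ABR column and a numeric CIRCUIT column (the names come from the join that follows; the numeric type is an assumption), its first rows could be entered with a DATA step. Only the states of the First, Second, and Third Circuits are listed; a complete table needs a row for every state in the map.

```sas
/* Excerpt of a state-to-circuit lookup table.                 */
/* STATE_ABR holds postal codes that match STUSPS in the map.  */
/* Storing CIRCUIT as a number is an assumption.               */
data appeals_court;
	input state_abr $ circuit;
	datalines;
ME 1
NH 1
MA 1
RI 1
NY 2
VT 2
CT 2
PA 3
NJ 3
DE 3
;
run;
```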

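I can do this with a simple SQL join. I order the resulting table by the CIRCUIT variable, which PROC GREMOVE uses as its BY variable, then by state code and by the _SEQ_ variable, which records the order in which the points are connected to draw each state’s boundary.

proc sql;
	/* Attach each state's circuit to every boundary point of that state */
	create table state_circuit as
	select states.*, appeals_court.circuit
	from work.states left join work.appeals_court
		on states.stusps=appeals_court.state_abr
	/* Sort by the BY variable; keep each state's points together and in drawing order */
	order by circuit, stusps, _seq_;
quit;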
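Because this is a left join, a state whose postal code has no match in the lookup table is kept but gets a missing CIRCUIT value, and PROC GREMOVE would treat all such states as one more BY group. It is worth listing unmatched states before running GREMOVE. The sketch below is self-contained: it builds two hypothetical unit-square "states" and a lookup table that deliberately omits one of them, repeats the join, and then queries for missing circuits. The _DEMO data set names and the coordinates are made up for illustration.

```sas
/* Hypothetical stand-in for the state map: two adjacent squares */
data states_demo;
	input stusps $ _seq_ x y;
	datalines;
ME 1 0 0
ME 2 1 0
ME 3 1 1
ME 4 0 1
NH 1 1 0
NH 2 2 0
NH 3 2 1
NH 4 1 1
;
run;

/* Hypothetical lookup table that deliberately leaves out NH */
data court_demo;
	input state_abr $ circuit;
	datalines;
ME 1
;
run;

proc sql;
	/* Left join: the NH rows are kept, with a missing CIRCUIT */
	create table state_circuit_demo as
	select states_demo.*, court_demo.circuit
	from states_demo left join court_demo
		on states_demo.stusps=court_demo.state_abr
	order by circuit, stusps, _seq_;

	/* List every state that did not match the lookup table */
	select distinct stusps
	from state_circuit_demo
	where missing(circuit);
quit;
```

Against the real STATE_CIRCUIT table, only the final SELECT is needed; an empty result means that every state was assigned a circuit.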

The resulting table looks like this, where each row now has the corresponding court of appeals circuit:

Now, I’m finally ready to use the GREMOVE procedure. The input data has a column that identifies the current unit areas (NAME, the state name), and the CIRCUIT column identifies the desired unit areas. All that’s left is to run the following code:

proc gremove data=state_circuit out=circuit;
	by circuit;  /* desired unit areas: circuits */
	id name;     /* current unit areas: states */
run;

The output table has fewer rows than the input because the boundaries between states within the same circuit have been dissolved, so the points that defined those boundaries are no longer needed.

Now when I create a map using the output data, I can clearly see the circuit court boundaries. The new map data set is ready for creating visualizations or doing further analysis with data at the circuit court level.

Conclusion

Spatial data may not always come perfectly prepared in the format that you need to create visualizations or perform a particular analysis. However, with the use of tools like the GREMOVE procedure in SAS, it is easy to manipulate and customize map data to meet your needs. Stay tuned for future articles about other ways to manipulate and prepare spatial data!

Find more articles from SAS Global Enablement and Learning here.