How to Create Custom Geographic Variables for Use in VA GeoMap Hierarchies
- Article History
- RSS Feed
- Mark as New
- Mark as Read
- Bookmark
- Subscribe
- Printer Friendly Page
- Report Inappropriate Content
The world is a big place! And sometimes creating Visual Analytics reports that have detailed coordinates on a geo map can be challenging because your source data doesn't contain known geography columns from which you can a create a hierarchy. If this is the case, when a report viewer wants to review a specific area of the map they will be forced to zoom and zoom and zoom in on the map for quite some time before they can finally view the data they are interested in.
For example, imagine we have the following data containing the longitude and latitude for a set of 805 individual US zip codes:
We want to visualize this in map as part of a Visual Analytics report. Unfortunately, if we simply map these zip codes on a map, they are so tightly clustered that a user must zoom in quite a bit to view the individual data points. So if the user is interested in the data located in Boston, Massachusetts it will take quite a bit of zooming in until they can see it:
This is less than ideal. What would be great if the data contained some additional higher level geographic columns. If it did we could create a hierarchy so the user could simply "Drill Down" on the map in just two clicks:
The drill down method for looking at the data near Boston is certainly the better way to allow our users to navigate the map.
Even though these additional columns aren't in our source data, that doesn't mean we can't use SAS to build them! Because we have the full longitude and latitude for each zip code in our source data, we can use this to generate the additional columns needed for our hierarchy. This page explains how the various decimal places of a longitude and latitude coordinates represent different levels of geographic precision. We can see that coordinates with decimal degrees of 0.1 represent a "Large City or District" and coordinates with decimal degrees of 0.001 represent a "Neighborhood or Street" and so on. So all we need to do is create these levels of geographic precision by rounding our current longitude and latitude columns. In this article, I'll show you how to do this!
Let's get started!
Step 1: Create your hierarchy
Let's begin by generating the sample data for our example. As I mentioned before, the source data for the report shown above contains the longitude and latitude for a set of 805 individual US zip codes. I created this data from the sashelp.zipcode data set. To recreate the example source data, first we want to isolate the individual coordinates for each zip code. We will also want to convert the zip code to a character variable so we can use it as a geography data type in our report. This is easily done via the PROC SQL statement below:
proc sql;
create table getZips as select
put(zip,z5.) as zipcode, x, y
from sashelp.zipcode;
quit;
A few observations from the output data set getZips are shown below:
This three column data set will be the starting data for our example. Now that we have the our zip codes with their respective coordinates we can begin the process of creating our hierarchy of varying levels of precision using the latitude and longitude variables. What we need to do is take each observation and round the x and y columns to the level of precision we want. After that, we will need to create a character variable that can serve as the category column for the geography data level. Finally, we will add a label to our new category column.
Since we want to repeat this process for each level of precision, I have wrapped the steps listed above into a macro named "buildHierarchy". When I call the macro, I'm using it to create a hierarchy with the levels of precision at:
- Large City or District - 0.1
- Town or Village - 0.01
- Neighborhood or Street - 0.001
data buildHierarchy;
set getZips;
%macro buildHierarchy(level,prec,var,label);
lat_&var = round(y,&prec.);
long_&var = round(x,&prec.);
&var = "&var: Lat:" || compress(lat_&var.) || " x " || "Lon: " || compress(long_&var.);
label &var = "&label";
%mend;
%buildHierarchy(00,.1,City,Large City or District)
%buildHierarchy(01,.01,Town,Town or Village)
%buildHierarchy(02,.001,Neighborhood,Neighborhood or Street)
;
run;
A few observations from the output data set buildHierarchy are shown below:
For this article, I've chosen to use observations where the frequency of the "city" variable is 35 or more. This step is completely optional, however if you would like to subset your data as I did, run the PROC SQL and DATA STEP code below:
proc sql;
create table getTopCities as
select *, count(City) as totalCityObs
from buildhierarchy
group by City
having (calculated totalCityObs) > 35;
quit;
data buildhierarchy;
set getTopCities;
drop totalCityObs;
run;
Great! Our data is now ready for SAS Visual Analytics!
Step 2: Build the report!
Begin by creating a new Visual Analytics report and add the data set 'buildhierarchy' to your report.
We will first need to convert our character columns into geographies. Let's start with the "Large City or District" column. From the data menu select this column and change its category to "Geography":
At the next screen, for "Geography data:" select "Latitude and longitude in data". For the Latitude and Longitude mappings choose:
- Latitude (y): lat_City
- Longitude (x): long_City
Press "OK".
Great! Now repeat the process for the other categorical columns. Use the latitude and longitude mappings below:
- Town or Village
- Latitude (y): lat_Town
- Longitude (x): long_Town
- Neighborhood or Street
- Latitude (y): lat_Neighborhood
- Longitude (x): long_Neighborhood
- zipcode
- Latitude (y): Latitude (degrees) of the center (centroid) of ZIP Code
- Longitude (x): Longitude (degrees) of the center (centroid) of ZIP Code.
With our geography data items in place, we can now create our hierarchy. From the data menu, click "New data item" and select "Hierarchy":
Name your hierarchy "CustomGeoHierarchy" and add the columns in the order shown below.
Press OK:
*Notice I have chosen not to use the "Neighborhood or Street" column in my hierarchy. This is simply a design choice I made. You can include it in your hierarchy if you like.
Great! Now we can build our map! Drag your new hierarchy onto the reporting canvas. Since the hierarchy is made up of geographic data items a GeoMap report object will be automatically created:
By default, Visual Analytics 8.5 will create a coordinates map for you. Since we want to point out which areas have the most zip codes we will need to change this to a Bubble Geo Map. To do this, click the map object and select the "Options" menu on the right handed side of the screen. From within this menu, scroll down until you see the "Coordinate" section and select "Bubble":
Great! All that's left is to do some window dressing. I decided to give my map a title of "Custom GeoHierarchy". I have also removed the legend.
With these final edits in place we have successfully created our report! Now that we created our custom geographic columns in SAS and used them in a hierarchy in Visual Analytics, a user can drill down to detailed data in just two clicks!
This example was created in SAS Visual Analytics 8.5. However the concepts from this article will work in any version of SAS Visual Analytics. Also, if you are using SAS Visual Analytics 8.5 you could also consider using the "Cluster adjacent markers" option within a coordinate geo map. You can learn more about this options in the SAS Visual Analytics 8.5 documentation.
How to make this example work for you
This example was created in SAS Visual Analytics 8.5. Again, for this post's data, I used the zipcode data set which is available in the sashelp library.
The attachments for this post are:
- The ETL which creates the final buildhierarchy.sas7bdat dataset - buildhierarchy.sas