Just before Christmas SAS gave an early present to University Edition users in the form of an update to SAS 9.4M5. Among other things this included the SGMAP procedure which, for the first time, allowed SAS University Edition users to perform mapping without resorting to third party tools.
This article details the author's first attempt with Proc SGMAP and follows the process of designing and writing code to develop a map which is both informative and easy on the eye.
At the end of every year it’s customary to think back and review the prominent news stories of the previous twelve months. In 2017 one of the big issues has been the continued testing of long range missiles by North Korea as part of its nuclear weapons program. The James Martin Center for Nonproliferation Studies maintains a database of all missiles launched by North Korea which are capable of delivering a payload of at least 500 kilograms a distance of at least 300 kilometers. The first such test occurred in April 1984 with the most recent (at the time of writing) on 28 November 2017.
This database is made available as a downloadable Excel file from the Nuclear Threat Initiative Project web site.
The file contains a number of worksheets - we will be using the one called Facilities. The others contain much interesting data but are not required to create our map.
My aim was to import the data into SAS, then create a bubble plot showing how many tests were carried out from each site (whether successful or not).
I believe that creating a great map is as much about art as science so I generally start out with a simple rendering of my data and build on that incrementally until I am happy with the result. I firstly imported the data using Proc Import.
proc import file="/folders/myshortcuts/Dropbox/north_korea_missile_test_database.xlsx" out=nktests dbms=xlsx replace; /*--- Use Excel column headings as SAS variable names. */ getnames=yes; /*--- Specify Excel sheet name and data range to import. */ range='Facilities$A2:G23'; run;
The latitude and longitude variables are in character format in the file so I needed to create numeric versions and discard one test where the location was “Unknown”.
/* Convert the latitude and longitude values to numeric */ data nktests; set nktests(where=(facility ne "Unknown")); lat_num = input(latitude, best. ); long_num = input(longitude, best. ); run;
Having done that, I’m now ready to try my first map. Here’s the code I used:
title 'North Korean Missile Tests'; proc sgmap plotdata=nktests; openstreetmap; bubble x=long_num y=lat_num size=number_of_tests / name='Facilities' datalabel=facility datalabelattrs=(color=red) datalabelpos=right; keylegend 'Facilities'; run; quit;
This is the result:
So far so good - I have a map but there are some things I’m not 100% happy with:
In order to solve the problem of the overlapping labels I’m only going to show a label for the most used site and at the same time I’ll take the opportunity to make the label bigger, change the color and add some descriptive text. In order to discover which is the most used site I’ll use Proc Rank descending with ties=low. If, as in this case, there are one or more records for a rank the smallest of the ranks is assigned. In this case both the Kittaeryong Missile Base and the North Wonsan facility have held 20 tests therefore we’ll be showing a label for both of these facilities.
One useful feature of the way labels work in Proc SGMAP is that if there is no data for an observation in the label variable then SAS ignores that row when rendering the labels. I’m therefore going to create a label for each of the observations where the rank is equal to 1. I do this in a data step creating a new variable (called newlabel) which holds the facility name plus the number of tests carried out there. I’ll also take the opportunity to improve my header and add some footer information giving the source and an explanation of what the relative size of the bubbles means. Here is the code for the second cut of my map
/* Determine which facility is used the most */ proc sort data=nktests; by descending number_of_tests; run; proc rank data=nktests out=nktestrank descending ties=low; var number_of_tests; ranks numrank; run; /* Set a label for the most used facility/facilities */ data nktests2; set nktestrank; if numrank=1 then newlabel=strip(facility)||" - Number of Tests="||put(number_of_tests,2.); run; title 'North Korean Missile Tests'; title2 'Location of Launch Facilities'; footnote j=l 'Data from The James Martin Center for Nonproliferation Studies' j=r 'Relative Number of Launches per Site'; proc sgmap plotdata=nktests2; openstreetmap; bubble x=long_num y=lat_num size=number_of_tests / datalabel=newlabel datalabelattrs=(color=black size=10pt) datalabelpos=right; run; quit;
This is the result:
I’m pretty happy with the map at this stage but there is just one more feature I’d like to add. It seems to me that the map lacks a little in context so I’d like to show how close the facilities are to the North and South Korean capitals.
To do this I obtained latitude and longitude coordinates for Pyongyang and Seoul from a google search and created a temporary file to hold the data. As there were only two records I adopted a “quick and dirty” method for creating the temporary file to hold the two series and adding them to the main data set. In order to emphasise that these are the capitals of two different countries I’m going to make the markers different colors – red for Pyongyang and blue for Seoul. I can do this because Proc SGMAP not only allows you to create a bubble plot and a scatter plot on the same base map but allows you to create multiple scatter plots which is what I’m going to do here.
/* Create a data set with details of the N and S Korean capitals */ data cap; length newlabel2 $25; cap_lat_num=39.0392; cap_long_num=125.7625; newlabel2="Pyongyang"; output; cap_lat_num2=37.56652; cap_long_num2=126.9780; newlabel3="Seoul"; output; run; data nktests3; set nktests2 cap; run; title 'North Korean Missile Tests'; title2 'Location of Launch Facilities'; footnote j=l 'Data from The James Martin Center for Nonproliferation Studies' j=r 'Relative Number of Launches per Site'; proc sgmap plotdata=nktests3; openstreetmap; bubble x=long_num y=lat_num size=number_of_tests / datalabel=newlabel datalabelattrs=(color=black size=10pt) datalabelpos=right; scatter x=cap_long_num y=cap_lat_num / datalabel=newlabel2 datalabelattrs=(color=black size=8pt) datalabelpos=right markerattrs=(symbol=star size=15PX color=red); scatter x=cap_long_num2 y=cap_lat_num2 / datalabel=newlabel3 datalabelattrs=(color=black size=8pt) datalabelpos=right markerattrs=(symbol=star size=15PX color=blue); run; quit;
This the result of my final cut:
I'm pretty happy with this!
One issue with using Proc SGMAP with the openstreetmap statement is performance. Because the map tiles which comprise the map have to be downloaded from an external server you shouldn’t expect this to be quick. My runs for each map were taking between one minute and thirty seconds and two minutes ten seconds to complete. Of course, your experience will probably vary but don’t (as I did) assume on your first run that SAS has hung and click the cancel button….
We have seen that Proc SGMAP can be used to provide a high-quality map with bubble plot and scatter plot overlays. Choropleth maps are also available as a pre-production option and as SAS moves more towards using the SG family of graphing tools it seems likely that Proc SGMAP will continue to be developed to the point where it may replace Proc GMap as the map creators tool of choice.
If you have any comments on this map or would like to see other data mapped with Proc SGMAP please leave a message in the comments section below.