Editor's note: SAS programming concepts in this and other Free Data Friday articles remain useful, but SAS OnDemand for Academics has replaced SAS University Edition as a free e-learning option. Hit the orange button below to start your journey with SAS OnDemand for Academics:
While it is true to say that the United Kingdom has very little gun crime, it is also true to say that human ingenuity in finding ways to harm each other seems to know no bounds. There has been, over the last few years, increasing public concern over the level of knife crime in the country, particularly in London, where it is often said to have reached epidemic proportions.
In this edition of Free Data Friday, we will be looking at knife crime data from the House of Commons Library to map incidents of knife crime across the London boroughs to try to establish if there is any geographical pattern to this epidemic.
You can download the data as an Excel file from the House of Commons Library along with an explanatory PDF report. We will also need a shapefile for London showing borough boundaries. This can be downloaded from the London Datastore – note that the page states that while we are free to use the shapefiles to create maps, we must include two copyright notices on the output.
We will be using Proc SGMap which can’t directly use shapefiles so firstly we must use Proc Mapimport to convert the shapefile into a SAS map data set.
%let mapfile=/folders/myshortcuts/Dropbox/statistical-gis-boundaries-london/ESRI/;
%let mapfile=&mapfile.London_Borough_Excluding_MHW.shp;
proc mapimport out=london datafile="&mapfile";
run;
Having done that, we can open the downloaded data file using the XLSX engine and read the required table into a SAS data set. As is usual in a spreadsheet there are header rows and a grand total we don’t need. In addition we can ignore the value for Heathrow airport. It is part of the London borough of Hillingdon, but the number of incidents is tiny. So we can just read in the data from row 6 to row 38. We will also rename some of the variables and allocate the data to category bands (each value will be unique which creates an issue with choropleth maps which we will see later. Creating bands is one way around the problem).
libname knife xlsx "/folders/myshortcuts/Dropbox/CBPSN04304.download.xlsx";
options validvarname=any;
data borough(keep=name total category);
length name $50 category $7;
set knife."T.A4C"n;
if _n_ > 5 and _n_ < 39 then do;
name=A;
total=input(F,8.);
if total < 250 then category="0-250";
else if total < 500 then category="250-500";
else if total < 750 then category="500-750";
else if total > 750 then category="750+";
output;
end;
run;
One further point to note about the data is that it does not include figures for the City of London. This is the main financial district and although comprising only a little over one square mile, it has its own form of government and police force. It is a city in its own right and not a London borough, which explains its absence from the data.
Creating any kind of visualization is always an iterative process. I tend to start with the simplest possible design and build on that while trying not to allow the chart or map to become too complicated. Firstly, we’ll see what a bare choropleth of what the imported shapefile looks like. I do this because I find that some imported shapefiles just don't look "right" for my purposes.
proc sgmap mapdata=london;
choromap / mapid=name;
run;
This looks fine so we shall flesh it out with some data:
proc sgmap mapdata=london maprespdata=borough;
choromap category/ mapid=name;
run;
There are a few problems with this map:
We can start the process of improving the map by adding the borough names. We can do this by using the centroid macro supplied with SAS University Edition to determine the central point of each borough and then using the TEXT statement in PROC SGMap to display the borough names at those points:
proc sort data=london out=london_sort;
by name;
run;
%centroid(london_sort, centres,name);
proc sgmap mapdata=london maprespdata=borough plotdata=centres;
choromap category/ mapid=name;
text x=x y=y text=name;
run;
This wasn't a resounding success - the names were placed in the correct position but they are too long for the borough areas so there is a lot of overlap. Instead of using names, I decided to use numbers for the boroughs
data centres;
set centres;
num=_n_;
run;
proc sgmap mapdata=london maprespdata=borough plotdata=centres;
choromap category/ mapid=name name="choro";
text x=x y=y text=num;
keylegend "choro" / title="Total Incidents";
run;
This is much better, but we still have some issues to address.
proc sort data=borough out=borough_sorted;
by category;
run;
title1 "London Knife Crime Incidents by Borough - 2018/19";
title2 "Excludes data for City of London & Heathrow Airport";
footnote1 j=l 'Data from the UK House of Commons Library';
footnote2 j=l 'Contains National Statistics data © Crown copyright and database right [2015]';
footnote3 j=l 'Contains Ordnance Survey data © Crown copyright and database right [2015]';
proc sgmap mapdata=london maprespdata=borough_sorted plotdata=centres;
choromap total/ mapid=name name="choro";
text x=x y=y text=num;
gradlegend "choro" / title="Incidents";
run;
title;
footnote;
proc print data=centres noobs;
var num name;
run;
We can see from the final map that generally speaking the number of incidents rises the closer you get to the centre of London with the largest value being for borough 33 - Westminster. This is about as close to the center of London as you can get, plus boroughs south of the river Thames tend to have less knife crime than those north of the river. Having said all this, London is still an incredibly safe place to visit, work and live in so please don't let this dissuade you from including it in your holiday itinerary!
Did you find something else interesting in this data? Share in the comments. I’m glad to answer any questions.
Visit [[this link]] to see all the Free Data Friday articles.
Cool
Has anyone done something like this for the US, where we have hundreds murdered each week?
Hi @tomrvincent - there's quite a lot of shooting incident data held on individual city data portals, for example New York which goes down to borough and precinct level. It should be possible to find a suitable shapefile to import into SAS and create a choropleth map for gun crime. I don't know if the data is easily available at state or national level though.
@ChrisBrooks I found a very thorough (and horrifying) database at https://www.gunviolencearchive.org/query but it only seems to allow 500 records to be downloaded at a time.
https://everytownresearch.org/gun-violence-america/ and https://lawcenter.giffords.org/ also have a variety of charts but no searchable databases from what I can tell.
Even with 100 gun deaths per day, there isn't a national standardized reporting protocol in place.
Nice example!
My one suggestion would be to add the number of Incidents to your Proc Print table.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.