
Investigating Dog Bites in New York City with SAS

Started ‎09-13-2019 by
Modified ‎08-03-2021 by

SAS programming concepts in this and other Free Data Friday articles remain useful, but SAS OnDemand for Academics has replaced SAS University Edition as a free e-learning option.

Heath, our Cavalier King Charles Spaniel

 

As the proud human “parents” of a Cavalier King Charles Spaniel, my wife and I naturally adore our dog. He’s very friendly with humans and sociable with other dogs and we trust him completely.

 

However, the fact is that people are bitten by dogs every day. In 1991, in the wake of a spate of attacks, the UK Parliament passed the Dangerous Dogs Act, which made it an offence to keep any dog that is dangerously out of control and banned the keeping of four specific types of dog without an exemption from a court. These are the banned types:

 

  1. Pit Bull Terrier
  2. Japanese Tosa
  3. Dogo Argentino
  4. Fila Brasileiro

The ban covers not only purebred dogs but also crossbreeds. The decision as to whether a dog is a banned type must be made by an experienced police dog handler. The ban mainly affected Pit Bulls and was, and still is, controversial, with former and current owners claiming that Pit Bulls are not inherently dangerous, as the act implies.

 

I was interested, therefore, to discover that the New York City Data Portal contains data on dog bites in the “Big Apple,” and I decided to use SAS University Edition to determine whether or not there was cause for concern over attacks by Pit Bulls.

 

Get the Data

 

The portal allows you to download data in a number of different formats. I chose to download a CSV file.

 


 

Getting the Data Ready

 

I used PROC IMPORT to create a SAS data set from the CSV file. I set GUESSINGROWS to quite a large value because, although this makes the import slower, I find that text fields are otherwise very often truncated. Here is the code:

 

 

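/* Fileref pointing at the downloaded CSV file */
filename dogbites '/folders/myshortcuts/Dropbox/DOHMH_Dog_Bite_Data_2019.csv';

/* Read the CSV into WORK.DOG_BITES, taking variable names from the header row */
proc import datafile=dogbites
	dbms=csv
	out=dog_bites
	replace;
	getnames=yes;
	/* Scan 4000 rows before deciding each column's type and length */
	guessingrows=4000;
run;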
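
One way to check whether the import truncated any text fields is to compare the declared length of a character variable with the longest value actually stored in it. The following is only a sketch: it assumes the import created a character variable called breed (the name used later in this article), and the column alias longest_breed is my own.

```sas
/* List the variable names, types and lengths of the imported data set */
proc contents data=dog_bites varnum;
run;

/* Longest stored value of breed (assumed to be a character variable). */
/* If this equals the declared length, some values may have been cut.  */
proc sql;
	select max(length(breed)) as longest_breed
	from dog_bites;
quit;
```

If truncation does show up, raising GUESSINGROWS further (or using guessingrows=max, which scans the whole file) is one option, at the cost of a slower import.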

 

This is what the imported file looks like:

 

Dog Bites DS1.png

 

The Results

 

After sorting the file I ran PROC Means to count the number of bites that occurred for each breed. I then sorted the output from that and ran PROC SGPlot to create a horizontal bar chart of the 10 breeds responsible for most bites. This is the code, followed by the chart:

 

 

proc sort data=dog_bites;
	by breed;
run;

proc means data=dog_bites noprint n;
	by breed;
	output out=bitefreq  n=numbites;	
run;

proc sort data=bitefreq;
	by descending numbites;
run;

ods graphics / reset imagemap;
title1 'Number of Dog Bites in New York City 2015-2017';
title2 'Top Ten Frequency by Breed (Excludes Missing Values)';
footnote1 j=r 'Data from https://opendata.cityofnewyork.us/';
proc sgplot data=bitefreq(obs=10);
	hbar breed / response=numbites
		categoryorder=respdesc
		dataskin=pressed fillattrs=(color=vpab)
		datalabel
		tip=(numbites)
		tiplabel=("Number of Bites");
	xaxis label="Number of Dog Bites";
	yaxis label="Breed" fitpolicy=none;
run;

 

Dog Bites by Breed.png

 

Pit Bulls are the outlier

We can see that Pit Bulls are responsible for by far the largest number of bites of any breed during this period: 1,921, against 364 for the next breed on the list, the Shih Tzu. The picture is even more dramatic a little farther down the list, where American Pit Bull Terrier / Pit Bulls and American Pit Bull Mix / Pit Bull Mix also appear in the top 10.

 

I decided to group all Pit Bull and Pit Bull mixes together and match them against the number of bites for all other breeds. Firstly, I created a format to help in the final chart display and wrote a data step to classify all breed entries as either "Pit Bull Type" or "Other":

 

 

proc format;
	value  breed
		0="Other"
		1="Pit Bull Type";
run;
   
data dog_bites_std;
	format pit_bull_type breed.;
	set dog_bites;
	if find(upcase(breed),"PIT BULL") or
		find(upcase(breed),"PITBULL") then pit_bull_type=1;
	else pit_bull_type=0;	
run;
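
Before aggregating, it can be worth checking which breed values the FIND tests actually picked up, since free-text breed names are easy to misclassify. This is a minimal sketch using the dog_bites_std data set created above; it simply lists the distinct breed values flagged as Pit Bull types.

```sas
/* Show every breed value that was classified as a Pit Bull type, */
/* most frequent first, so unexpected matches are easy to spot    */
proc freq data=dog_bites_std order=freq;
	where pit_bull_type = 1;
	tables breed / nocum;
run;
```

Changing the WHERE condition to pit_bull_type = 0 gives the complementary list, which helps to spot spellings that the two search strings missed.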

 

As before, I then sorted the file, ran PROC Means to aggregate the result and PROC SGPlot to produce a bar chart:

 

proc sort data=dog_bites_std;
	by pit_bull_type;
run;

proc means data=dog_bites_std noprint n;
	by pit_bull_type;
	output out=bitefreqtype  n=numbites;	
run;

ods graphics / reset imagemap;
title1 'Number of Dog Bites in New York City 2015-17';
title2 'Pit Bull Types v Others';
footnote1 j=r 'Data from https://opendata.cityofnewyork.us/';
proc sgplot data=bitefreqtype(obs=10);
	hbar pit_bull_type / response=numbites
		categoryorder=respdesc
		dataskin=pressed fillattrs=(color=vpab)
		datalabel
		tip=(numbites)
		tiplabel=("Number of Bites");
	xaxis label="Number of Dog Bites";
	yaxis label="Dog Type" fitpolicy=none;
run;

 

Dog Bites by Type.png

 

From this chart we can see that Pit Bulls and Pit Bull crossbreeds were responsible for 2,839 bites, against 7,441 for all other breeds combined. That is about 27.6% of the 10,280 bites in total.
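
The share of bites attributed to Pit Bull types can also be read straight from a one-way frequency table rather than worked out by hand. The sketch below uses the dog_bites_std data set and the breed. format defined earlier. Note that PROC FREQ counts rows, whereas the PROC MEANS step above counts non-missing values of a numeric variable, so the two should agree only if that variable has no missing values.

```sas
/* One-way table of bites by dog type: the Percent column */
/* gives each type's share of all recorded bites          */
proc freq data=dog_bites_std;
	tables pit_bull_type;
run;
```

With the counts reported above, the Pit Bull share works out as 2,839 / (2,839 + 7,441) = 2,839 / 10,280, or roughly 27.6%.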

 

Unfortunately, we don't have a record of the number of dogs of each breed in New York during this period that we could use to determine a "bite ratio." There is a file of currently licensed dogs on the portal, but it doesn't give us any way to account for fluctuations in numbers over the two-year period. I was reluctant, therefore, to include it in my analysis, but the figures we have do indicate that Pit Bulls were responsible for a very high percentage of attacks, giving at least some cause for further investigation.

 

Now It's Your Turn!

 

Did you find something else interesting in this data? Share in the comments. I’m glad to answer any questions.

 

Visit [[this link]] to see all the Free Data Friday articles.

Comments

I am new to SAS and working on building my basic analysis skills. This article was very easy to follow and understand - thank you! How would you go about testing for possible trends in the number of dog bites per year for this example (either by breed or altogether)?

Hi @lexb57 and welcome to the community. Firstly, many thanks for taking the time to read the article and for your kind words of appreciation, also for asking a very interesting question!

 

I confess I am a SAS developer by training and not a statistician, so it is quite possible that someone else can give a better answer than I can, but I can give you an idea of one way you might approach this.

 

You can see from the article that we only have two years' data, so I think you'd first need a lot more data to be able to judge any trend. Assuming, however, that we had that, I'd take, say, the number of bites attributed to each of the top ten breeds and use PROC SGPLOT to create a scatter plot of the number of bites per year for each breed in turn, adding a REG statement to create a regression line, which should give you a picture of the trend in bite numbers. Of course, a serious study would want to include the number of dogs of each breed in the population for that year but, as I mentioned in the article, that data doesn't seem to be available.
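
A rough sketch of that approach might look like the code below. Several things here are assumptions rather than facts about the data: that dog_bites contains a numeric SAS date variable (called dateofbite here), that "Pit Bull" is an exact breed value, and the data set and variable names bites_dated, bites_by_year and bite_year, which I made up for illustration. If the date was imported as text, it would need converting with INPUT first.

```sas
/* Derive the calendar year of each bite                         */
/* (assumes dateofbite is a numeric SAS date; name is a guess)   */
data bites_dated;
	set dog_bites;
	bite_year = year(dateofbite);
run;

/* Count bites for each breed in each year */
proc freq data=bites_dated noprint;
	tables breed*bite_year / out=bites_by_year(rename=(count=numbites));
run;

/* Scatter plot with a fitted regression line for a single breed */
/* ("Pit Bull" is an assumed breed value; substitute your own)   */
proc sgplot data=bites_by_year;
	where breed = "Pit Bull";
	scatter x=bite_year y=numbites;
	reg x=bite_year y=numbites / nomarkers;
	xaxis label="Year" integer;
	yaxis label="Number of Dog Bites";
run;
```

With only two or three yearly points per breed, the fitted line is descriptive at best, so this is a way to look at the data rather than a test for a trend.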
