What’s this data?
Today, we’re going to look at crime data from college campuses in the US, and see if there is any way to use the likelihood of certain common crimes to predict other less common crimes.
Get the data
If you don’t already have SAS University Edition, get it here and follow the instructions in the PDF carefully. If you need help with almost any aspect of using University Edition, check out these video tutorials.
Get the data from here: http://ope.ed.gov/security/GetDownloadFile.aspx
When you click there, you’re likely to see this screen:
Click “Go to the login page” and you should end up on a page that looks like this:
At the bottom of the right nav, click “Download data files”:
Then click “All data combined for calendar years 2010-12”. Pick the one with datasets saved as SAS files:
How to get the data and prep it for analysis
One thing that’s nice about this dataset is that it’s already a SAS dataset, so there is no need to go through the usual import procedures. Just use a libname statement the same way you would use a filename statement, but point it at the folder that houses the file instead of at the file itself. Then refer to the dataset as libname.filename.
The first thing to notice about this data is how much of it is missing and how many values are reported as zero. These crimes don’t happen everywhere, and they don’t happen often. This can be a little misleading, because some of the schools with little or no reported crime may be really small or online only. Without much testing you can quickly see that schools with more students are more likely to have reported these crimes. To account for that, you need to adjust the numbers by school population. The code below divides each reported crime count by the total campus population (TOTAL).
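/* Point a libref at the folder that holds the downloaded SAS dataset */
libname Datasets "/folders/myfolders/my_data/";

/* Copy the dataset into the WORK library */
data crime;
  set datasets.oncampuscrime101112;
run;

/* Count zero and missing values for each 2010 crime variable */
proc freq data=crime;
  tables arson10 murd10 agg_a10 burgla10 robbe10 vehic10 / missing;
run;

/* Adjust each crime count by total campus population */
data crime2;
  set crime;
  MURD10_ADJ   = MURD10   / TOTAL;
  AGG_A10_ADJ  = AGG_A10  / TOTAL;
  BURGLA10_ADJ = BURGLA10 / TOTAL;
  ROBBE10_ADJ  = ROBBE10  / TOTAL;
  VEHIC10_ADJ  = VEHIC10  / TOTAL;
  ARSON10_ADJ  = ARSON10  / TOTAL;
run;

/* Correlate the population-adjusted crime rates */
proc corr data=crime2;
  var MURD10_ADJ AGG_A10_ADJ BURGLA10_ADJ ROBBE10_ADJ VEHIC10_ADJ ARSON10_ADJ;
  title "Adjusted";
run;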
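If some schools have a TOTAL of zero or missing, the divisions above still run (SAS sets the result to missing), but the log fills up with notes. Here is a minimal sketch of the same adjustment with an explicit guard, assuming the crime dataset created above. The dataset name crime2_safe and the _RATE variable names are made up for illustration, and the rates are scaled to crimes per 1,000 students purely for readability; scaling by a constant does not change the correlations.

```sas
/* Sketch only: crime2_safe and the *_RATE names are hypothetical */
data crime2_safe;
  set crime;

  /* Raw 2010 counts and the matching rate variables, in the same order */
  array counts{6} MURD10 AGG_A10 BURGLA10 ROBBE10 VEHIC10 ARSON10;
  array rates{6}  MURD10_RATE AGG_A10_RATE BURGLA10_RATE
                  ROBBE10_RATE VEHIC10_RATE ARSON10_RATE;

  do i = 1 to 6;
    /* A missing TOTAL compares as less than zero, so it also lands in the ELSE branch */
    if TOTAL > 0 then rates{i} = 1000 * counts{i} / TOTAL;
    else rates{i} = .;
  end;

  drop i;
run;
```

You can then point proc corr at crime2_safe and list the six _RATE variables in the var statement.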
What does the output mean?
In this correlation matrix we're comparing three crimes that we've grouped as nonviolent (robbery, vehicular theft, and burglary) and three that we've grouped as violent (arson, aggravated assault, and murder) for the year 2010. To start, let's compare robbery to murder. The correlation is -.00049, which indicates essentially no relationship between the two. For aggravated assault and vehicular theft, the correlation is .12218. This is statistically significant, but very small: the r-squared, or shared variance, is about 1.5% (.12218 * .12218). However, two crimes that we've categorized as nonviolent correlate better with each other. For instance, burglary and robbery have a correlation of .41519, which indicates a moderate relationship between the two. The violent crimes are so infrequent that they don't correlate well with each other at all. There are only about a dozen murders across almost 10,000 schools, so murder will not correlate well with any variable.
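To see where the shared-variance figure comes from, you can square each correlation yourself. This is a small sketch that reuses only the three correlations quoted above; the variable names r and r_squared are ours, and the results are written to the log.

```sas
/* Square each correlation quoted above to get the shared variance */
data _null_;
  do r = -0.00049, 0.12218, 0.41519;
    r_squared = r**2;
    /* Writes r and r_squared (as a percentage) to the log */
    put r= r_squared= percent8.2;
  end;
run;
```

The .12218 correlation works out to roughly 1.5% shared variance, while the .41519 correlation between burglary and robbery works out to roughly 17%.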
This analysis showed relationships between some kinds of crime on college campuses in the year 2010, but nonviolent and violent crimes didn't seem to be related. That doesn't mean the pattern will hold true every year. Violent crimes like murder are such isolated incidents that the numbers will fluctuate from year to year with little consistency. If you want to test this hypothesis and others, other years are available on the website and in the zip file you downloaded.
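One way to rerun the comparison for another year is to wrap the steps in a macro. Treat the following as a sketch under a few assumptions: that the crime dataset created earlier also holds 2011 and 2012 counts named with the same pattern (MURD11, MURD12, and so on), which the dataset name suggests but which you should confirm with proc contents, and that TOTAL is a reasonable population figure for each year. The macro name corr_year and the crime_&yy dataset names are made up for illustration.

```sas
/* Sketch only: corr_year is a hypothetical helper macro.            */
/* Assumes count variables follow the MURD10 pattern for each year. */
%macro corr_year(yy);

  /* Adjust each count for the requested year by campus population */
  data crime_&yy;
    set crime;
    array counts{6} MURD&yy AGG_A&yy BURGLA&yy ROBBE&yy VEHIC&yy ARSON&yy;
    array adj{6}    MURD&yy._ADJ AGG_A&yy._ADJ BURGLA&yy._ADJ
                    ROBBE&yy._ADJ VEHIC&yy._ADJ ARSON&yy._ADJ;
    do i = 1 to 6;
      /* Leave the rate missing when TOTAL is zero or missing */
      if TOTAL > 0 then adj{i} = counts{i} / TOTAL;
    end;
    drop i;
  run;

  /* Correlate the adjusted rates for that year */
  proc corr data=crime_&yy;
    var MURD&yy._ADJ AGG_A&yy._ADJ BURGLA&yy._ADJ
        ROBBE&yy._ADJ VEHIC&yy._ADJ ARSON&yy._ADJ;
    title "Adjusted, 20&yy";
  run;

  title;  /* clear the title so it does not carry over */

%mend corr_year;

%corr_year(11)
%corr_year(12)
```

If a variable name does not exist in your copy of the data, SAS will create it as all missing rather than stop with an error, so check the log and the proc corr output before reading anything into the results.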
Now it's your turn!
What interesting patterns did you find in this data? Share in the comments.
Need data for learning?
The SAS Communities Library has a growing supply of free data sources that you can use in your training to become a data scientist. The easiest way to find articles about data sources is to type "Data for learning" in the communities site search field like so:
We publish all articles about free data sources under the Analytics U label in the SAS Communities Library. Want email notifications when we add new content? Subscribe to the Analytics U label: click "Find A Community" in the right nav and select SAS Communities Library at the bottom of the list. Then, in the Labels box in the right nav, click Analytics U and select "Subscribe" from the Options menu.