We’re smarter together. Learn from this collection of community knowledge and add your expertise.

How clean are your favorite New York and Toronto restaurants?

by Super Contributor on ‎12-04-2015 12:15 PM (943 Views)




I am thrilled that I am able to continue with the Free Data Friday series – I’m a huge fan of open data, and I love SAS University Edition for all its power and the fact it’s free! 


The data are two different datasets with the same theme. Two of my favourite cities are Toronto, Ontario (where I work) and New York City (where my wife and I go for vacation – been there over a dozen times).  One of the reasons I love these cities is for the food – pizza, pasta, burgers, I love it all.  Finding data about the restaurants in these two cities means I can dig into the restaurants I love to go and see whether I should keep going back!


Get the data

New York City Restaurant data - https://data.cityofnewyork.us/Health/DOHMH-New-York-City-Restaurant-Inspection-Results/xx67-kt59 - this pulls up an interactive website that has a download option.


Toronto Restaurant data –

http://www.toronto.ca/health/dinesafe/ - this doesn’t have a download option, but emailing the data group at https://secure.toronto.ca/dinesafe/contact.html and requesting a CSV will get you one.


How to go about getting SAS University Edition

If you don’t already have University Edition, get it here and follow the instructions from the pdf carefully. If you need help with almost any aspect of using University Edition, check out these video tutorials. Additional resources are available in this article.


Getting the data ready

Both data sets are set up in a way that are suitable for the types of analysis I want to show. I will be using PROC SQL, and walk through the steps.


I’ve already created a library (REST) and imported the data (NYC and TOR). 


Here’s my code for the first run through the data:



One of the reasons I like SQL is because it’s what I like to call a “plain English” language. Reading the code above, it is fairly easy to follow along – I’m asking for SAS to select 2 variables, and do a count from my REST.NYC dataset, and to only include data where the VIOLATION_DESCRIPTION is not NULL. The Group By and Order By are standard SQL syntax; it simply tells SAS that you want the data grouped by specific variables, and ordered by the same variables.


My output looks like this:



It’s not really informative, is it?  We need to have a way to sort by the Violations, so that we can see which is the largest group of infractions. There are a number of ways to do this, but my preferred method is to put the data into a temporary table and then doing another Select query based on the criteria I want.


Note the create table work.nyc_temp as statement, which tells SAS to put the data into a table in the Work library. I use _temp to keep my tables straight; you can use whatever naming convention to keep the tables in an organized manner.


The other thing to highlight is the second query; I’m now using the temp table, saying select all variables and sort the data in descending order on the Violations column.



Here’s the data, sorted by counts of violations.  Restaurants classified as “American” are clearly way ahead of any other restaurants.



OK, let’s turn our attention to the Toronto data, and see what we find there.


Here’s the query:



You’ll notice a couple of minor differences from the NYC query; specifically, the variables are different and in the second select, I added a “where infraction_details is not null”. This is a simple thing to add, but is a reminder why preliminary data exploration is important before doing any analysis.restaurants_6.png


One key thing to remember about PROC SQL is the cleanup of the temporary tables. If you wanted to run a second query on the Toronto_temp table, you’d only have to run lines 26-30 (with a PROC SQL; on line 25). If however you wanted to recreate the Toronto_Temp table (for example, you missed a variable or a criteria), then you have to drop the temp table first.




Now it’s your turn!


Did you find something else interesting in this data? Share in the comments. I’m glad to answer any questions.


Need data for learning?


The SAS Communities Library has a growing supply of free data sources that you can use in your training to become a data scientist. The easiest way to find articles about data sources is to type "Data for learning" in the communities site search field like so:



We publish all articles about free data sources under the Analytics U label in the SAS Communities Library. Want email notifications when we add new content? Subscribe to the Analytics U label by clicking "Find A Community" in the right nav and selecting SAS Communities Library at the bottom of the list. In the Labels box in the right nav, click Analytics U:




Click Analytics U, then select "Subscribe" from the Options menu.


Happy Learning!


Your turn
Sign In!

Want to write an article? Sign in with your profile.

Looking for the Ask the Expert series? Find it in its new home: communities.sas.com/askexpert.