
How to find the perfect dataset (or anything else!) with Google


One of the biggest challenges I have in working with Open Data is simply finding the dataset.  I have been playing around with Google search operators for some time and think I have a list of useful tips and tricks.  These don’t just apply to open data, but to anything that you’re looking for.  I’m going to use a data set as my example here, though.

 

I remember seeing something on Twitter about a data set on parking tickets released by the City of Toronto. Google can search the hashtags used on various social media sites, which lets you track down that elusive tweet.

 

image1.png

 

image2.png

 

The next example combines two tricks.  Putting a dash (a plain hyphen) directly before a word or site address excludes it from your search (for example, I often use -Matlab -SPSS as I don’t want pages that reference those programs).  Putting a phrase in quotes searches for that phrase specifically; “SAS Visual Analytics” will return only those results with exactly that phrase.

 

image3.png

image4.png

 

In the above I’ve excluded “Green P Parking” (a Toronto parking authority), so my results are more focussed on what I want.
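
If you build searches like this often, the two tricks are easy to script. Here is a minimal sketch in Python; the function name `build_query` and the search terms are my own illustration, not the exact query from the screenshots.

```python
def build_query(terms, exact_phrases=(), excluded=()):
    """Join plain terms, quoted exact phrases, and dash-prefixed exclusions."""
    parts = list(terms)
    # Double quotes ask for the phrase exactly as written.
    parts += [f'"{phrase}"' for phrase in exact_phrases]
    for item in excluded:
        # A leading hyphen excludes a word; a multi-word exclusion is quoted
        # so the hyphen applies to the whole phrase.
        parts.append(f'-"{item}"' if " " in item else f"-{item}")
    return " ".join(parts)


# Hypothetical search terms, chosen only to mirror the parking example.
query = build_query(
    ["Toronto", "parking", "tickets"],
    exact_phrases=["open data"],
    excluded=["Green P Parking", "Matlab", "SPSS"],
)
print(query)
# Toronto parking tickets "open data" -"Green P Parking" -Matlab -SPSS
```

The printed string can be pasted straight into the Google search box.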

 

Using site: is extremely useful if you’re looking for something on a specific website; although lexjansen.com already has a Google search function, using it here is a good example:

 

image5.png

image6.png
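
For anyone who wants to open a site: search from a script, here is a rough sketch that turns a query into a Google search URL using only the Python standard library. The helper name `google_search_url` and the search words "open data" are assumptions for illustration; the screenshots show the query I actually used.

```python
from urllib.parse import urlencode


def google_search_url(query):
    """Return a Google search URL with the query safely percent-encoded."""
    # urlencode escapes the colon and turns spaces into plus signs.
    return "https://www.google.com/search?" + urlencode({"q": query})


# Hypothetical search terms; only the site: part comes from the post.
url = google_search_url("site:lexjansen.com open data")
print(url)
```

The colon in site: is encoded in the URL, but Google receives it as an ordinary colon, so the operator still works.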

 

(This one actually surprised me; I didn’t think I’d get anything and expected to have to use another example!)

 

This next one is fairly obvious to most SAS users, but I often forget it.  Using an * acts as a wildcard, for either one or multiple characters.

image8.png

 

image7.png

 

 

My results now include “Park”, “parks”, “parking”, etc. 

 

Using related: allows you to search for a similar site to the one you provide; in this case, I get other cities in Canada that have websites.

 

image9.png

image10.png

 

Another obvious one (images not provided) is OR, typed in capitals. You can use it in a situation as simple as marathon OR run, or in more complex ones such as “Toronto city parking” OR “Edmonton city parking” open data, which would search for open data from both Edmonton and Toronto.
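
Since there are no screenshots for OR, here is a small sketch of how the two examples above could be assembled in Python. The helper name `any_of` is made up for this illustration.

```python
def any_of(*alternatives):
    """Join alternatives with an upper-case OR, quoting multi-word phrases."""
    quoted = [f'"{alt}"' if " " in alt else alt for alt in alternatives]
    return " OR ".join(quoted)


# The simple case from the paragraph above.
simple = any_of("marathon", "run")

# The more complex case: two exact phrases, then the shared search terms.
cities = any_of("Toronto city parking", "Edmonton city parking") + " open data"

print(simple)  # marathon OR run
print(cities)  # "Toronto city parking" OR "Edmonton city parking" open data
```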

 

The final one, which is probably the one I use most often, is filetype: and it does exactly what it says: it returns only results in the type of file you want.  I use this if I’m looking for documentation (PDF), presentations (PPT or PPTX), datasets (CSV, XLSX) or Google Earth overlays (KML).

 

Here’s my final example, showing CSV results for Toronto parking:

 

image11.png

 

image12.png
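
When I want the same search across several file types, I find it easier to generate one query per type. This is only a sketch; the function name `by_filetype` is hypothetical and the base query simply mirrors the Toronto parking example.

```python
def by_filetype(base_query, extensions):
    """Return one query per extension, each ending in filetype:<ext>."""
    queries = []
    for ext in extensions:
        # Normalise input such as ".CSV" or "Xlsx" to a bare lower-case extension.
        clean = ext.lower().lstrip(".")
        # No space between the operator and the extension.
        queries.append(f"{base_query} filetype:{clean}")
    return queries


for q in by_filetype("Toronto parking", ["CSV", ".xlsx"]):
    print(q)
# Toronto parking filetype:csv
# Toronto parking filetype:xlsx
```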

 

When using site:, related: or filetype:, note that there is no space between the operator and the term that follows it.  Putting a space there will not work, and will cause you confusion and probably stress!
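
If you store searches in a script or spreadsheet, a quick check can catch that stray space before it causes the stress. This is a sketch using a regular expression; the names `spaced_operators` and `tighten` are my own, and it only knows about the three operators named above.

```python
import re

OPERATORS = ("site", "related", "filetype")
# Matches an operator followed by a colon and at least one space.
_SPACED = re.compile(r"\b(" + "|".join(OPERATORS) + r"):\s+")


def spaced_operators(query):
    """List the operators in the query that are followed by a stray space."""
    return _SPACED.findall(query)


def tighten(query):
    """Remove whitespace between each known operator and its term."""
    return _SPACED.sub(r"\1:", query)


bad = "Toronto parking filetype: csv"
print(spaced_operators(bad))  # ['filetype']
print(tighten(bad))           # Toronto parking filetype:csv
```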

 

What are your favourite Google Search tips / tricks?  I’m curious to see how you find that perfect website quickly and efficiently!

 

Until next time!

Chris


Re: How to find the perfect dataset (or anything else!) with Google

My Google-fu is either very weak, there are some "options" I can't get set, or Google just doesn't like me.

 

My experience with searching for exact phrases is that I get partial matches returned, often on just one of 4 or 5 words, with a "helpful" little note showing the excluded words with a line through them.

 

It does seem as though almost any non-alphanumeric character often gets treated as a wildcard (for me). Searches like "1/2400" will match "1 2400", which is pretty frustrating when searching for a part or model number that has a / in it and the search starts matching phone numbers ...
