Well, the US election is history, and the results are making headlines around the world. Was there any discernible effect on the number of downloads from government websites?
First, I looked for data on the Canadian government websites. You probably heard that Canada's Immigration website crashed on election night, presumably overwhelmed by Americans unhappy with the results. Not finding anything significant, I decided to revisit the US government website where I got data for last week's post. Today's post compares download-data sets for Nov. 9 and 10. If you’re interested, please comment below and I’ll send you the files.
Get the data
I recommend going through the Analytics website – definitely enough there for me to write a year’s worth of Free Data Friday posts! However, the data I used for this article came from the http://analytics.usa.gov site, specifically this link.
How to go about getting SAS University Edition
If you don’t already have University Edition, get it here and follow the instructions in the PDF carefully. If you need help with almost any aspect of using University Edition, check out these video tutorials. Additional resources are available in this article.
Getting the data ready
The data was already in a format that I could use, and there were no missing or clearly incorrect values. I did have to download each day's data separately and merge everything, as well as add a Date column.
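For anyone who wants to reproduce the merge in code, here is a minimal sketch of one way to stack the two daily files and add the Date column. The dataset names WORK.NOV9 and WORK.NOV10 are hypothetical stand-ins for however you imported each day's file, and the year 2016 is my assumption based on the election date; the output name WORK.IMPORT matches the table used in the rest of this post.

```sas
/* Sketch only: WORK.NOV9 and WORK.NOV10 are hypothetical names
   for the two daily files after import. */
data work.import;
    /* Stack the two days; from_nov9 is 1 for rows read from WORK.NOV9 */
    set work.nov9 (in=from_nov9)
        work.nov10;

    /* Add a Date column recording which day's file each row came from.
       The year 2016 is an assumption. */
    if from_nov9 then Date = '09NOV2016'd;
    else Date = '10NOV2016'd;

    format Date date9.;
run;
```

The IN= data set option is what lets the DATA step tell the two sources apart, so no Date column needs to exist in the downloaded files.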
So the first thing I did was run a quick SQL query to see what the data looks like:
proc sql;
    select Date, site, Total_Events
    from work.import
    order by Total_Events desc;
quit;
Huge jump in applications for US citizenship
I’m sorting by TOTAL_EVENTS in descending order because I want to see what the top downloads were. What I got both surprised me and didn’t.
Almost 60,000 downloads for the Application for Naturalization on Nov. 10, and just over 28,000 for the Election Process PDF on Nov. 9. I expect a lot of people were trying to figure out what was actually going on during the election.
Now that I have a sense of what the data looks like, let’s dig into it a little more. As a fan of scatterplots, I wanted to plot the data and, as with the SQL query above, just see where everything fell.
I use the Scatterplot task with a WHERE clause that keeps only the rows where TOTAL_EVENTS is greater than 10,000.
When I run the task, I get:
That's a lot of downloads, and I imagine the departments that process this paperwork are going to be flooded.
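If you would rather write code than use the Scatterplot task, the following is a rough sketch of something similar with PROC SGPLOT. It is not the code the task generates; the choice of Date on the X axis and the DATALABEL= option are my assumptions about a reasonable layout, and the column names are the ones from the SQL query earlier.

```sas
/* Sketch only: axis roles and labels are assumptions,
   not the settings used in the task. */
proc sgplot data=work.import;
    /* Keep only the heavily downloaded items */
    where Total_Events > 10000;

    /* One marker per row, labeled with the site it belongs to */
    scatter x=Date y=Total_Events / datalabel=site;

    format Date date9.;
run;
```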
The final graph I wanted to show was based on Robert Allison's comment on one of my previous posts about using horizontal bar charts. I must admit, I never understood why one would use horizontal rather than vertical bars, and I’ve always found it easier to compare left to right than top to bottom. However, I now understand, and the graph here shows why.
Here’s the task setting it up:
As you can see, the Childhood Arrivals label is very long; on a vertical chart, the label would be either wrapped multiple times or cut off, potentially making it unidentifiable. On a horizontal chart, the full label fits on the screen and stays readable.
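For reference, a horizontal bar chart along these lines can also be sketched in code with the HBAR statement in PROC SGPLOT. This is an approximation under my own assumptions, not the task's generated code: I am assuming the same 10,000 cutoff as the scatterplot and grouping the bars by Date so the two days sit side by side.

```sas
/* Sketch only: the cutoff and grouping are assumptions. */
proc sgplot data=work.import;
    where Total_Events > 10000;

    /* Category labels run down the Y axis, so long labels
       get the full width of the plot instead of being wrapped */
    hbar site / response=Total_Events
                group=Date
                groupdisplay=cluster;

    format Date date9.;
run;
```

Swapping HBAR for VBAR on the same data is a quick way to see the label problem described above.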
Now it’s your turn!
Did you find something else interesting in this data? Share in the comments. I’m glad to answer any questions.
The SAS Communities Library has a growing supply of free data sources that you can use in your training to become a data scientist. The easiest way to find articles about data sources is to type "Data for learning" in the communities site search field like so:
We publish all articles about free data sources under the Analytics U label in the SAS Communities Library. Want email notifications when we add new content? Subscribe to the Analytics U label by clicking "Find A Community" in the right nav and selecting SAS Communities Library at the bottom of the list. In the Labels box in the right nav, click Analytics U, then select "Subscribe" from the Options menu.