
A SAS analysis of traffic to US Government websites

Started 11-04-2016
Modified 08-03-2021

Editor's note: SAS programming concepts in this and other Free Data Friday articles remain useful, but SAS OnDemand for Academics has replaced SAS University Edition as a free e-learning option.

Further to my post last week about the US Primaries, I wanted to find more American data to explore.  After some poking around, I found a couple of great datasets at https://analytics.usa.gov. One in particular was about the traffic on US Government websites, and I was intrigued. Would there be anything relevant to the upcoming elections?

 

Get the Data

I recommend going through the Analytics website – there's definitely enough there for me to write a year’s worth of Free Data Friday posts! However, the data I used for this article came from https://analytics.usa.gov/data/live/all-domains-30-days.csv. The data imported without issue. Note: the file covers the 30 days before the date you pull it, so the numbers will change. I ran my data on October 30, 2016.
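If you'd rather pull the file with code instead of the import task, here is a minimal sketch. It assumes your SAS session is allowed to make outbound web requests and that the CSV is still published at that address (analytics.usa.gov has been reorganized since 2016, so the file may have moved). The fileref csvfile is a hypothetical name; WORK.IMPORT matches the dataset name used in the rest of this article.

```sas
/* Assumption: the CSV is still available at this URL; it may have moved. */
filename csvfile temp;   /* hypothetical fileref for the downloaded file */

/* Download the 30-day CSV into the temporary file */
proc http
    url="https://analytics.usa.gov/data/live/all-domains-30-days.csv"
    method="GET"
    out=csvfile;
run;

/* Read the CSV into WORK.IMPORT, taking column names from the first row */
proc import datafile=csvfile
            out=work.import
            dbms=csv
            replace;
    getnames=yes;
    guessingrows=32767;  /* scan more rows before guessing column types and lengths */
run;

/* Check which columns and types were created */
proc contents data=work.import;
run;
```

Running PROC CONTENTS first is a cheap way to confirm the column names (domain, users, avg_session_duration and so on) before writing any plots against them.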

 


 

Getting the data ready

Nothing was required to get the data ready – it was already in a format that I could use, and there were no missing values or obviously incorrect data.

 

The results

The first thing I wanted to do was get a sense of the data, so I created a simple scatter plot using the task that comes with SAS University Edition:

IMAGE1.png

 

However, when I run this task, I get an error message I’ve not seen before:

 

IMAGE2.png

 

Gah!  What the heck am I supposed to do now?  Unfortunately, we can’t use the task as it is.  However, I can copy the code and make a couple of minor tweaks:

 

 

/*--Set output size--*/
/* DISCRETEMAX= raises the maximum number of discrete values the graph  */
/* will display from the default of 1000 to 2000. IMAGEMAP=OFF turns    */
/* off the mouseover tooltips for each data point.                      */
ods graphics / discretemax=2000 imagemap=off;

/*--SGPLOT proc statement--*/
proc sgplot data=WORK.IMPORT;
    /*--Scatter plot settings--*/
    scatter x=domain y=users / transparency=0.0 name='Scatter';

    /*--X Axis--*/
    xaxis grid;

    /*--Y Axis--*/
    yaxis grid;
run;

/* Restore the default ODS Graphics settings */
ods graphics / reset;

 

 

This produces the graph below. It’s pretty useless because we can’t read the individual sites, but it does show the overall volumes and gives a sense of what counts as a “high traffic” site.

 

IMAGE3.png
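If you'd like numbers to go with the picture, a quick summary of the users column shows how skewed the traffic is and how many domains a given cutoff would keep. This is a minimal sketch against WORK.IMPORT; the 5,000,000 cutoff is the one used in the next step, and the n_domains name is my own.

```sas
/* Summarize users to see how skewed traffic is and */
/* roughly where "high traffic" begins.             */
proc means data=work.import n median p90 p99 max maxdec=0;
    var users;
run;

/* How many domains would survive a 5,000,000-user cutoff? */
proc sql;
    select count(*) as n_domains label='Domains with more than 5M users'
    from work.import
    where users > 5000000;
quit;
```

If the 99th percentile sits far below the maximum, a handful of very large sites dominate, which is why a high cutoff still leaves a readable plot.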

 

To make the scatter plot useful, I’m going to limit the dataset to sites that had more than 5,000,000 users (again, this is for the past 30 days, so 5 million users should give me a significantly smaller dataset).

 

Here’s my PROC SQL to generate the dataset:

  

 

/* Keep only domains with more than 5,000,000 users in the 30-day window */
proc sql;
create table work.import2 as 
select * from work.import
where users>5000000;
quit;

 

When I run my scatter plot on work.import2 using the same X and Y variables, I get the following, which is much more reasonable because I can now read the individual sites:

 

image4.png

 

I don’t know what tools.usps.gov is. When I try to go to the site, it says Server Not Found, so I assume you have to log in to get there. In any case, it has a huge number of visitors.

 

One and done or repeat visitors?

The next comparison I wanted to make was users versus visits, to see whether most people visit only once during the 30 days or whether there are sites people tend to return to repeatedly. Here’s how I set up my task:

 

image5.png

 

And here are the results:

 

image6.png

 

Because of limited space on the Y-axis, SAS has made a minor change to the formatting: the values are now shown in exponential (scientific) notation, where 4E7 means 4 × 10^7 (40,000,000). Again, tools.usps.gov is clearly at the top of the pile, but it also appears that most of its users go in only once. The forecast.weather.gov site, however, appears to have most of its visitors coming back more than once, which makes sense. Knowing how often I check the weather, this doesn’t surprise me.
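One way to avoid the exponential tick labels, and to make the once-versus-repeat comparison explicit, is sketched below. It assumes the subset in WORK.IMPORT2 and a visits column named visits; putting users on X and visits on Y, and the visits_per_user name, are my assumptions rather than the exact task settings shown above. Because SGPLOT axes use the formats of the plotted variables, assigning COMMA14. should display tick values such as 40,000,000 instead of 4E7.

```sas
/* Axes pick up the variables' formats, so COMMA14. replaces */
/* exponential tick labels with comma-separated values.      */
proc sgplot data=work.import2;
    format users visits comma14.;
    scatter x=users y=visits / datalabel=domain;
    /* Reference line where visits = users: the further a point */
    /* sits above it, the more visits per user that site gets.  */
    lineparm x=0 y=0 slope=1 / lineattrs=(pattern=dash)
                               legendlabel='Visits = Users';
    xaxis grid;
    yaxis grid;
run;

/* Rank sites by average visits per user over the 30 days */
proc sql;
    select domain,
           users  format=comma14.,
           visits format=comma14.,
           visits / users as visits_per_user format=8.2
    from work.import2
    order by visits_per_user desc;
quit;
```

The ratio table gives the same story as the reference line in a form that is easier to quote: a value near 1 means users mostly came once, while larger values point to sites people return to.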

 

For the final analysis, let’s look at the average duration of a user’s session. Are users logging on and leaving quickly, which could mean they either got what they needed right away or realized they were on the wrong site?

 

I wanted to limit my data to sites where the average session lasted longer than 2,000 seconds (about 33 minutes). That suggests users have found what they’re looking for and are spending, on average, more than 30 minutes reading it. Or they haven’t found what they’re looking for and are determined to find it.

 

Here’s my code for creating the subset:

 

 

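/* Keep domains whose average session lasts longer than 2,000 seconds. */
/* Note: this reuses the name WORK.IMPORT2 and overwrites the earlier  */
/* 5-million-user subset.                                               */
proc sql;
create table work.import2 as 
select * from work.import
where avg_session_duration>2000;
quit;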

 

Here’s the scatterplot showing the results:

 

image8.png

The usastaffing.opm.gov site has a large number of users, but the average session lasts about 30 minutes. At the other end of the spectrum, water.noaa.gov has a smaller number of users, but they spend significantly longer on the site, almost a full hour and a half. I would imagine that this site is limited to people working in meteorology, oceanography, etc., possibly looking at satellite images or other types of data or documentation that require significant time to review.
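To check minute figures like these, the durations can be converted from seconds to minutes. A minimal sketch, assuming avg_session_duration is stored in seconds (as the 2,000-second cutoff implies) and that WORK.IMPORT2 holds the long-session subset created above; avg_minutes is a name I made up.

```sas
/* List the long-session sites with the average duration in minutes, */
/* longest first. Assumes avg_session_duration is in seconds.        */
proc sql;
    select domain,
           users format=comma14.,
           avg_session_duration,
           avg_session_duration / 60 as avg_minutes format=6.1
    from work.import2
    order by avg_minutes desc;
quit;
```

By the same arithmetic, the 2,000-second cutoff is 2000 / 60 ≈ 33.3 minutes, and an hour and a half corresponds to 5,400 seconds. Applying the TIME8. format to avg_session_duration instead would display the same values as hours, minutes and seconds.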

 

So although I can't say for certain that any of these sites have anything to do with the election, I'm curious to see an American perspective on this data!

 

Now it’s your turn!

 

Did you find something else interesting in this data? Share in the comments. I’m glad to answer any questions.

 

 

Comments

I suspect that tools.usps.com is something called by the customer service bits for things like tracking packages, hold mail requests and the like. It might be interesting to check the duration for that one explicitly to see if it is as short as I think it may be. It may well be less than 5 or 10 seconds.

 

 

Good point - I must admit I assumed it was something to do with voter registration (being Canadian, I don't know the process). I was thinking I would do something like this after the election to see if there's a shift or change. I tried to go to the site but was told that I needed to log in, so I didn't try to go any further.

 

Thanks for your time 🙂

Chris

Your comment mentioned you couldn't get to tools.usps.gov, but the graph says tools.usps.com - is it possible you had a typo when you were trying it?

 

 

...nope, it was .gov that was a typo - it's the tools.usps.com I can't seem to access....

(always a compliment when a SAS Celebrity reads my stuff!)

Nice catch, and thanks for reading 🙂 

Chris

