What’s this thing called "Internet?"
For this installment of Free Data Friday, we're going retro. Some of you may remember Bryant Gumbel's epic question in 1994 on the Today Show: “What is ‘internet,’ anyway?”
To end the week in which the world marked "Back to the Future Day," it's fitting to look back at data on internet shopping in 1997. This should be interesting and way different from what we see today. The data come from a survey conducted by the Graphics, Visualization & Usability (GVU) Center at Georgia Tech and were collected between October 10 and November 16 of that year.
How to download
If you don’t already have University Edition, get it here and follow the instructions in the PDF carefully. If you need help with almost any aspect of using University Edition, check out these video tutorials.
Go to this link, copy all the text, paste it into Notepad, and save it as a .txt file. The code in this article reads the file from /folders/myfolders/my_data/web usage.txt, so either save it under that name and location or adjust the path. Now you're ready to load the data into University Edition.
How to get the data and prep it for analysis
With this data, we're going to use proc import. This procedure reads an external file that isn't a SAS dataset and converts it into one. Instead of starting with a filename statement, we point proc import at the file with the datafile= option, which does essentially the same job.
Then, in the same statement, use out= to name the output SAS dataset. The dbms= option is where you give the file type. This file is delimited, so here we put dlm. The replace option tells SAS to overwrite the output dataset each time you run the code.
If a dataset with the name given in out= already exists, it won't be overwritten unless you include replace. The datarow= statement specifies the row on which the data start. We're using row 2 since the first row holds the variable names.
The getnames= statement lets you choose whether the first row is read as variable names (the default is yes). The delimiter= statement is where you specify the character that separates the columns. Ours is a tab-delimited file. Since tab is a nonprinting character, we write it as the hexadecimal constant '09'x. Close with a run and you've found yet another way to get data that isn't in SAS format into SAS.
Next, you go back to the familiar sgplot. We're using the vbar statement this time to make a bar graph. Then we use proc freq to tabulate how often people shopped and to request chi-square statistics as well.
proc import datafile="/folders/myfolders/my_data/web usage.txt"
    out=web_usage dbms=dlm replace;
    datarow=2;
    getnames=yes;
    delimiter='09'x;
run;

proc sgplot data=web_usage;
    vbar Shopping;
run;

proc freq data=web_usage;
    tables shopping / chisq measures;
run;
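If the bar chart or frequency table looks off, a quick look at the imported dataset can help. This is a minimal sketch, assuming the web_usage dataset created by the proc import step above. The five-row limit is an arbitrary choice for illustration.

```sas
/* List the variable names, types, and number of observations */
proc contents data=web_usage;
run;

/* Print the first five rows to check that the columns lined up */
proc print data=web_usage(obs=5);
run;
```

If the variable names show up as data values, or everything lands in a single column, the getnames= or delimiter= settings are the first things to check.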
What does this output mean?
These data show us that in 1997 online shopping just hadn't taken off yet. No surprise.
Let's compare once per month vs several times per month. Once per month was the response of 544 people (7.54% of the sample). Several times per month was the response of 850 people (11.77%). A total of 1,394 people gave one of those two answers. These monthly shoppers can be broken into a returning-monthly group (several times per month) and a non-returning group (one shopping experience per month). Focusing on only those 1,394 people, 60.98% of the people (850/1394) were returning shoppers.
Now, look at once-per-week shoppers vs. those who shopped several times per week. Two hundred people responded with once per week (2.77% of the sample). Two hundred ninety-nine people shopped several times per week (4.14%). Together, these two responses represent 499 people. These can be broken down the same way -- as a returning-weekly group and a group that shopped once a week. Comparing these two groups we find that of the 499 people, 59.92% (299/499) shopped repeatedly.
Finally, let's break the daily shoppers into a returning and non-returning group as we have for the weekly and monthly shoppers. There were 43 once-per-day shoppers (0.60% of the sample) and 60 several-times-per-day shoppers (0.83%). When comparing just these two groups we can see that 60 people of the 103 total are returning shoppers (58.25%).
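The three returning-shopper percentages above can be reproduced with a short data step. This is only a sketch: the counts are typed in from the text rather than pulled from web_usage, and the dataset name repeat_share and the variable names period, once, several, total, and pct_returning are made up for illustration.

```sas
/* Counts quoted in the text; names here are illustrative, not from the survey file */
data repeat_share;
    input period $ once several;
    total = once + several;                 /* people giving either answer */
    pct_returning = 100 * several / total;  /* share who shopped more than once */
    format pct_returning 6.2;
    datalines;
monthly 544 850
weekly 200 299
daily 43 60
;
run;

/* Show one row per shopping frequency with its returning-shopper share */
proc print data=repeat_share noobs;
run;
```

Laying the three groups side by side makes it easy to see that the returning share sits close to 60% at every shopping frequency.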
This sample is a snapshot in time. It isn't longitudinal data, so we can't make a prediction. But, as people living in the survey respondents' future, we know what happened. And you don't need SAS to know that the categorical differences mentioned above only grew. More people became repeat online shoppers. Monthly purchases became weekly. It's now not unusual for many of us to shop online every day.
The prevalence of online shopping has led to a new type of crime -- thieves swiping packages left inside doors and on front porches.
Of course, not everyone is online yet. For more recent data on current Internet usage, see this Pew Study from 2015.
Now it’s your turn!
Did you find something else interesting in this data? Share in the comments. I’m glad to answer any questions.
Need data for learning?
The SAS Communities Library has a growing supply of free data sources that you can use in your training to become a data scientist. The easiest way to find articles about data sources is to type "Data for learning" in the communities site search field.
We publish all articles about free data sources under the Analytics U label in the SAS Communities Library. Want email notifications when we add new content? Subscribe to the Analytics U label by clicking "Find A Community" in the right nav and selecting SAS Communities Library at the bottom of the list. In the Labels box in the right nav, click Analytics U, then select "Subscribe" from the Options menu.